persephone-tools / persephone

A tool for automatic phoneme transcription
Apache License 2.0

Stable hosting (+long-term archiving) of preprocessed data sets #227

Open alexis-michaud opened 4 years ago

alexis-michaud commented 4 years ago

Having preprocessed data sets at hand makes experimenting much easier. Links to online data can break. This happened for Persephone-related materials: #226. The issue was fixed quickly, but in the medium and long term the answer lies in stable hosting (+long-term archiving) of preprocessed data sets.

Some data sets preprocessed by @gw17 for experiments in 2020 are up here: https://github.com/gw17/sltu_corpora

It's fine to have those in different places, hopefully with some sort of inventory somewhere (in Wiki mode?). Or could the Persephone / Elpis team also offer hosting solutions?

oadams commented 4 years ago

I agree it'd be good to think about something long term. I'm pretty open to where we host such things. You're a beacon of light when it comes to making data available in a stable way for sharing, so I'll defer to your best judgment on this!