Background

Rezonator users can benefit from a wide variety of sample corpus data, for at least three reasons:

- to illustrate the kinds of data Rezonator is capable of working with
- to give users data that they can immediately use for their own research needs, or out of simple curiosity
- to support the community by giving its members a place to host the .rez data they create

What to do

Create web pages on Rezonator.com for corpus data from various languages, providing data suitable for use with Rezonator.

On the main Rezonator.com pages, create a top-level page called "Corpus".
Use rezonator.com/corpus to host a separate page for each corpus.
Organize the corpus pages according to the following hierarchy of categories, most inclusive first:
- language (use standard ISO codes, same as for localization)
- corpus name
- data type
For example, the corpus pages would include:
- rezonator.com/corpus/en/santabarbaracorpus/transcript [original csv files]
- rezonator.com/corpus/en/santabarbaracorpus/rez
- rezonator.com/corpus/en/santabarbaracorpus/audio/wav
- rezonator.com/corpus/en/santabarbaracorpus/audio/ogg
- rezonator.com/corpus/en/santabarbaracorpus/metadata
- rezonator.com/corpus/zh/spokentaiwanmandarin/transcript
- rezonator.com/corpus/zh/spokentaiwanmandarin/rez
- rezonator.com/corpus/zh/spokentaiwanmandarin/audio
- rezonator.com/corpus/zh/pearfilm/transcript
- rezonator.com/corpus/zh/pearfilm/rez
- rezonator.com/corpus/zh/pearfilm/audio
- rezonator.com/corpus/en/gum
- rezonator.com/corpus/it/kiparla
- rezonator.com/corpus/he/
- rezonator.com/corpus/ru/
- rezonator.com/corpus/kk/
- rezonator.com/corpus/es/
- etc.
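
The hierarchy above (language, then corpus name, then data type) could be sketched roughly as follows. This is only an illustration: the function name `corpus_path` and the rule that a language code is two or three lowercase letters are assumptions made for the example, not part of any existing Rezonator code.

```python
import re

# Base location proposed in this issue.
BASE = "rezonator.com/corpus"


def corpus_path(language, corpus=None, *data_type):
    """Join language, corpus name, and data type into a corpus page path.

    Hypothetical helper: levels are ordered most inclusive first,
    and deeper levels are optional.
    """
    # Assumption: language codes are 2 or 3 lowercase ASCII letters.
    if not re.fullmatch(r"[a-z]{2,3}", language):
        raise ValueError(f"unexpected language code: {language!r}")
    if data_type and corpus is None:
        raise ValueError("a data type requires a corpus name")
    parts = [BASE, language]
    if corpus is not None:
        parts.append(corpus)
    parts.extend(data_type)
    return "/".join(parts)


print(corpus_path("en", "santabarbaracorpus", "audio", "wav"))
print(corpus_path("it", "kiparla"))
```

Allowing the data type to span more than one segment covers cases such as audio/wav and audio/ogg in the examples above.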
Use ISO 639-3 language codes when possible (see #639).
The rezonator.com/corpus page will essentially be a table of contents, with links that take the user to a separate page for each specific language.
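
As a rough sketch of how such a table of contents might be derived from a flat list of corpus pages, the snippet below groups page paths by language and then by corpus name. The function name `table_of_contents` and the sample list are hypothetical and used only for illustration; how the site would actually be generated is not specified here.

```python
def table_of_contents(pages, base="rezonator.com/corpus"):
    """Group corpus page paths by language, then by corpus name.

    Hypothetical helper: returns {language: [corpus names, sorted]}.
    """
    toc = {}
    for page in pages:
        if not page.startswith(base):
            raise ValueError(f"not a corpus page: {page!r}")
        # Drop the base, then split the remainder into its levels.
        levels = page[len(base):].strip("/").split("/")
        language = levels[0]
        corpora = toc.setdefault(language, set())
        if len(levels) > 1:
            corpora.add(levels[1])
    # Sort languages and corpus names so the listing is stable.
    return {lang: sorted(names) for lang, names in sorted(toc.items())}


sample_pages = [
    "rezonator.com/corpus/zh/pearfilm/audio",
    "rezonator.com/corpus/en/santabarbaracorpus/rez",
    "rezonator.com/corpus/en/gum",
    "rezonator.com/corpus/he/",
]
for language, corpora in table_of_contents(sample_pages).items():
    print(language, corpora)
```

A language with no corpora yet (such as the `he/` placeholder) still appears in the listing, with an empty list of corpora.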
Make sure licensing rights are handled accurately, legally, and ethically.

Future development

Include both the original data file (for validating, and practicing with, the import process) and the resulting .rez file.
Include media (audio) as well as text files.
For each language, try to include a wide variety of data types: songs, verse, interlinear glossed text, one word per line, CoNLL-U, etc.
Include a link in the Rezonator software that takes the user to the main landing page (rezonator.com/corpus).
Use analytics to keep track of how many people are accessing these pages and downloading corpus data from them.
It is important to remove corpus data from both the Rezonator tool itself and from the Rezonator GitHub site, in order to:
- slim down Rezonator
- make sure that licensing issues for each corpus are addressed (see above)

Alternatives you have considered

Perhaps use corpus.rezonator.com instead of rezonator.com/corpus? Probably not.

See also

johnwdubois / rezonator

979