johnwdubois / rezonator

Rezonator: Dynamics of human engagement

Chinese: Spoken language corpora #743

Open johnwdubois opened 3 years ago

johnwdubois commented 3 years ago

What to do

  1. For the Chinese languages listed below, do the following:

    • Download the spoken corpus data (transcriptions and .mp3 audio)
    • Prep the data, if necessary
    • Import the .cha files into ELAN
    • Export from ELAN as tab-delimited text
    • Import into Rezonator, using the appropriate Import pathway
    • Test the quality of the imported data: Does it look right?
    • Test by annotating some of the data in Rezonator (for Rez chains, Track chains, Stacks, etc.)
    • Save the .rez files
    • Test the .rez files by reopening them and checking that everything looks the way you saved it
    • Convert the audio files from .mp3 to .ogg (required by GameMaker Studio)
    • Link the .rez files to the corresponding audio, and test
    • Upload the tested .rez files to be hosted on the appropriate landing page at rezonator.com/corpus/
  2. This is the NEW version of the NCCU Corpus of Spoken Taiwan Mandarin. This is the one to work with.

  3. Here is the old version of the same corpus: NCCU Corpus of Spoken Chinese (doi:10.21415/T5DT2)

  4. Here are some other related spoken corpora.

CallHome Corpus

CallFriend Corpus
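
For the .mp3 to .ogg step in the list above, a batch conversion could be sketched roughly as follows. This is only a minimal sketch, not part of Rezonator: it assumes ffmpeg is installed and on the PATH, that Ogg Vorbis is an acceptable .ogg encoding for this purpose, and that each .ogg should sit next to its .mp3 under the same base name. The function names and the quality setting are hypothetical choices made for illustration.

```python
import shutil
import subprocess
from pathlib import Path


def ogg_path_for(mp3_path: Path) -> Path:
    """Return the .ogg path that sits next to the given .mp3 file."""
    return mp3_path.with_suffix(".ogg")


def build_ffmpeg_command(mp3_path: Path, quality: int = 5) -> list[str]:
    """Build an ffmpeg command that re-encodes one .mp3 as Ogg Vorbis.

    The quality value (ffmpeg's -q:a scale for libvorbis) is an assumed default.
    """
    return [
        "ffmpeg",
        "-n",                    # do not overwrite an existing output file
        "-i", str(mp3_path),     # input .mp3
        "-c:a", "libvorbis",     # encode audio as Vorbis
        "-q:a", str(quality),    # variable-bitrate quality level
        str(ogg_path_for(mp3_path)),
    ]


def convert_folder(folder: Path) -> list[Path]:
    """Convert every .mp3 in folder that has no .ogg yet; return the new files."""
    if shutil.which("ffmpeg") is None:
        raise RuntimeError("ffmpeg was not found on PATH")
    created = []
    for mp3_path in sorted(folder.glob("*.mp3")):
        target = ogg_path_for(mp3_path)
        if target.exists():
            continue  # already converted, skip
        subprocess.run(build_ffmpeg_command(mp3_path), check=True)
        created.append(target)
    return created
```

Calling `convert_folder(Path("audio"))` on a folder of downloaded recordings (the folder name here is just an example) would leave the original .mp3 files untouched, so the converted audio can be checked in Rezonator before anything is deleted.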

References

Read the corpus documentation to understand the data:

gtroiani commented 3 years ago

@angelina-yuan