salute-developers / golos


Speaker text for Dusha #3

Open TPODAvia opened 1 year ago

TPODAvia commented 1 year ago

Hello. Where can I find the text data for the Dusha dataset?

artsokol commented 1 year ago

Hi! For the time being, we have transcriptions only for the crowd part...

kondrat1997 commented 1 year ago

Hi! As Artem correctly stated, we have uploaded texts only for the crowd part of the dataset, and they can be found in crowd.zip. When preparing the dataset, we asked the first group of people to read these texts aloud. The second group of annotators then assessed only the emotion of each utterance, without checking whether the spoken text matched what was supposed to be pronounced.
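The thread does not describe how crowd.zip is laid out internally, so here is a minimal sketch, using only Python's standard `zipfile` module, that lists archive members that look like text or annotation files. The set of file extensions treated as "text-like" (`TEXT_LIKE_SUFFIXES`) and the helper name `find_text_files` are assumptions for illustration, not part of the dataset's documentation.

```python
import zipfile
from pathlib import Path

# Extensions treated as likely text/annotation files. This list is an
# assumption; the actual layout of crowd.zip is not described in the thread.
TEXT_LIKE_SUFFIXES = (".tsv", ".csv", ".jsonl", ".json", ".txt")


def find_text_files(archive_path):
    """Return sorted names of archive members that look like text/annotation files."""
    with zipfile.ZipFile(archive_path) as archive:
        return sorted(
            name
            for name in archive.namelist()
            # Skip directory entries and keep only text-like extensions.
            if not name.endswith("/") and name.lower().endswith(TEXT_LIKE_SUFFIXES)
        )


if __name__ == "__main__":
    # Hypothetical local path; adjust to wherever you saved crowd.zip.
    archive = Path("crowd.zip")
    if archive.exists():
        for name in find_text_files(archive):
            print(name)
    else:
        print(f"{archive} not found")
```

Listing members first avoids guessing column names or file paths; once you see which files exist, you can open the relevant one with `archive.open(name)` and inspect its contents.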

We did not transcribe the podcast part of the dataset, so no texts have been uploaded for it. However, we plan to run all utterances through our ASR system and share the resulting synthetic annotations.