k2-fsa / libriheavy

Libriheavy: a 50,000 hours ASR corpus with punctuation casing and context
Apache License 2.0

Audios for the dev and test sets #4

Open: akreal opened this issue 11 months ago

akreal commented 11 months ago

Currently, getting the audio for the dev and test sets requires downloading all audio files for the "large" dataset, which is 3 TB. That is too much if I only want to experiment with the "small" or "medium" training sets. Would it be possible to upload the audio for the dev and test sets separately?

pkufool commented 11 months ago

Sure, that's a good idea.

RuABraun commented 7 months ago

Any update on this? I don't see a way to access the eval audio separately.