reazon-research / ReazonSpeech

Massive open Japanese speech corpus
https://research.reazon.jp/projects/ReazonSpeech/
Apache License 2.0

Is the audio list of each dataset publicly available? #38

Closed: toshimitsu10432 closed this issue 1 week ago

toshimitsu10432 commented 1 week ago

I'm trying to extract the "medium" dataset from the "all" dataset. Is there a publicly available "tag list" that indicates which dataset each audio file belongs to?

fujimotos commented 1 week ago

> I'm trying to extract the "medium" dataset from the "all" dataset.

The easiest and most reliable way to achieve that is to download the "medium" dataset through huggingface_hub.

It won't take much time, because huggingface_hub should already have all the assets in its local cache from the "all" download.
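For anyone landing here later, a minimal sketch of what that could look like. It is not taken from the thread: it assumes the corpus is published on the Hugging Face Hub as "reazon-research/reazonspeech", that "medium" is a valid subset name there, and that the `datasets` library (which uses the huggingface_hub cache underneath) is the entry point. `load_subset` is a hypothetical helper name.

```python
# Assumed dataset id on the Hugging Face Hub; check it against the project page.
REPO_ID = "reazon-research/reazonspeech"


def load_subset(name="medium", repo_id=REPO_ID):
    """Load one named subset (e.g. "medium") of the corpus.

    Hypothetical helper, not part of ReazonSpeech itself.
    """
    if not name:
        raise ValueError("subset name must be a non-empty string")

    # Imported lazily so this file can be imported without `datasets` installed.
    from datasets import load_dataset

    # Files already present in the local Hugging Face cache are reused
    # rather than downloaded again. Depending on the installed `datasets`
    # version, a script-based dataset may additionally need
    # trust_remote_code=True (assumption; check the dataset card).
    return load_dataset(repo_id, name)


# Usage (needs network access and possibly a Hugging Face login):
#   ds = load_subset("medium")
#   print(ds)
```

If the subset names differ from what is assumed here, the dataset card on the Hub should list the valid ones.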