reazon-research / ReazonSpeech

Massive open Japanese speech corpus
https://research.reazon.jp/projects/ReazonSpeech/
Apache License 2.0

Failed to download the "all" size #29

Closed: zszheng147 closed this issue 5 months ago

zszheng147 commented 5 months ago

load_dataset from Hugging Face is extremely slow, and I have been stuck on it for almost a week.

fujimotos commented 5 months ago

Set up a server located in Japan (preferably somewhere near Tokyo), and try downloading the dataset from there.

ReazonSpeech is hosted on ABCI cloud. It's slow to download if you are using an overseas connection.

zszheng147 commented 5 months ago

Thank you for your quick response. The issue is not related to network problems. I have already downloaded the intermediate data, which is stored in the downloads folder within my cache folder. I assume that the Hugging Face API is supposed to unpack or process these intermediate tar files, but that step is where I am stuck, and it has been very time-consuming.
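One way to confirm that the download really is complete, and that the time is going into unpacking, is to take stock of what sits in the downloads folder. The following is a minimal sketch using only the Python standard library. The function name summarize_downloads is hypothetical, and the folder layout (hash-named archives next to small metadata files) is an assumption about how the datasets cache is organized, not something confirmed in this thread.

```python
import tarfile
from pathlib import Path


def summarize_downloads(downloads_dir):
    """Return (tar_count, total_tar_bytes, other_count) for a downloads folder.

    Hypothetical helper: it only inspects files, it does not extract anything.
    """
    tar_count = 0
    total_tar_bytes = 0
    other_count = 0
    for path in sorted(Path(downloads_dir).iterdir()):
        if not path.is_file():
            continue
        # is_tarfile() looks at the file header, so it works even when the
        # cached file has no .tar extension (assumed to be the case here).
        if tarfile.is_tarfile(str(path)):
            tar_count += 1
            total_tar_bytes += path.stat().st_size
        else:
            # Metadata, lock files, and anything else that is not an archive.
            other_count += 1
    return tar_count, total_tar_bytes, other_count
```

To use it, pass the path of the downloads folder inside your cache folder, for example ~/.cache/huggingface/datasets/downloads if you have not changed the cache location (that default path is an assumption; adjust it to your setup). If the archive count and total size match what you expect for the "all" size, the remaining time is most likely spent on extraction and processing rather than on the network.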