mlcommons / peoples-speech

The People’s Speech Dataset
https://mlcommons.org/en/peoples-speech/
Apache License 2.0

Galv/ia download unsupervised #64

Open galv opened 2 years ago

galv commented 2 years ago

Just do pip install internetarchive and run:

python download_items.py

download_items.py is a standalone file, so you can copy it on its own without the rest of the repo.

You will need to change the path string on this line to point to a directory on a disk with a lot of free space: https://github.com/mlcommons/peoples-speech/commit/a30eef6960e6b05119c87f6f1a167666d166e454#diff-426e62c5a30597bce4f1ca7f0be01f9d65e62d5e903ead8dda830d2bb8773191R97

I have this running now as a sanity check and will modify it as needed if I see errors.
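
For readers who have not used the internetarchive package, here is a rough sketch of what a bulk download loop with a configurable destination might look like. It is not the code in download_items.py: the names DOWNLOAD_DIR, _ia_download, and download_items below are hypothetical, and the retry and skip-existing settings are assumptions. The only library call used is internetarchive.download with its destdir, ignore_existing, and retries arguments.

```python
from pathlib import Path
from typing import Callable, Iterable, List

# Hypothetical placeholder: point this at a disk with a lot of free space.
DOWNLOAD_DIR = "/path/to/large/disk/archive_org"


def _ia_download(identifier: str, destdir: str) -> None:
    """Fetch one archive.org item into destdir using the internetarchive package."""
    # Imported lazily so this module can be loaded without the package installed.
    import internetarchive

    # ignore_existing skips files already on disk, so an interrupted run can resume.
    internetarchive.download(
        identifier, destdir=destdir, ignore_existing=True, retries=5
    )


def download_items(
    identifiers: Iterable[str],
    destdir: str = DOWNLOAD_DIR,
    fetch: Callable[[str, str], None] = _ia_download,
) -> List[str]:
    """Download every item in identifiers and return the ones that failed."""
    Path(destdir).mkdir(parents=True, exist_ok=True)
    failed: List[str] = []
    for identifier in identifiers:
        try:
            fetch(identifier, destdir)
        except Exception as exc:
            # Record the failure and keep going so one bad item
            # does not stop a long-running job.
            print(f"failed: {identifier}: {exc}")
            failed.append(identifier)
    return failed
```

Calling download_items with a list of item identifiers and a destdir of your choice returns the identifiers that could not be fetched, which can then be retried in a later pass. The fetch parameter exists only so the loop can be exercised without network access.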

github-actions[bot] commented 2 years ago

MLCommons CLA bot:
Thank you for your submission, we really appreciate it. We ask that you all sign our MLCommons CLA and be a member before we can accept your contribution. If you are interested in membership, please contact membership@mlcommons.org .
1 out of 2 committers have signed the MLCommons CLA.
:white_check_mark: @galv
:x: @autoblack
autoblack does not seem to be a GitHub user. You need a GitHub account after you become an MLCommons member. If you already have a GitHub account, please add the email address used for this commit to your account.
You can retrigger this bot by commenting recheck in this Pull Request.