reazon-research / ReazonSpeech

Massive open Japanese speech corpus
https://research.reazon.jp/projects/ReazonSpeech/
Apache License 2.0

list index out of range when trying to access "all" dataset with streaming=True #2

Closed MatchaChoco010 closed 1 year ago

MatchaChoco010 commented 1 year ago

I got the following error, and iterating over the dataset stopped partway through.

  File "/home/user/.cache/huggingface/modules/datasets_modules/datasets/reazon-research--reazonspeech/07c40bbec39d1bbdf18f4b24cb0699b009e43eedbe04e5474f813f1bc8b5c451/reazonspeech.py", line 109, in _generate_examples
    path = os.path.join(local_extracted_archive_paths[i], filename)
IndexError: list index out of range


I wonder whether [None] is a mistake for None in the following part of reazonspeech.py (line 82) on Hugging Face:

        # Download archives
        archive_paths = dl_manager.download(url)
        local_extracted_archive_paths = dl_manager.extract(archive_paths) if not dl_manager.is_streaming else [None]
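
To illustrate the suspicion, here is a minimal sketch that does not use the actual loader code. The function resolve_paths and the file names are hypothetical, and the loop shape (one index i per archive, with a truthiness guard on the list) is an assumption based on the traceback above. It shows how a one-element placeholder [None] would let the first archive through and then fail on the second, while None would skip the indexing entirely.

```python
import os


def resolve_paths(archive_files, local_extracted_archive_paths):
    """Hypothetical stand-in for the path logic in _generate_examples."""
    paths = []
    for i, filenames in enumerate(archive_files):
        for filename in filenames:
            # Assumed guard: index the list only when it is truthy.
            if local_extracted_archive_paths:
                base = local_extracted_archive_paths[i]
                paths.append(os.path.join(base, filename) if base else filename)
            else:
                paths.append(filename)
    return paths


# Two archives, as a stand-in for a multi-archive config such as "all".
archives = [["a.flac"], ["b.flac"]]

# Placeholder [None] is truthy but has a single element,
# so indexing it with i == 1 raises IndexError.
streaming_error = None
try:
    resolve_paths(archives, [None])
except IndexError as exc:
    streaming_error = str(exc)

# Placeholder None is falsy, so the list is never indexed.
streaming_paths = resolve_paths(archives, None)
```

Under these assumptions the failure would only appear once a second archive is reached, which would match the iteration stopping in the middle rather than at the start.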

fujimotos commented 1 year ago

@MatchaChoco010 Thank you for reporting!

> I got the following error, and iterating over the dataset stopped partway through. ... I wonder whether [None] is a mistake for None in the following part of reazonspeech.py (line 82) on Hugging Face?

Your observation is correct. I just fixed this issue with reazonspeech@4d27b5b02, and streaming=True should work fine now.

Please let me know if anything does not work for you.