drscotthawley closed this issue 2 years ago
For context, please see:
Please also note that we recently made some fixes to the script. They are in our GitHub master branch but not yet released:
Thanks Albert! I'll try pulling datasets from the git repo instead of PyPI, and/or just wait for the next release.
I'm closing this issue then. Please feel free to reopen it if the problem persists.
Describe the bug
I get a message from HuggingFace saying that the TIMIT dataset must be downloaded manually. The URL in the message led me to a UPenn page for manual download. (UPenn apparently wants $250 for the dataset?) So I obtained a copy from a friend, and also a smaller version from Kaggle. In both cases the HF dataloader fails, because it looks for files that don't exist anywhere in the dataset. It looks for file names with lower-case letters like "*test" (all the filenames in both my copies are uppercase) and for certain file extensions, which exclude the .DOC extension that TIMIT provides:
Steps to reproduce the bug
Expected results
The dataset should load with no errors.
Actual results
This error message:
But this is a strange sort of error: why is it looking for lower-case file names when all the TIMIT dataset filenames are uppercase? Why does it exclude .DOC files when the only parts of the TIMIT dataset with "TEST" in their names have ".DOC" extensions? I wonder how anyone was able to get this to work in the first place.
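One way to check both observations against a local copy is to tally the file extensions and the letter case of the file names before handing the directory to the loader. The following is a minimal sketch using only the Python standard library. The helper name `summarize_tree` and the example path are hypothetical and are not part of `datasets`.

```python
from collections import Counter
from pathlib import Path


def summarize_tree(root):
    """Count file extensions and upper/lower/mixed-case file names under root."""
    extensions = Counter()
    case = Counter()
    for path in Path(root).rglob("*"):
        if not path.is_file():
            continue  # skip directories
        extensions[path.suffix] += 1
        if path.name.isupper():
            case["upper"] += 1
        elif path.name.islower():
            case["lower"] += 1
        else:
            case["mixed"] += 1
    return extensions, case
```

Calling `extensions, case = summarize_tree("/path/to/TIMIT")` (the path is a placeholder) and printing both counters shows at a glance whether the copy contains any lower-case names and which extensions are present.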
The files in the dataset look like the following:
...so why are these being excluded by the dataset loader?
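A possible explanation, offered only as an assumption, is that the loader matches file names case-sensitively, so a lower-case pattern never matches an uppercase copy of the data. The sketch below illustrates that effect with the standard-library `fnmatch` module. The sample paths and the `"*test*"` pattern are made up for illustration and are not taken from the loader script.

```python
from fnmatch import fnmatchcase

# Illustrative paths only, uppercase as in the copies described above.
paths = [
    "TEST/DR1/FAKS0/SA1.WAV",
    "TRAIN/DR1/FCJF0/SA1.WAV",
    "DOC/TESTSET.DOC",
]


def match_case_sensitive(paths, pattern):
    """Return the paths that match pattern exactly as written."""
    return [p for p in paths if fnmatchcase(p, pattern)]


def match_case_insensitive(paths, pattern):
    """Return the paths that match pattern once both are lower-cased."""
    return [p for p in paths if fnmatchcase(p.lower(), pattern.lower())]


pattern = "*test*"  # assumed lower-case pattern, for illustration
strict = match_case_sensitive(paths, pattern)   # no uppercase path matches
loose = match_case_insensitive(paths, pattern)  # TEST/... and DOC/TESTSET.DOC match
```

If this is what happens, either lower-casing the names in a copy of the data or matching case-insensitively would let the uppercase files through. Whether the released script behaves this way is not confirmed here.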
Environment info
datasets version: 2.2.2