I am not sure what the expected behaviour is and whether its my mistake, an error in the course or of the utilised dataset but I noticed the following in Chapter 1 - Preprocessing:
When I follow the course and try to execute
# use librosa to get example's duration from the audio file
new_column = [librosa.get_duration(path=x) for x in minds["path"]]
it will fail because the path, or x in the code snippet, looks something like /storage/hf-datasets-cache/all/datasets/27907695716030-config-parquet-and-info-PolyAI-minds14-941a5af2/downloads/extracted/a87e442545495cdb67dfdcbc9d4f35d234c9f8e471449b2db58d7c81b62f001a/en-AU~PAY_BILL/response_4.wav (which is the exact content as provided by the unmodified dataset as can be seen on the datasets page but does not exist on my machine).
Do I use the load_dataset function in a wrong way? Do I have to specify a path to explicitly save or cache the data somewhere? Is there a way that will automatically replace the 'path' value in the dataset with the local path on my machine?
Alternatively, one could change the function call of librosa.get_duration(path=x) and pass the audio array and the sampling_rate instead, e.g.
new_column = [librosa.get_duration(y=x["array"], sr=x["sampling_rate"]) for x in minds["audio"]]
Hey
I am not sure what the expected behaviour is and whether its my mistake, an error in the course or of the utilised dataset but I noticed the following in Chapter 1 - Preprocessing:
When I follow the course and try to execute
it will fail because the path, or
x
in the code snippet, looks something like/storage/hf-datasets-cache/all/datasets/27907695716030-config-parquet-and-info-PolyAI-minds14-941a5af2/downloads/extracted/a87e442545495cdb67dfdcbc9d4f35d234c9f8e471449b2db58d7c81b62f001a/en-AU~PAY_BILL/response_4.wav
(which is the exact content as provided by the unmodified dataset as can be seen on the datasets page but does not exist on my machine).Do I use the load_dataset function in a wrong way? Do I have to specify a path to explicitly save or cache the data somewhere? Is there a way that will automatically replace the
'path'
value in the dataset with the local path on my machine?Alternatively, one could change the function call of
librosa.get_duration(path=x)
and pass the audio array and the sampling_rate instead, e.g.