Closed rabitt closed 3 years ago
Regarding the ones assigned to me: cante100
and irmas
are passing the full test.
However, beatport_key
is failing. The issue is with the special characters. Here's an example:
I am running these tests on my MacBook Pro 2017 using Catalina 10.15.7. I am trying to run the same full_test in our ubuntu server to see if the problem is in the OS. I'll keep you up-to-date asap!
Meanwhile, I was running full tests on Saraga Carnatic and Hindustani and something broke as well. I am revising it and will create a PR with the patch.
Can you share the error you're getting in beatport_key
here?
@rabitt I don't have the beatles
audio with me here, do you? if so, can we switch one test dataset?
Can you share the error you're getting in
beatport_key
here?
Yes, sorry. I haven't share it because it's basically a FileNotFound error, because it's looking for this 1190980 Andrew Spencer, The Vamprockerz - Zombie 2k10 (André Picar Remix Edit).mp3
"broken" filename which does not exists in the dataset folder.
I'll paste the error as soon as I finish with the Saraga testing!
Sure, no rush. rwc_classical
also breaks :)
FAILED tests/test_full_dataset.py::test_load - ValueError: Object should not be empty, use None instead
Looking into it now
That's the error for beatport_key
:
def load_audio(audio_path):
"""Load a beatport_key audio file.
Args:
audio_path (str): path to audio file
Returns:
y (np.ndarray): the mono audio signal
sr (float): The sample rate of the audio file
"""
if not os.path.exists(audio_path):
> raise IOError("audio_path {} does not exist".format(audio_path))
E OSError: audio_path tests/resources/mir_datasets_full/beatport_key/audio/
1190980 Andrew Spencer, The Vamprockerz - Zombie 2k10 (André Picar Remix Edit).mp3 does not exist
mirdata/datasets/beatport_key.py:152: OSError
That's the error for
beatport_key
:def load_audio(audio_path): """Load a beatport_key audio file. Args: audio_path (str): path to audio file Returns: y (np.ndarray): the mono audio signal sr (float): The sample rate of the audio file """ if not os.path.exists(audio_path): > raise IOError("audio_path {} does not exist".format(audio_path)) E OSError: audio_path tests/resources/mir_datasets_full/beatport_key/audio/ 1190980 Andrew Spencer, The Vamprockerz - Zombie 2k10 (André Picar Remix Edit).mp3 does not exist mirdata/datasets/beatport_key.py:152: OSError
Looks like it's different unicode decoding problems maybe? @genisplaja if you just use mirdata locally and try getting the audio for that trackid, does it work?
@magdalenafuentes given the number of bugs....i think maybe it's worth testing all the loaders 🙃 . What do you think?
@genisplaja I think @PRamoneda ran into a similar bug. the bug was happening when uncompressing the archive created with a particular encoding. I prepared a patch back then to fix this in a custom unzip function. happy new year, everyone!
@nkundiushuti yes, we discussed it with Pedro and we will meet afterwards to see what happens with Beatport as it looks like this error is quite similar than the one you are mentioning. In my Catalina MacBook Pro it's still breaking (I just re-tested it!). In fact, IRMAS once had an encoding/decoding issue like that as well. Thanks for the note:)
That's the error for
beatport_key
:def load_audio(audio_path): """Load a beatport_key audio file. Args: audio_path (str): path to audio file Returns: y (np.ndarray): the mono audio signal sr (float): The sample rate of the audio file """ if not os.path.exists(audio_path): > raise IOError("audio_path {} does not exist".format(audio_path)) E OSError: audio_path tests/resources/mir_datasets_full/beatport_key/audio/ 1190980 Andrew Spencer, The Vamprockerz - Zombie 2k10 (André Picar Remix Edit).mp3 does not exist mirdata/datasets/beatport_key.py:152: OSError
Looks like it's different unicode decoding problems maybe? @genisplaja if you just use mirdata locally and try getting the audio for that trackid, does it work?
@rabitt I did that and the path is loaded with this wrong character again where there should be an "é". This is the audio_path extracted from the locally loaded track.
audio_path=".../audio/1190980 Andrew Spencer, The Vamprockerz - Zombie 2k10 (André Picar Remix Edit).mp3
This is the code snippet I used:
import mirdata
beatport = mirdata.initialize('beatport_key')
data = beatport.load_tracks()
example_track = data['41'] # I searched for the id in the index
beatport.load_audio(example_track.audio_path) # Here it breaks because it cannot find the audio
The issue is with the special characters, that is not the first time it happens as @nkundiushuti mentioned earlier. Tomorrow we will take look at it together with @PRamoneda to see where the error comes from!
check the files on the hard drive to see if they were decompressed correctly. if that's ok then check the index. maybe it was created containing these weird chars.
În mie., 6 ian. 2021 la 21:19, Genís Plaja notifications@github.com a scris:
That's the error for beatport_key:
def load_audio(audio_path):
"""Load a beatport_key audio file. Args: audio_path (str): path to audio file Returns: y (np.ndarray): the mono audio signal sr (float): The sample rate of the audio file """ if not os.path.exists(audio_path):
raise IOError("audio_path {} does not exist".format(audio_path))
E OSError: audio_path tests/resources/mir_datasets_full/beatport_key/audio/
1190980 Andrew Spencer, The Vamprockerz - Zombie 2k10 (André Picar Remix Edit).mp3 does not exist
mirdata/datasets/beatport_key.py:152: OSError
Looks like it's different unicode decoding problems maybe? @genisplaja https://github.com/genisplaja if you just use mirdata locally and try getting the audio for that trackid, does it work?
@rabitt https://github.com/rabitt I did that and the path is loaded with this wrong character again where there should be an "é". This is the audio_path extracted from the locally loaded track.
audio_path=".../audio/1190980 Andrew Spencer, The Vamprockerz - Zombie 2k10 (André Picar Remix Edit).mp3
This is the code snippet I used:
import mirdata
beatport = mirdata.initialize('beatport_key')
data = beatport.load_tracks()
example_track = data['41'] # I searched for the id in the index
beatport.load_audio(example_track.audio_path) # Here it breaks because it cannot find the audio
The issue is with the special characters, that is not the first time it happens as @nkundiushuti https://github.com/nkundiushuti mentioned earlier. Tomorrow we will take look at it together with @PRamoneda https://github.com/PRamoneda to see where the error comes from!
— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/mir-dataset-loaders/mirdata/issues/390#issuecomment-755632243, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAJG624NOUKI3EHPQZQXAPLSYTAWVANCNFSM4VE5UYZQ .
-- Marius Miron http://mariusmiron.com/
@nkundiushuti Done! After downloading the file with the function beatport.download()
, it is decoded correctly in my MacBook Pro. It looks like that:
1190980 Andrew Spencer, The Vamprockerz - Zombie 2k10 (André Picar Remix Edit)
However, in the index json file it looks like that:
1190980 Andrew Spencer, The Vamprockerz - Zombie 2k10 (Andr\u251c\u2310 Picar Remix Edit)
Which in my machine is translated this way:
1190980 Andrew Spencer, The Vamprockerz - Zombie 2k10 (André Picar Remix Edit)
And that is why it is not matching with the original (and correct!) index file. Maybe the error is in the special character decoding process! We'll check it right now.
@rabitt I agree in checking all datasets. Ideally I think we should run all tests, to make sure that the automatic tests are working as we expect. Once we confirm that (i.e. that when we find errors locally those are catch by the automatic tests), then we can trust the automatic checks for next releases.
Right now I'm fixing a bug in rwc_classical
that didn't show up in the remote tests. @nkundiushuti do you have an idea why that may be? I think it might be that the remote tests are skipping 6 tests at the beginning, and one them is the one that failed?
I am not running full tests for rwc_classical at the moment. I can add it to the tests if you want. Do I need to copy the rwc files in test resources?
În joi, 7 ian. 2021 la 14:49, Magdalena Fuentes notifications@github.com a scris:
@rabitt https://github.com/rabitt I agree in checking all datasets. Ideally I think we should run all tests, to make sure that the automatic tests are working as we expect. Once we confirm that (i.e. that when we find errors locally those are catch by the automatic tests), then we can trust the automatic checks for next releases.
Right now I'm fixing a bug in rwc_classical that didn't show up in the remote tests. @nkundiushuti https://github.com/nkundiushuti do you have an idea why that may be? I think it might be that the remote tests are skipping 6 tests at the beginning, and one them is the one that failed?
— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/mir-dataset-loaders/mirdata/issues/390#issuecomment-756127482, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAJG62YEV6ZCLHCC2C2PZ4TSYW3X7ANCNFSM4VE5UYZQ .
-- Marius Miron http://mariusmiron.com/
Yes! I think that would be great. Let's add the rwc popular and jazz as well, and we can run the tests after I merge #403
This is done! Good job everyone!
To be safe let's run the manual tests for a some of the datasets again since a lot has changed. What about we each run 3 tomorrow? How about:
*Fixes in new PR #392. **Fixes in new PR #403