mir-dataset-loaders / mirdata

Python library for working with Music Information Retrieval datasets
https://mirdata.readthedocs.io/en/stable/
BSD 3-Clause "New" or "Revised" License
363 stars 58 forks source link

Run test_full_dataset.py for 0.3.0 #390

Closed rabitt closed 3 years ago

rabitt commented 3 years ago

To be safe let's run the manual tests for a some of the datasets again since a lot has changed. What about we each run 3 tomorrow? How about:

*Fixes in new PR #392. **Fixes in new PR #403

genisplaja commented 3 years ago

Regarding the ones assigned to me: cante100 and irmas are passing the full test.

However, beatport_key is failing. The issue is with the special characters. Here's an example:

I am running these tests on my MacBook Pro 2017 using Catalina 10.15.7. I am trying to run the same full_test in our ubuntu server to see if the problem is in the OS. I'll keep you up-to-date asap!

genisplaja commented 3 years ago

Meanwhile, I was running full tests on Saraga Carnatic and Hindustani and something broke as well. I am revising it and will create a PR with the patch.

magdalenafuentes commented 3 years ago

Can you share the error you're getting in beatport_key here?

magdalenafuentes commented 3 years ago

@rabitt I don't have the beatles audio with me here, do you? if so, can we switch one test dataset?

genisplaja commented 3 years ago

Can you share the error you're getting in beatport_key here?

Yes, sorry. I haven't share it because it's basically a FileNotFound error, because it's looking for this 1190980 Andrew Spencer, The Vamprockerz - Zombie 2k10 (André Picar Remix Edit).mp3 "broken" filename which does not exists in the dataset folder. I'll paste the error as soon as I finish with the Saraga testing!

magdalenafuentes commented 3 years ago

Sure, no rush. rwc_classical also breaks :)

FAILED tests/test_full_dataset.py::test_load - ValueError: Object should not be empty, use None instead

Looking into it now

genisplaja commented 3 years ago

That's the error for beatport_key:

   def load_audio(audio_path):
        """Load a beatport_key audio file.

        Args:
            audio_path (str): path to audio file

        Returns:
            y (np.ndarray): the mono audio signal
            sr (float): The sample rate of the audio file

        """
        if not os.path.exists(audio_path):
>           raise IOError("audio_path {} does not exist".format(audio_path))
E           OSError: audio_path tests/resources/mir_datasets_full/beatport_key/audio/
            1190980 Andrew Spencer, The Vamprockerz - Zombie 2k10 (André Picar Remix Edit).mp3 does not exist

mirdata/datasets/beatport_key.py:152: OSError
rabitt commented 3 years ago

That's the error for beatport_key:

   def load_audio(audio_path):
        """Load a beatport_key audio file.

        Args:
            audio_path (str): path to audio file

        Returns:
            y (np.ndarray): the mono audio signal
            sr (float): The sample rate of the audio file

        """
        if not os.path.exists(audio_path):
>           raise IOError("audio_path {} does not exist".format(audio_path))
E           OSError: audio_path tests/resources/mir_datasets_full/beatport_key/audio/
            1190980 Andrew Spencer, The Vamprockerz - Zombie 2k10 (André Picar Remix Edit).mp3 does not exist

mirdata/datasets/beatport_key.py:152: OSError

Looks like it's different unicode decoding problems maybe? @genisplaja if you just use mirdata locally and try getting the audio for that trackid, does it work?

rabitt commented 3 years ago

@magdalenafuentes given the number of bugs....i think maybe it's worth testing all the loaders 🙃 . What do you think?

nkundiushuti commented 3 years ago

@genisplaja I think @PRamoneda ran into a similar bug. the bug was happening when uncompressing the archive created with a particular encoding. I prepared a patch back then to fix this in a custom unzip function. happy new year, everyone!

genisplaja commented 3 years ago

@nkundiushuti yes, we discussed it with Pedro and we will meet afterwards to see what happens with Beatport as it looks like this error is quite similar than the one you are mentioning. In my Catalina MacBook Pro it's still breaking (I just re-tested it!). In fact, IRMAS once had an encoding/decoding issue like that as well. Thanks for the note:)

genisplaja commented 3 years ago

That's the error for beatport_key:

   def load_audio(audio_path):
        """Load a beatport_key audio file.

        Args:
            audio_path (str): path to audio file

        Returns:
            y (np.ndarray): the mono audio signal
            sr (float): The sample rate of the audio file

        """
        if not os.path.exists(audio_path):
>           raise IOError("audio_path {} does not exist".format(audio_path))
E           OSError: audio_path tests/resources/mir_datasets_full/beatport_key/audio/
            1190980 Andrew Spencer, The Vamprockerz - Zombie 2k10 (André Picar Remix Edit).mp3 does not exist

mirdata/datasets/beatport_key.py:152: OSError

Looks like it's different unicode decoding problems maybe? @genisplaja if you just use mirdata locally and try getting the audio for that trackid, does it work?

@rabitt I did that and the path is loaded with this wrong character again where there should be an "é". This is the audio_path extracted from the locally loaded track.

audio_path=".../audio/1190980 Andrew Spencer, The Vamprockerz - Zombie 2k10 (André Picar Remix Edit).mp3

This is the code snippet I used:

import mirdata
beatport = mirdata.initialize('beatport_key')
data = beatport.load_tracks()
example_track = data['41']  # I searched for the id in the index
beatport.load_audio(example_track.audio_path)  # Here it breaks because it cannot find the audio

The issue is with the special characters, that is not the first time it happens as @nkundiushuti mentioned earlier. Tomorrow we will take look at it together with @PRamoneda to see where the error comes from!

nkundiushuti commented 3 years ago

check the files on the hard drive to see if they were decompressed correctly. if that's ok then check the index. maybe it was created containing these weird chars.

În mie., 6 ian. 2021 la 21:19, Genís Plaja notifications@github.com a scris:

That's the error for beatport_key:

def load_audio(audio_path):

    """Load a beatport_key audio file.

    Args:

        audio_path (str): path to audio file

    Returns:

        y (np.ndarray): the mono audio signal

        sr (float): The sample rate of the audio file

    """

    if not os.path.exists(audio_path):
      raise IOError("audio_path {} does not exist".format(audio_path))

E OSError: audio_path tests/resources/mir_datasets_full/beatport_key/audio/

        1190980 Andrew Spencer, The Vamprockerz - Zombie 2k10 (André Picar Remix Edit).mp3 does not exist

mirdata/datasets/beatport_key.py:152: OSError

Looks like it's different unicode decoding problems maybe? @genisplaja https://github.com/genisplaja if you just use mirdata locally and try getting the audio for that trackid, does it work?

@rabitt https://github.com/rabitt I did that and the path is loaded with this wrong character again where there should be an "é". This is the audio_path extracted from the locally loaded track.

audio_path=".../audio/1190980 Andrew Spencer, The Vamprockerz - Zombie 2k10 (André Picar Remix Edit).mp3

This is the code snippet I used:

import mirdata

beatport = mirdata.initialize('beatport_key')

data = beatport.load_tracks()

example_track = data['41'] # I searched for the id in the index

beatport.load_audio(example_track.audio_path) # Here it breaks because it cannot find the audio

The issue is with the special characters, that is not the first time it happens as @nkundiushuti https://github.com/nkundiushuti mentioned earlier. Tomorrow we will take look at it together with @PRamoneda https://github.com/PRamoneda to see where the error comes from!

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/mir-dataset-loaders/mirdata/issues/390#issuecomment-755632243, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAJG624NOUKI3EHPQZQXAPLSYTAWVANCNFSM4VE5UYZQ .

-- Marius Miron http://mariusmiron.com/

genisplaja commented 3 years ago

@nkundiushuti Done! After downloading the file with the function beatport.download(), it is decoded correctly in my MacBook Pro. It looks like that:

1190980 Andrew Spencer, The Vamprockerz - Zombie 2k10 (André Picar Remix Edit)

However, in the index json file it looks like that:

1190980 Andrew Spencer, The Vamprockerz - Zombie 2k10 (Andr\u251c\u2310 Picar Remix Edit)

Which in my machine is translated this way:

1190980 Andrew Spencer, The Vamprockerz - Zombie 2k10 (André Picar Remix Edit)

And that is why it is not matching with the original (and correct!) index file. Maybe the error is in the special character decoding process! We'll check it right now.

magdalenafuentes commented 3 years ago

@rabitt I agree in checking all datasets. Ideally I think we should run all tests, to make sure that the automatic tests are working as we expect. Once we confirm that (i.e. that when we find errors locally those are catch by the automatic tests), then we can trust the automatic checks for next releases.

Right now I'm fixing a bug in rwc_classical that didn't show up in the remote tests. @nkundiushuti do you have an idea why that may be? I think it might be that the remote tests are skipping 6 tests at the beginning, and one them is the one that failed?

nkundiushuti commented 3 years ago

I am not running full tests for rwc_classical at the moment. I can add it to the tests if you want. Do I need to copy the rwc files in test resources?

În joi, 7 ian. 2021 la 14:49, Magdalena Fuentes notifications@github.com a scris:

@rabitt https://github.com/rabitt I agree in checking all datasets. Ideally I think we should run all tests, to make sure that the automatic tests are working as we expect. Once we confirm that (i.e. that when we find errors locally those are catch by the automatic tests), then we can trust the automatic checks for next releases.

Right now I'm fixing a bug in rwc_classical that didn't show up in the remote tests. @nkundiushuti https://github.com/nkundiushuti do you have an idea why that may be? I think it might be that the remote tests are skipping 6 tests at the beginning, and one them is the one that failed?

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/mir-dataset-loaders/mirdata/issues/390#issuecomment-756127482, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAJG62YEV6ZCLHCC2C2PZ4TSYW3X7ANCNFSM4VE5UYZQ .

-- Marius Miron http://mariusmiron.com/

magdalenafuentes commented 3 years ago

Yes! I think that would be great. Let's add the rwc popular and jazz as well, and we can run the tests after I merge #403

magdalenafuentes commented 3 years ago

This is done! Good job everyone!