mir-dataset-loaders / mirdata

Python library for working with Music Information Retrieval datasets
https://mirdata.readthedocs.io/en/stable/
BSD 3-Clause "New" or "Revised" License

Add SALAMI functions annotations and make included JAMS export be MSAF compatible #589

Open carlthome opened 11 months ago

carlthome commented 11 months ago
  1. Include SALAMI function annotations (e.g. musical sections).
  2. Use JAMS segment tasks for SALAMI vocabulary.
  3. Separate lower/upper/function annotations into separate segmentations (for compatibility with MSAF).
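
To illustrate item 3, here is a minimal sketch (not the code in this branch) of keeping each SALAMI level as its own segmentation instead of one merged annotation. The namespace names are the JAMS segment namespaces for SALAMI; the boundary times, the labels, `levels`, `track_duration` and the helper `to_intervals` are hypothetical, and the plain dicts are only a simplified stand-in for real JAMS annotation objects (the labels are not checked against the JAMS schemas).

```python
# Hypothetical toy data: (start_time_in_seconds, label) events for one track,
# one list per SALAMI annotation level. All values are made up.
levels = {
    "segment_salami_upper": [(0.0, "A"), (30.0, "B"), (60.0, "A")],
    "segment_salami_lower": [
        (0.0, "a"), (15.0, "b"), (30.0, "c"), (45.0, "d"), (60.0, "a"),
    ],
    "segment_salami_function": [(0.0, "intro"), (30.0, "verse"), (60.0, "chorus")],
}
track_duration = 90.0  # assumed end time of the last segment


def to_intervals(events, end_time):
    """Turn (start, label) events into (start, duration, label) segments."""
    starts = [start for start, _ in events]
    ends = starts[1:] + [end_time]
    return [(start, end - start, label) for (start, label), end in zip(events, ends)]


# One annotation entry per level, rather than a single merged annotation.
annotations = [
    {"namespace": namespace, "data": to_intervals(events, track_duration)}
    for namespace, events in levels.items()
]

for annotation in annotations:
    print(annotation["namespace"], len(annotation["data"]), "segments")
```

Keeping the levels apart means a consumer that expects one flat segmentation per annotation can pick a single level without having to untangle mixed labels.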

This branch works with MSAF (which expects JAMS files and calls mir_eval internally in a joblib loop):

import os

import msaf
import mirdata

# Initialize SALAMI dataset.
salami = mirdata.initialize("salami")
salami.download()
salami.validate()

# Export all tracks to JAMS format, as expected by MSAF.
os.makedirs(f"{salami.data_home}/references", exist_ok=True)
for track_id, track in salami.load_tracks().items():
    jams = track.to_jams()
    jams.save(f"{salami.data_home}/references/{track_id}.jams")

# Segment all audio files.
msaf.run.process(salami.data_home, n_jobs=16)

# Compare estimated segments with annotations.
msaf.eval.process(salami.data_home, n_jobs=16)

INFO: 1359 tracks analyzed
INFO: Results:
HitRate_3P        0.548853
HitRate_3R        0.573635
HitRate_3F        0.539032
HitRate_0.5P      0.308312
HitRate_0.5R      0.329888
HitRate_0.5F      0.305686
HitRate_t3P       0.459635
HitRate_t3R       0.483056
HitRate_t3F       0.444781
HitRate_t0.5P     0.169861
HitRate_t0.5R     0.186063
HitRate_t0.5F     0.166812
HitRate_w3F       0.538601
HitRate_w0.5F     0.303910
HitRate_wt3F      0.445825
HitRate_wt0.5F    0.165879
D                 0.539999
DevR2E            2.646493
DevE2R            3.843950
DevtR2E           4.409412
DevtE2R           6.576668
PWP               0.902571
PWR               1.000000
PWF               0.947473
So                0.000000
Su                0.725209
Sf                0.000000
dtype: float64
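
For readers unfamiliar with the `HitRate_*` rows, the following is a minimal sketch of a boundary hit rate, assuming the `3` and `0.5` suffixes denote tolerance windows in seconds and `P`/`R`/`F` denote precision, recall and F-measure. It is a simplification: the function `hit_rate` and the toy boundaries are hypothetical, and it matches boundaries greedily over sorted lists, whereas mir_eval solves a bipartite matching problem.

```python
def hit_rate(reference, estimated, window):
    """Precision, recall and F-measure of estimated boundaries.

    An estimated boundary counts as a hit if it lies within `window` seconds
    of a reference boundary; each boundary is matched at most once.
    """
    reference, estimated = sorted(reference), sorted(estimated)
    hits = i = j = 0
    while i < len(reference) and j < len(estimated):
        if abs(reference[i] - estimated[j]) <= window:
            hits += 1
            i += 1
            j += 1
        elif estimated[j] < reference[i]:
            j += 1  # estimate is too early to match this or any later reference
        else:
            i += 1  # reference is too early to match this or any later estimate
    precision = hits / len(estimated) if estimated else 0.0
    recall = hits / len(reference) if reference else 0.0
    total = precision + recall
    f_measure = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_measure


# Hypothetical boundaries in seconds.
reference = [0.0, 30.0, 60.0, 90.0]
estimated = [0.2, 28.0, 47.0, 61.0, 90.4]

for window in (3.0, 0.5):
    print(window, hit_rate(reference, estimated, window))
```

With these toy boundaries, the 3 s window matches four of the five estimates while the 0.5 s window matches only two, which is the same kind of gap seen between the `HitRate_3*` and `HitRate_0.5*` rows.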

carlthome commented 9 months ago

Hmm, how could I get a review on this? Ping @rabitt, @magdalenafuentes, do you have time to spare?

magdalenafuentes commented 8 months ago

Hey @carlthome, thanks for this PR, and sorry for the slow response on our side. We've recently migrated soundata to GitHub Actions and updated the Python and package versions, and we're looking into doing the same for mirdata in the next couple of weeks. We're holding off on the other PRs for the moment so that we can incorporate those changes first and then test the PRs with the updated pipeline. We'll make sure to look at your PRs as soon as we finish that, so you'll hear from us soon. Thanks for your patience!

codecov[bot] commented 8 months ago

Codecov Report

Merging #589 (185036a) into master (459833a) will decrease coverage by 0.02%. The diff coverage is 96.15%.

Additional details and impacted files

```diff
@@            Coverage Diff             @@
##           master     #589      +/-   ##
==========================================
- Coverage   97.09%   97.07%   -0.02%
==========================================
  Files          62       62
  Lines        7286     7309      +23
==========================================
+ Hits         7074     7095      +21
- Misses        212      214       +2
```

carlthome commented 8 months ago

I think this is ready to go now, but technically it is a breaking change for users, so I'm a bit worried and would like some guidance on how to land it.

guillemcortes commented 7 months ago

Awesome, @carlthome! I'll try to take a look at it next week. Thanks again!

carlthome commented 7 months ago

Any updates on this?

guillemcortes commented 7 months ago

Hi @carlthome! Sorry for the radio silence. We have this on our radar and will have a response next week for sure. Thanks for your patience!

guillemcortes commented 7 months ago

Hi @carlthome, we've continued discussing this with the team, and we think we're ready to merge this once the suggested changes are made. We also believe it would be great to handle multiple annotators similarly to how it's done in soundata. There's a PR in mirdata that's been open for a while, https://github.com/mir-dataset-loaders/mirdata/pull/515, that we plan to work on; once it is merged, we could adapt SALAMI to use this MultiAnnotator functionality. What are your thoughts on that?

guillemcortes commented 5 months ago

Hi @carlthome, I hope you are doing well. Have you had time to work on this? Let us know if you need any help.