Annotation format helper functions

mir-dataset-loaders / mirdata

Python library for working with Music Information Retrieval datasets

https://mirdata.readthedocs.io/en/stable/

BSD 3-Clause "New" or "Revised" License


Annotation format helper functions #502

Open rabitt opened 3 years ago

rabitt commented 3 years ago

For certain annotation types, I find myself writing a lot of repeated code to convert between formats. For example:

evaluation format (e.g. mir_eval)
"analysis" format (e.g. for doing some simple statistics)
matrix format (e.g. converting to a fixed time or frequency grid to use during modeling)

We can support this with helper functions in the annotation classes, like Annotation.to_vector(hop_size), Annotation.to_mir_eval(), etc.
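As a rough illustration of the matrix/vector idea, here is a minimal sketch of what a `to_vector`-style helper could look like for an F0-like annotation. The `F0Sketch` class, the extra `duration` argument, and the nearest-frame assignment are all assumptions made for this example; they are not the mirdata API. Plain Python lists are used to keep it dependency-free, where real code would likely use NumPy arrays.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class F0Sketch:
    """Hypothetical stand-in for an annotation with times (s) and frequencies (Hz)."""

    times: List[float]
    frequencies: List[float]

    def to_vector(self, hop_size: float, duration: float) -> List[float]:
        """Place each annotated value on the nearest frame of a uniform time grid.

        Frames that no annotated time maps to stay at 0.0. If two annotated
        times map to the same frame, the later one overwrites the earlier one.
        """
        if hop_size <= 0:
            raise ValueError("hop_size must be positive")
        n_frames = int(round(duration / hop_size)) + 1
        vector = [0.0] * n_frames
        for t, f in zip(self.times, self.frequencies):
            idx = int(round(t / hop_size))  # index of the nearest grid frame
            if 0 <= idx < n_frames:
                vector[idx] = f
        return vector


f0 = F0Sketch(times=[0.0, 0.52, 1.49], frequencies=[220.0, 440.0, 330.0])
grid = f0.to_vector(hop_size=0.5, duration=2.0)
# grid == [220.0, 440.0, 0.0, 330.0, 0.0]
```

Whether empty frames should hold 0.0, NaN, or an interpolated value is a design choice this sketch does not settle.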

I'm working on this alongside #501. Let me know if you have any ideas or concerns.

Edit: This is in progress! Still remaining:

resample:

[x] F0Data #506
[x] MultiF0Data #506

to_mir_eval:

[x] NoteData #510
[x] MultiF0Data #510
[x] F0Data #510
[ ] BeatData
[ ] ChordData
[ ] SectionData
[ ] KeyData
[ ] TempoData

to_matrix / to_sparse_index:

[x] NoteData #506
[x] MultiF0Data #506
[x] F0Data #506
[ ] ChordData
[ ] BeatData
[ ] SectionData
[ ] KeyData
[ ] TempoData

magdalenafuentes commented 3 years ago

This is great! About the analysis format, what do you have in mind? I usually use pandas for analysis at the level of the dataset, but I guess that can be done, e.g., by giving the path to the metadata file(s) and loading them outside of mirdata. I'm curious what use case you are thinking of.

rabitt commented 3 years ago

About the analysis format, what do you have in mind? I usually use pandas for analysis at the level of the dataset, but I guess that can be done, e.g., by giving the path to the metadata file(s) and loading them outside of mirdata. I'm curious what use case you are thinking of.

Good point... I guess it depends on the annotation type, but I was thinking our "default" format (Annotation.times, Annotation.labels) is the simplest for doing analysis. I'm thinking about questions like "What's the average deviation of a track's beat label?", "What's the distribution of labeled pitches?", etc. So practically, maybe no helper is needed for this?
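For instance, questions of this kind can be answered directly from plain time and label sequences. The snippet below is only a sketch under assumptions: the beat times and MIDI pitches are made-up values, and "deviation" is read here as the spread of the inter-beat intervals, which is one possible interpretation.

```python
from collections import Counter
from statistics import mean, pstdev


def inter_beat_intervals(times):
    """Return the differences between consecutive beat times (in seconds)."""
    return [later - earlier for earlier, later in zip(times, times[1:])]


# Made-up beat times in seconds, standing in for something like Annotation.times.
beat_times = [0.5, 1.0, 1.5, 2.0, 2.5]
ibis = inter_beat_intervals(beat_times)
mean_ibi = mean(ibis)      # average inter-beat interval
ibi_spread = pstdev(ibis)  # population standard deviation of the intervals

# Made-up MIDI note numbers, standing in for labeled pitches.
note_pitches_midi = [60, 62, 60, 64, 60]
pitch_histogram = Counter(note_pitches_midi)  # counts per labeled pitch
```

Nothing here needs a dedicated helper, which supports the point that the default format is already convenient for simple statistics.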

magdalenafuentes commented 3 years ago

I guess so... What would be the difference between the mir_eval type and the default type? Annotations should be mir_eval compatible already, no?

rabitt commented 3 years ago

Oh man, you're right 🙈. Ok, so this might be just ".to_matrix / .to_vector"?

magdalenafuentes commented 3 years ago

Yeah, that makes sense to me. I guess this will make sense for some annotation types but not for all of them.

rabitt commented 3 years ago

@magdalenafuentes Quick FYI that in the end, I did find a need for .to_mir_eval, because we now explicitly support different units. mir_eval expects specific units, so the helpers are lightweight wrappers that make sure the units match what mir_eval expects.
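To make the unit-matching idea concrete, here is a minimal sketch of such a wrapper, assuming the target is times in seconds and pitches in Hz (the units mir_eval's melody and transcription metrics work with). The function name `to_mir_eval_sketch`, the unit strings ("s", "ms", "hz", "midi"), and the argument layout are hypothetical and are not taken from mirdata.

```python
# Divisors that bring a time value in the given unit to seconds (assumed unit names).
TIME_DIVISORS = {"s": 1.0, "ms": 1000.0}


def midi_to_hz(midi_note):
    """Convert a MIDI note number to Hz using the standard A4 = 440 Hz reference."""
    return 440.0 * 2.0 ** ((midi_note - 69) / 12.0)


def to_mir_eval_sketch(times, time_unit, pitches, pitch_unit):
    """Return (times in seconds, pitches in Hz), converting units if needed."""
    if time_unit not in TIME_DIVISORS:
        raise ValueError(f"unsupported time unit: {time_unit}")
    if pitch_unit not in ("hz", "midi"):
        raise ValueError(f"unsupported pitch unit: {pitch_unit}")

    times_s = [t / TIME_DIVISORS[time_unit] for t in times]
    if pitch_unit == "midi":
        pitches_hz = [midi_to_hz(p) for p in pitches]
    else:
        pitches_hz = [float(p) for p in pitches]
    return times_s, pitches_hz


times_s, pitches_hz = to_mir_eval_sketch(
    times=[0, 500, 1000], time_unit="ms", pitches=[69, 81], pitch_unit="midi"
)
# times_s == [0.0, 0.5, 1.0], pitches_hz == [440.0, 880.0]
```

The converted sequences could then be passed to the relevant mir_eval metric without that call having to know which units the annotation was stored in.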

magdalenafuentes commented 3 years ago

Makes sense!