urinieto / msaf

Music Structure Analysis Framework
MIT License

Implementing Barwise TF matrices (Barwise-aligned features). #154

Open ax-le opened 6 months ago

ax-le commented 6 months ago

Hi!

This issue is meant to discuss implementing the Barwise TF matrix in the MSAF toolbox. The Barwise TF matrix is a feature representation sampled on bars, with a fixed number of frames per bar. It was introduced in [1] and is detailed in [2, Chap. 2.4.2]; more importantly, it was shown to improve segmentation results with Foote's traditional algorithm in [3, Sec. 2.3.3]. In that regard, I believe this representation would be a great addition to MSAF.

Still, I opened this issue because implementing such a representation will not be straightforward and would certainly require major modifications to MSAF. In particular, it should be discussed whether this representation must be computed every time (as is the case now for beat-synced features*), or whether the computation should be optional and controlled by a parameter.

The most relevant people for this discussion are probably @urinieto and @carlthome?

Have a nice day!

Best, Axel.

*Edit: I may be wrong on that point, maybe I confused "default" settings with "every time"

References

[1] Marmoret, A., Cohen, J. E., & Bimbot, F. (2022, June). Barwise Compression Schemes for Audio-Based Music Structure Analysis. In Sound and Music Computing 2022. Full text: https://arxiv.org/pdf/2202.04981.pdf.

[2] Marmoret, A. (2022). Unsupervised Machine Learning Paradigms for the Representation of Music Similarity and Structure (Doctoral dissertation, Université Rennes 1). Full text: https://hal.science/tel-03937846/document.

[3] Marmoret, A., Cohen, J. E., & Bimbot, F. (2023). Barwise Music Structure Analysis with the Correlation Block-Matching Segmentation Algorithm. Transactions of the International Society for Music Information Retrieval (TISMIR), 6(1), 167-185. DOI: 10.5334/tismir.167. Full text: https://hal.science/hal-04323556/file/tismir-6-1-167.pdf.

urinieto commented 6 months ago

Happy to see that the bar-synced features are getting traction! Currently, the easiest way to implement this in MSAF is to add a completely new set of features here: https://pythonhosted.org/msaf/features.html#

Basically, we would need:

This way, the user could pass the timestamps of the annotated downbeats (ann), and MSAF could also provide an algorithm to estimate them (est).

That being said, when I wrote this part of MSAF (around 9 years ago!), computing features was an expensive process. Nowadays, features are typically computed on the fly (since it's so cheap), which saves a lot of disk space (MSAF's feature json files can be quite big).

So... what I would suggest is to remove the temporary storage of features and just compute them on the fly (i.e., get rid of those temporary JSON files). This way, we don't need to include these new types of features in the JSON files, and backwards compatibility in future MSAF releases would be much easier. Playing around with different custom features should be much easier too.

This would be a major refactor of MSAF, but I think it would be totally worth it.

What do you folks think? @carlthome and/or @ax-le are you up for the challenge? :D

ax-le commented 6 months ago

Hi @urinieto and Happy New Year!

Firstly, while it would be interesting to implement bar-synchronized features, the "Barwise TF matrix" features mentioned in my first message require a bit more work, because they consist of a fixed number of samples per bar (set by a parameter). In other words, while bar-synchronized features contain 1 sample per bar, Barwise TF matrices contain n samples per bar (typically 96). This would need additional modifications to cope with variations in bar length over the course of the song (right now, this is handled by oversampling the spectrogram and selecting regularly spaced samples in each bar, which is open to debate).
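To make the "n samples per bar" idea concrete, here is a minimal NumPy sketch of the sampling step. It is not the implementation from [1-3] and not MSAF code: the function name `barwise_tf_matrix`, its parameters, the nearest-frame selection, and the flattening of each bar into one row are all assumptions made for illustration. It assumes a spectrogram that is already finely sampled (the "oversampling" mentioned above), with known frame times and bar boundaries in seconds.

```python
import numpy as np


def barwise_tf_matrix(spectrogram, frame_times, bar_boundaries, frames_per_bar=96):
    """Hypothetical helper: sample a fixed number of frames in every bar.

    spectrogram    : array of shape (n_features, n_frames), finely sampled
    frame_times    : increasing array of shape (n_frames,), in seconds
    bar_boundaries : increasing sequence of bar start times, plus the final end
    Returns an array of shape (n_bars, n_features * frames_per_bar).
    """
    spectrogram = np.asarray(spectrogram, dtype=float)
    frame_times = np.asarray(frame_times, dtype=float)
    rows = []
    for start, end in zip(bar_boundaries[:-1], bar_boundaries[1:]):
        # Regularly spaced time points inside the bar (bar end excluded),
        # so long and short bars both yield frames_per_bar samples.
        targets = np.linspace(start, end, frames_per_bar, endpoint=False)
        # Find the nearest available frame for each target time.
        idx = np.searchsorted(frame_times, targets)
        idx = np.clip(idx, 1, len(frame_times) - 1)
        left_is_closer = (targets - frame_times[idx - 1]) <= (frame_times[idx] - targets)
        idx = idx - left_is_closer.astype(int)
        # Assumption: each bar is flattened into a single row vector.
        rows.append(spectrogram[:, idx].reshape(-1))
    return np.stack(rows)
```

With this sketch, a 2-second bar and a 1-second bar both end up with the same number of columns, which is what makes the bars directly comparable; whether nearest-frame picking or some interpolation is preferable is exactly the kind of point that may be debated.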

Secondly, I could be down for refactoring code, but not right now! ;) Maybe it can wait as this does not seem urgent, but maybe someone would be available sooner (@carlthome ?).

urinieto commented 5 months ago

Oh gotcha, I understand. Yeah, I think this calls even more strongly for a potential refactoring of the way MSAF takes care of the features. I would actually love to do that... but it's likely not gonna happen until after the ISMIR 2024 organization 😅 (unless @carlthome is up for the challenge!)