fwcd opened this issue 8 months ago
Did you find any references to scientific papers or even public implementations regarding this? The few times I've looked for papers, I couldn't find anything describing a metric for something a DJ would describe as an "energy-level".
FWIW, from the anecdotal evidence I've picked up over the years, my impression is that energy analysis is notoriously inaccurate, even in "commercial/professional" implementations.
I agree that it's hard to find scientific papers on this. There is a notion of "energy" in digital signal processing, which is effectively just the integral of the squared signal: https://en.wikipedia.org/wiki/Energy_(signal_processing)
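For a sampled signal x[n], the discrete counterpart of that integral is the sum of squared samples, E = Σ|x[n]|². Below is a minimal Python sketch (the function names are made up for illustration) that computes the total energy and a short-time energy envelope over fixed-size, non-overlapping frames, which is roughly the curve one "reads" off a waveform display.

```python
import math


def signal_energy(samples):
    """Discrete signal energy: the sum of squared sample values."""
    return sum(x * x for x in samples)


def short_time_energy(samples, frame_size):
    """Energy of each non-overlapping frame; a trailing partial frame is dropped."""
    return [
        signal_energy(samples[start:start + frame_size])
        for start in range(0, len(samples) - frame_size + 1, frame_size)
    ]


if __name__ == "__main__":
    # Hypothetical test signal: one second of a 440 Hz sine at 8 kHz, amplitude 0.5.
    sr = 8000
    tone = [0.5 * math.sin(2 * math.pi * 440 * n / sr) for n in range(sr)]
    print(signal_energy(tone))                  # ≈ 0.5**2 / 2 * 8000 = 1000
    print(len(short_time_energy(tone, 1024)))   # 7 full frames
```

In practice the frames would usually overlap and be windowed, but the principle is the same.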
At a very basic level we could implement that, perhaps normalizing the signal first and weighting different frequencies differently. For example, bass could be weighted more heavily than highs (though this is just speculation on my part).
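A sketch of the frequency-weighted variant, under loudly stated assumptions: the band edges and weights below are arbitrary placeholders (not taken from any reference), the signal is peak-normalized first, and a naive O(N²) DFT is used so the example has no dependencies. A real analyzer would use an FFT over short windows.

```python
import cmath
import math

# Hypothetical band layout (low Hz, high Hz, weight): bass weighted most heavily.
BANDS = [(20.0, 250.0, 3.0), (250.0, 2000.0, 2.0), (2000.0, 20000.0, 1.0)]


def peak_normalize(samples):
    """Scale samples so the largest absolute value is 1 (silence is returned unchanged)."""
    peak = max((abs(x) for x in samples), default=0.0)
    return [x / peak for x in samples] if peak > 0 else list(samples)


def one_sided_power(samples):
    """Power per DFT bin k = 0..N//2, folded so the bins sum to the signal energy.

    Uses a naive O(N^2) DFT to stay dependency-free; use an FFT in practice.
    """
    n = len(samples)
    if n == 0:
        return []
    powers = []
    for k in range(n // 2 + 1):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * i / n) for i, x in enumerate(samples))
        power = abs(coeff) ** 2 / n
        # Fold in the mirrored negative-frequency bin, except for DC and (even N) Nyquist.
        if k != 0 and not (n % 2 == 0 and k == n // 2):
            power *= 2
        powers.append(power)
    return powers


def weighted_band_energy(samples, sample_rate, bands=BANDS):
    """Sum of per-bin power, each multiplied by the weight of the band it falls into."""
    x = peak_normalize(samples)
    n = len(x)
    total = 0.0
    for k, power in enumerate(one_sided_power(x)):
        freq = k * sample_rate / n
        for lo, hi, weight in bands:
            if lo <= freq < hi:
                total += weight * power
                break
    return total
```

For example, a 100 Hz and a 3000 Hz sine of equal amplitude (400 samples at 8 kHz) have the same plain energy (200), but the bass tone scores about three times higher here purely because of the placeholder weights.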
There are also a few implementations of audio features that include "energy", though it is often unclear whether this is just the basic DSP notion or some higher-level feature extracted via machine learning, PCA or something else.
Spotify, for example, exposes "energy" as one of its audio features[^1], though I haven't checked how accurate it is: https://developer.spotify.com/documentation/web-api/reference/get-audio-features
Nevertheless, I do believe it's an interesting problem, and I think there's potential for writing something usable, given that (in my anecdotal experience) it's usually pretty easy to "read" the energy from the waveform manually.
[^1]: ...which, by the way, include a few more features that could be very interesting to have as analyzers, namely acousticness, danceability, instrumentalness, liveness, speechiness and valence (a.k.a. mood). Of course that's out of scope here, but I could definitely see a case for being able to write fine-grained queries based on these parameters to quickly assemble tracks for a set.
What makes it a bit challenging is that musical energy is a pretty vaguely defined concept: a number of factors play into it, e.g. tempo, (perceived) loudness, dynamics, chord progressions, instrumentation and rhythm. Most discussions of this concept on the web (as opposed to the DSP metric) seem to come from a songwriting context, where precise quantitative measures are of course not as important.
See also:
I've found a master's thesis on this that seems to go the ML route (and bases its data on the Spotify API): https://lup.lub.lu.se/luur/download?func=downloadFile&recordOId=8971762&fileOId=8971763
I've only skimmed it, but it might be worth reading (including some of the papers it references).
I came to a similar conclusion. The low-level energy most papers refer to probably differs wildly just because of different mastering techniques, so a track will probably receive a higher energy rating simply because it's mastered louder.
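That effect is easy to see on the raw DSP metric: scaling a signal by a gain g scales its energy by g², so the same material mastered about 6 dB hotter (g ≈ 2) reports roughly four times the energy. The plain-Python sketch below (helper names are hypothetical) shows this, plus two simple gain-invariant alternatives: normalizing to a fixed RMS level before measuring, and the peak-to-RMS crest factor. These only remove a global gain difference; they do not undo compression or limiting, and a perceptual loudness measure (e.g. ITU-R BS.1770 / LUFS) would be a more realistic reference level than plain RMS.

```python
import math


def energy(samples):
    """Discrete signal energy: sum of squared samples."""
    return sum(x * x for x in samples)


def rms(samples):
    """Root-mean-square level of the samples (0.0 for an empty list)."""
    return math.sqrt(energy(samples) / len(samples)) if samples else 0.0


def rms_normalize(samples, target_rms=0.1):
    """Scale samples to a fixed RMS level (0.1 is an arbitrary reference)."""
    level = rms(samples)
    return [x * target_rms / level for x in samples] if level > 0 else list(samples)


def crest_factor(samples):
    """Peak-to-RMS ratio; unaffected by a pure gain change."""
    level = rms(samples)
    return max(abs(x) for x in samples) / level if level > 0 else 0.0


if __name__ == "__main__":
    quiet = [0.25 * math.sin(2 * math.pi * i / 50) for i in range(1000)]
    loud = [2.0 * x for x in quiet]  # same material, about 6 dB louder
    print(energy(loud) / energy(quiet))                            # 4.0
    print(energy(rms_normalize(loud)), energy(rms_normalize(quiet)))  # equal
    print(crest_factor(loud), crest_factor(quiet))                 # equal
```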
I don't know how accurate/objective the Spotify metadata is. I'd assume it's a pretty poor dataset to train on, but there doesn't seem to be anything better. What about the "danceability" score in the AcousticBrainz DB? That said, ML inference is probably quite expensive compared to traditional algorithmic approaches.
Essentia has a good list of algorithms: https://essentia.upf.edu/algorithms_reference.html
I would say some combination of StrongPeak + Energy in Bands + BPM + Kick Detection + some other factors, yielding an overall score for how much energy a track has as a musical aspect.
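To make that suggestion concrete, here is a minimal sketch of such a combined score. It assumes each feature has already been extracted (e.g. with the Essentia algorithms listed above plus some kick detector) and only shows the combination step: every raw value is clamped and mapped to 0..1, then a weighted sum is taken. The feature names, normalization ranges and weights are all hypothetical placeholders that would have to be tuned against tracks rated by ear.

```python
from dataclasses import dataclass


@dataclass
class TrackFeatures:
    # Hypothetical, already-extracted per-track features (raw units).
    strong_peak: float     # output of a StrongPeak-style measure
    low_band_ratio: float  # share of energy in the bass band, 0..1
    bpm: float
    kick_density: float    # detected kicks per beat (hypothetical)


# Hypothetical (min, max) ranges used to map raw values into 0..1.
RANGES = {
    "strong_peak": (0.0, 10.0),
    "low_band_ratio": (0.0, 1.0),
    "bpm": (60.0, 180.0),
    "kick_density": (0.0, 1.0),
}

# Hypothetical weights; they sum to 1 so the score stays within 0..1.
WEIGHTS = {
    "strong_peak": 0.2,
    "low_band_ratio": 0.3,
    "bpm": 0.3,
    "kick_density": 0.2,
}


def _scale(value, lo, hi):
    """Linearly map value into 0..1, clamping anything outside [lo, hi]."""
    return min(1.0, max(0.0, (value - lo) / (hi - lo)))


def energy_score(features: TrackFeatures) -> float:
    """Weighted sum of the normalized features."""
    score = 0.0
    for name, weight in WEIGHTS.items():
        lo, hi = RANGES[name]
        score += weight * _scale(getattr(features, name), lo, hi)
    return score
```

A linear weighted sum is the simplest choice and keeps each factor's contribution inspectable; if it turns out too crude, the same inputs could feed a small learned model instead.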
I suggest something like this:
https://aubio.org/manual/latest/cli.html#aubiomfcc
Take the changes over each band and weight them differently, giving lower bands more weight. Then find a good baseline factor for this metric.
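A minimal sketch of that idea, assuming per-frame band values have already been computed elsewhere (the input is simply a list of frames, each a list with one value per band). It averages the weighted absolute frame-to-frame change across bands, similar to spectral flux, with hypothetical weights that decrease linearly from the lowest to the highest band, and divides by a placeholder `baseline` standing in for the baseline factor that would need tuning. Note that MFCCs index cepstral coefficients rather than frequency bands, so "lower bands get more weight" maps more naturally onto mel-band energies than onto the MFCC values themselves.

```python
def band_weights(num_bands):
    """Hypothetical linear weighting: the lowest band gets num_bands, the highest gets 1."""
    return [float(num_bands - i) for i in range(num_bands)]


def weighted_band_change(frames, baseline=1.0):
    """Mean weighted absolute change between consecutive frames, divided by baseline.

    frames: list of per-frame lists, all the same length (one value per band),
            ordered from the lowest to the highest band.
    """
    if len(frames) < 2:
        return 0.0
    weights = band_weights(len(frames[0]))
    total = 0.0
    for prev, cur in zip(frames, frames[1:]):
        total += sum(w * abs(c - p) for w, p, c in zip(weights, prev, cur))
    return total / (len(frames) - 1) / baseline
```

With two bands, a unit change in the low band contributes twice as much as the same change in the high band: `weighted_band_change([[0, 0], [1, 0]])` gives 2.0, while `weighted_band_change([[0, 0], [0, 1]])` gives 1.0.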
I found another interesting (but sadly paywalled) paper: Thoresen 2022, "Energy in Music: An Inventory of Observations and Ideas".
I have some leftover money to buy it. Shall I?
Let's wait for a GSoC contributor to pick it up. Then it's probably worth it...
We really need more GSoC Mentors for all our ideas.
Happy to buy this paper for y'all if it moves the needle on this issue.
Feature Description
Mixed In Key can analyze tracks for energy level, which is something that would be pretty cool to have in Mixxx too.
For reference, this has been pitched in the forum earlier: https://mixxx.discourse.group/t/music-energy/15252