tfjigsawsep - parameter ideas for harmonic/percussive separation in music

sevagh commented 3 years ago

Hello,

I'm experimenting with the tfjigsawsep function (and demo) from LTFAT for harmonic/percussive source separation (terminology-wise, this should be analogous to tonal/transient or steady-state/transient separation).

There are some helpful comments on the demo file, stating:

use t2 ~= 1.05 for percussive signals
use lower p for music, higher p for speech
"v2" gives better tonal separation at a higher performance cost, regular/v1 gives good speech/percussion

Are there any more recommendations, specifically for music instrument separation? Have there been any follow-up papers on exploring different parameters of t1, t2, p specifically for music?

For example, some "basic" ideas for modifying STFT parameters to do tonal/transient separation (with TF masking) are:

High window size/high frequency resolution for good harmonic/tonal separation
Small window size/high time resolution for good percussion/transient separation
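The window-size trade-off in the two rules above can be put into numbers. This is a minimal sketch, assuming a 44.1 kHz sample rate and two illustrative window sizes (256 and 4096 samples); the helper name `stft_resolution` is hypothetical and not part of LTFAT.

```python
def stft_resolution(window_size: int, sample_rate: int = 44100) -> tuple[float, float]:
    """Return (frequency bin spacing in Hz, window duration in ms)."""
    bin_spacing_hz = sample_rate / window_size
    window_ms = 1000.0 * window_size / sample_rate
    return bin_spacing_hz, window_ms


# Illustrative sizes only: a short window for transients, a long one for tonal content.
for n in (256, 4096):
    hz, ms = stft_resolution(n)
    print(f"window={n:5d}  bin spacing={hz:7.2f} Hz  duration={ms:6.2f} ms")
```

A longer window gives narrower frequency bins but smears events over a longer time span, which is why the same window rarely suits both layers.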

Are there such rules of thumb for tfjigsawsep?

t2 increase towards 1.05 to improve percussion separation
reduce p to improve musical instrument separation

sevagh commented 3 years ago

Also, is it possible that tfjigsawsep is non-deterministic/non-exact? E.g. you can rerun the algorithm multiple times and get back slightly different separations.

nholighaus commented 3 years ago

Hey, I cannot really help you out with your specific question, but I'll respond anyway to let you know you are not being ignored. I am not very familiar with that part of the toolbox myself, and I have to admit that I do not know who your best contact on the dev team would be. Maybe someone else can chime in. Otherwise I can try to take a look at the code and figure out what's going on, but that will take a few days at least.


sevagh commented 3 years ago

No worries, my questions are not urgent.

If it helps, the author of tfjigsawsep demo and source code listed in the comments is:

Daniel Haider, 2017

https://github.com/ltfat/ltfat/blob/master/ChangeLog#L12

danedane-haider commented 3 years ago

Hi sevagh, sorry for the late reply. Which kinds of signals do you work with and what do you want to achieve with the algorithm? When I wrote the code I managed to achieve reasonable extractions of the drum components for some jazz trio recordings and 'not too complex' alternative rock pieces. However, I remember that finding the settings for good separation always required a bit of parameter tweaking and there was no general setting that worked for all types of signal equally well. The default values are a good starting point. Unfortunately, there has not been follow-up work in the direction of instrument separation, but I will try to find something in my notes that could help you.

Just in case you haven't encountered the comments in the code: good values for t1 and t2 usually lay in the range (0.85, 0.98) with t2 > t1. For musical signals, t2 > 1 is necessary for extracting all transient components. (Note that these values determine how 'likely' it is that the part of the signal in a supertile has been produced by white noise.) I would formulate the following rules of thumb:

If there are tonal parts in the transient layer, try to
- decrease p (or alternatively the width and height proportions of the tiles manually via the flags 'T' and 'F')
- decrease t2, increase t1

If there are transient parts in the tonal layer, try to
- increase p (or 'T' and 'F')
- decrease t1, increase t2

If much of the signal is stuck in the residual layer, try to
- increase t1, t2 and p

Other than that, I'm afraid I have no cooking recipe for t1, t2 and p.
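The three rules above can be written down as a small tuning helper. This is only a sketch: the names `JigsawParams` and `adjust`, the step sizes, the floor of 1 on p, and the starting values are all assumptions made for illustration (they are not the toolbox defaults), and the helper does not enforce the t2 > t1 relation mentioned earlier.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class JigsawParams:
    t1: float
    t2: float
    p: float


# Direction of change for (t1, t2, p) per observed symptom, taken from the rules above.
RULES = {
    "tonal_in_transient": (+1, -1, -1),
    "transient_in_tonal": (-1, +1, +1),
    "stuck_in_residual": (+1, +1, +1),
}


def adjust(params, symptom, t_step=0.01, p_step=1.0):
    """Return a new parameter set nudged one step in the suggested direction."""
    s1, s2, sp = RULES[symptom]
    return JigsawParams(
        t1=round(params.t1 + s1 * t_step, 4),
        t2=round(params.t2 + s2 * t_step, 4),
        p=max(1.0, params.p + sp * p_step),  # floor of 1 is an assumption
    )


# Hypothetical starting point, not the LTFAT defaults.
start = JigsawParams(t1=0.90, t2=0.95, p=4.0)
print(adjust(start, "tonal_in_transient"))
```

Each nudged set would then be passed to tfjigsawsep and judged by ear or with a metric before taking the next step.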

You can also try to adapt the settings for the two Gabor transforms manually by adding the flags 'wintype', 'winsize1', 'a1', 'M1' (for the tonal system) and 'winsize2', 'a2', 'M2' (for the transient system), according to your rules of thumb for the separation task. Maybe your signals need longer or shorter windows or denser sampling.
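To make "denser sampling" concrete: for a Gabor system with hop size a and M frequency channels, the redundancy is M/a, and a window no longer than M falls into the so-called painless case. The sketch below just computes these two quantities; the helper `describe_gabor` and the numbers passed to it are hypothetical and are not the tfjigsawsep defaults.

```python
def describe_gabor(winsize: int, a: int, M: int) -> dict:
    """Summarise a Gabor system given window length, hop size a and channel count M."""
    return {
        "redundancy": M / a,       # > 1 means the transform is oversampled
        "painless": M >= winsize,  # window support fits within M channels
    }


# Illustrative settings: a long window for the tonal system, a short one for transients.
print("tonal    ", describe_gabor(winsize=4096, a=512, M=4096))
print("transient", describe_gabor(winsize=256, a=32, M=256))
```

Lowering a (or raising M) at a fixed window length increases the redundancy, at a higher computational cost.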

Regarding the non-deterministic/non-exact question: this is true, in the sense that t1 and t2 are significance values with respect to the entropy of the supertiles of a (random) white noise signal, which is generated afresh on every run. Therefore, the results may differ slightly each time.
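The effect can be reproduced in miniature. The sketch below is a stand-in, not the tfjigsawsep implementation: it draws white noise, treats squared samples as "tile" energies, and averages a Rényi entropy over several tiles. The entropy order (0.5), tile size, tile count and all function names are assumptions. The point is only that a reference value built from fresh noise changes between runs unless the random generator is seeded.

```python
import math
import random


def renyi_entropy(energies, alpha=0.5):
    """Renyi entropy (in bits) of an energy distribution; alpha=0.5 is an assumption."""
    total = sum(energies)
    probs = [e / total for e in energies]
    return math.log2(sum(p ** alpha for p in probs)) / (1.0 - alpha)


def noise_tile_entropy(rng, size=256):
    """Entropy of one 'tile' filled with squared Gaussian white noise samples."""
    energies = [rng.gauss(0.0, 1.0) ** 2 for _ in range(size)]
    return renyi_entropy(energies)


def reference_entropy(seed=None, tiles=50):
    """Mean tile entropy of freshly generated noise; seed=None gives a new draw each call."""
    rng = random.Random(seed)
    return sum(noise_tile_entropy(rng) for _ in range(tiles)) / tiles


print("unseeded:", reference_entropy(), reference_entropy())
print("seeded:  ", reference_entropy(seed=1), reference_entropy(seed=1))
```

Whether fixing a seed before calling tfjigsawsep makes its output repeatable depends on how the toolbox draws its noise, which this sketch does not verify.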

I hope this helps. If you have more specific questions, feel free to keep asking. I'll stay attentive now :)

Cheers, Daniel

sevagh commented 3 years ago

Thanks - I just noticed I can play with those window sizes by reading the tfjigsaw source code, and have started playing with it. I get the best tonal results from the default parameters, but I have yet to find a good percussion separation with the transient layer.

By "good" I mean high PEASS (perceptual evaluation for audio source separation) scores when separating drums and other instruments from a mixed song and comparing the separation to the original stems.
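For a quick sanity check before running a full PEASS evaluation, a plain signal-to-distortion ratio against the original stem can be computed in a few lines. This is explicitly not PEASS and has no perceptual model; it is a minimal sketch with a hypothetical helper `sdr_db` and toy sample values.

```python
import math


def sdr_db(reference, estimate):
    """Plain signal-to-distortion ratio in dB between a stem and its estimate."""
    if len(reference) != len(estimate):
        raise ValueError("reference and estimate must have the same length")
    signal = sum(r * r for r in reference)
    if signal == 0.0:
        raise ValueError("reference is silent")
    error = sum((r - e) ** 2 for r, e in zip(reference, estimate))
    if error == 0.0:
        return math.inf  # perfect reconstruction
    return 10.0 * math.log10(signal / error)


# Toy samples standing in for a drum stem and the extracted transient layer.
drum_stem = [1.0, 0.0, -1.0, 0.0]
transient_layer = [0.9, 0.1, -0.9, -0.1]
print(f"SDR: {sdr_db(drum_stem, transient_layer):.2f} dB")
```

A higher value means the estimate is closer to the stem sample by sample, which does not always agree with perceived quality.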

allthatsounds commented 1 year ago

since everyone seems happy, I'll close this

ltfat / ltfat

tfjigsawsep - parameter ideas for harmonic/percussive separation in music #127