vincentlou-git / aumix

Audio Unmixing and Music Score Transcription. Developed as my undergraduate final year project.
https://doramemo-x.github.io/aumix

ADRess stereo not correct output #36

Closed danielkorg closed 2 years ago

danielkorg commented 2 years ago

Hi,

The algorithm itself seems to work, but the stereo (FitzGerald) version of ADRess only outputs mono, not stereo. Is there a simple fix for this?

Thanks!

vincentlou-git commented 2 years ago

Hello,

The reconstructed audio is always in mono. FitzGerald's version of ADRess (adress_stereo) combines both left and right channels into one STFT, whereas adress considers them separately, but left_recons, right_recons, and recons are all mono-track audio data converted from the STFTs.

Also, adress_stereo is probably less buggy than adress since the latter was implemented first when I was learning how ADRess works :P

danielkorg commented 2 years ago

OK, I can understand that, but imagine a scenario where you have stereo source material in which some instruments are panned mostly left and a bit center, and others mostly right and a bit center, so that when you listen you get the full stereo image from combining these sources.

Now if I were to apply this algorithm, I would lose at least some of the original stereo image of the panned material. So it doesn't recover the original source material, which was in stereo, as well as possible: it is compacted into mono, which loses the spatial information you would need if, for example, you wanted to re-spatialize it later.

But OK, I can understand this is maybe how FitzGerald's ADRess works, even though I find that a bit strange considering FitzGerald's later research papers, where he seems to combine ADRess and other algorithms in a stereo way, so I don't know... not to mention other research papers on similar topics where the output is given in stereo. But anyway, thanks for replying! :)

vincentlou-git commented 2 years ago

I see what you mean: say we have stereo audio with 3 well-separated sources. When we extract one of them, we would expect a stereo output with the same left and right channel intensities as in the input, except that the 2 other sources are removed, correct?

I think that is entirely possible with a different formulation than the one I used (perhaps one of the later ones?). For reference, my implementation is based on his papers from 2012 and 2013, without the NMF and nearest-neighbour median filtering parts. The spectrograms of the extracted audio shown in both papers are in mono.

This is the schematic that I understood: [image: adress_pipeline]

And one of the synthetic examples I did: [image: 210516-104740-StereoADRess_melody-chord-cl-stereo= 0 5,,0 25,0 5,,0 75 _sr=44100_hann_4096_3072]
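
As a rough illustration of the stereo-preserving extraction discussed above (a sketch only, not what aumix implements and not taken from FitzGerald's papers): estimate an azimuth index per time-frequency bin from the position of its null, build a mask around the target azimuth, and apply the same mask to both channel STFTs. The function name extract_stereo_stfts and the parameters d (target azimuth index), H (mask half-width) and beta (azimuth resolution) are hypothetical, and the sketch assumes every source satisfies left = g * right with 0 <= g <= 1.

```python
import numpy as np


def extract_stereo_stfts(left_stft, right_stft, d, H, beta=100):
    """Mask both channel STFTs, keeping bins whose null lies within H of index d.

    Hypothetical helper: assumes sources are panned so that left = g * right
    with 0 <= g <= 1 (sources on the other side would need the channels swapped).
    """
    gains = np.arange(beta + 1) / beta  # candidate gains g_i = i / beta
    # az[i, k, m] = |L[k, m] - g_i * R[k, m]|, smallest where g_i cancels the bin
    az = np.abs(left_stft[None] - gains[:, None, None] * right_stft[None])
    null_idx = az.argmin(axis=0)        # estimated azimuth index per bin
    mask = np.abs(null_idx - d) <= H    # bins attributed to the target source
    return left_stft * mask, right_stft * mask


# Toy spectra: source A in bins 0-3 with g = 0.25, source B in bins 4-7 with g = 0.75
rng = np.random.default_rng(0)
a = np.zeros((8, 5), dtype=complex)
b = np.zeros((8, 5), dtype=complex)
a[:4] = rng.normal(size=(4, 5)) + 1j * rng.normal(size=(4, 5))
b[4:] = rng.normal(size=(4, 5)) + 1j * rng.normal(size=(4, 5))
left_stft = 0.25 * a + 0.75 * b
right_stft = a + b

left_a, right_a = extract_stereo_stfts(left_stft, right_stft, d=25, H=5)
```

Inverting left_a and right_a separately, with the same inverse STFT settings used for analysis, would give a two-channel signal in which the extracted source keeps its original left/right intensity ratio. Real recordings have overlapping sources, so a hard mask like this is only a starting point.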

danielkorg commented 2 years ago

Btw, how do you plot the frequency-azimuth spectrogram and also the null magnitude estimation graph?

vincentlou-git commented 2 years ago

I think the image above was produced with this example. For the frequency-azimuth spectrogram and the null estimation graphs, have a look at lines 106-117:

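# Assumes numpy is imported as np, adress is the module from this repo, and
# taus, t, left_stft, right_stft and beta are defined earlier in the example.
nulls, peaks = [], []  # one entry per time instant in taus

for i, tau in enumerate(taus):
    # n: null matrix (frequency x azimuth), p: peak magnitude per frequency bin,
    # nargs: azimuth index of the null in each frequency bin
    n, p, nargs, _, _ = adress.adress_stereo_null_peak_at_sec(tau,
                                                              t=t,
                                                              left_stft=left_stft,
                                                              right_stft=right_stft,
                                                              beta=beta)
    nulls.append(n)

    # The full 2d peaks freq-azi spectrogram: zero everywhere except at each null
    pmat = np.zeros(n.shape)
    pmat[np.arange(nargs.shape[0]), nargs] = p
    peaks.append(pmat)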

nulls contains the values for the frequency-azimuth spectrogram, and peaks contains the values for the null magnitude estimation.
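
For intuition about what the null matrix n, the null positions nargs and the peak magnitudes p represent, here is a minimal single-frame sketch based on the original ADRess formulation (Barry et al.). It is an assumption about what adress_stereo_null_peak_at_sec computes, not a copy of it; null_peak_frame and its arguments are hypothetical names, and only gains between 0 and 1 applied to the right channel are considered.

```python
import numpy as np


def null_peak_frame(left_frame, right_frame, beta=100):
    """Frequency-azimuth nulls and peak magnitudes for one STFT frame.

    Hypothetical helper illustrating the ADRess idea; not the aumix function.
    """
    gains = np.arange(beta + 1) / beta  # candidate gains g_i = i / beta
    # n[k, i] = |L[k] - g_i * R[k]|: dips to a null where g_i matches the panning
    n = np.abs(left_frame[:, None] - gains[None, :] * right_frame[:, None])
    nargs = n.argmin(axis=1)            # azimuth index of the null in each bin
    p = n.max(axis=1) - n.min(axis=1)   # null depth, used as the peak magnitude
    # Peak matrix: zero everywhere except at the null position of each bin
    pmat = np.zeros(n.shape)
    pmat[np.arange(nargs.shape[0]), nargs] = p
    return n, p, nargs, pmat


# Toy frame: every bin belongs to a source panned with left = 0.5 * right
rng = np.random.default_rng(1)
right_frame = rng.normal(size=16) + 1j * rng.normal(size=16)
left_frame = 0.5 * right_frame
n, p, nargs, pmat = null_peak_frame(left_frame, right_frame)
```

In this toy frame every null falls at azimuth index 50, so pmat has a single non-zero column, which is the kind of sparse picture the peaks plot shows.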

To plot them, I used matplotlib's pcolormesh with:

In the Python example, I encapsulated all the values into FigData objects to conveniently plot them with customizations on the graphs.
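
Without the FigData wrapper, plotting one frequency-azimuth matrix could look roughly like the sketch below. It uses stand-in data rather than real output from adress, and the axis values, labels and output filename are assumptions chosen for illustration.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend so the figure can be saved without a display
import matplotlib.pyplot as plt
import numpy as np

# Stand-in null matrix with shape (frequency bins, azimuth positions);
# in practice this would be one entry of nulls or peaks.
beta, n_bins, sr = 100, 64, 44100
freqs = np.linspace(0, sr / 2, n_bins)      # assumed frequency axis in Hz
azimuths = np.arange(beta + 1)              # azimuth index 0..beta
gains = azimuths / beta
null = np.abs(gains[None, :] - 0.25) * np.ones((n_bins, 1))  # null at index 25

fig, ax = plt.subplots()
# pcolormesh expects C with shape (len(y), len(x)): rows are frequency, columns azimuth
mesh = ax.pcolormesh(azimuths, freqs, null, shading="auto")
ax.set_xlabel("Azimuth index")
ax.set_ylabel("Frequency (Hz)")
fig.colorbar(mesh, ax=ax, label="Magnitude")
fig.savefig("freq_azimuth.png")
```

Passing a pmat from peaks instead of a null matrix gives the null magnitude estimation view with the same axes.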

danielkorg commented 2 years ago

Thank you, I guess I can close this issue for now.