Closed danielkorg closed 2 years ago
Hello,
The reconstructed audio is always mono. FitzGerald's version of ADRess (`adress_stereo`) combines both left and right channels into one STFT, whereas `adress` considers them separately, but `left_recons`, `right_recons`, and `recons` are all mono-track audio data converted from the STFTs.
Also, `adress_stereo` is probably less buggy than `adress`, since the latter was implemented first, while I was still learning how ADRess works :P
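For what it's worth, a common workaround for the mono output (not part of this repo — just a sketch, with a hypothetical helper name) is to build a soft mask from the mono estimate's magnitude and apply it to the original left and right STFTs separately, so the extracted source keeps the panning it had in the mix:

```python
import numpy as np

def pseudo_stereo_from_mono_estimate(source_stft, left_stft, right_stft, eps=1e-8):
    """Re-project a mono source estimate onto the original stereo channels.

    All inputs are complex STFT matrices of the same shape. A soft ratio
    mask is built from the mono estimate's magnitude and applied to each
    original channel, so the source keeps its inter-channel intensity
    ratio (i.e. its panning) from the mix. Hypothetical helper, not part
    of the repo's API.
    """
    mix_mag = np.abs(left_stft) + np.abs(right_stft)
    mask = np.abs(source_stft) / (mix_mag + eps)  # soft mask, roughly in [0, 1]
    mask = np.clip(mask, 0.0, 1.0)
    return mask * left_stft, mask * right_stft
```

Inverting the two masked STFTs then yields a stereo rendering of the separated source. This only approximates the original spatial image, but it avoids collapsing everything to the center.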
OK, I can understand that, but imagine a scenario where the source material is genuinely stereo: some instruments are panned mostly left and a bit toward the center, others mostly right and a bit toward the center, so that the full stereo image emerges from combining them.
If I apply this algorithm to such material, I lose at least part of the original stereo image, so the source is not recovered as faithfully as possible: it was stereo, but it comes back compacted into mono, which discards spatial information — a problem if, for example, you want to re-spatialize it later.
But OK, I can accept that this is how FitzGerald's ADRess works, even though I find it a bit strange given FitzGerald's later research papers, where he seems to combine ADRess with other algorithms in a stereo fashion, not to mention other papers on similar topics where the output is stereo. Anyway, thanks for replying! :)
I see what you mean: say we have stereo audio with 3 well-separated sources; when we extract one of them, we would expect a stereo output with the same left/right intensities as the input, just with the 2 other sources removed, correct?
I think that is entirely possible with a different formulation than the one I used (perhaps one of the later ones?). For reference, my implementation is based on his papers from 2012 and 2013, without the NMF and nearest-neighbour median filtering parts. The spectrograms of the extracted audio shown in both papers are mono.
This is the schematic that I understood:
And one of the synthetic examples I did:
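The null-scanning step in that schematic can be sketched roughly as follows. This is a minimal sketch of the classic ADRess frequency-azimuth construction (scale one channel by a grid of gains, subtract, and look for the gain that cancels each bin) — the function name and gain grid are my own, not this repo's code:

```python
import numpy as np

def freq_azimuth_frame(left_frame, right_frame, beta=100):
    """Frequency-azimuth plane for one STFT frame (ADRess-style sketch).

    left_frame / right_frame: complex spectra of one analysis frame.
    beta: azimuth resolution (number of gain steps).
    Returns (azi, null_idx, null_mag): the full |L - g*R| plane, the gain
    index at which each frequency bin cancels (the "null"), and the
    estimated source magnitude at that null (max minus min across azimuth).
    """
    g = np.arange(beta + 1) / beta                 # gain grid, 0..1
    azi = np.abs(left_frame[:, None] - g[None, :] * right_frame[:, None])
    null_idx = np.argmin(azi, axis=1)              # where each bin cancels
    null_mag = azi.max(axis=1) - azi.min(axis=1)   # magnitude estimate at the null
    return azi, null_idx, null_mag
```

A source panned so that `L = 0.5 * R` cancels exactly at the gain `g = 0.5`, i.e. at azimuth index `beta // 2` — which is what the nulls in the frequency-azimuth spectrogram show.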
Btw, how do you plot the frequency-azimuth spectrogram and also the null magnitude estimation graph?
I think the image above was produced with this example. For the frequency-azimuth spectrogram and the null estimation graphs, have a look at lines 106-117:
```python
for i, tau in enumerate(taus):
    n, p, nargs, _, _ = adress.adress_stereo_null_peak_at_sec(tau,
                                                              t=t,
                                                              left_stft=left_stft,
                                                              right_stft=right_stft,
                                                              beta=beta)
    nulls.append(n)
    # The full 2d peaks freq-azi spectrogram
    pmat = np.zeros(n.shape)
    pmat[np.arange(nargs.shape[0]), nargs] = p
    peaks.append(pmat)
```
`nulls` contains the values for the frequency-azimuth spectrogram, and `peaks` contains the values for the null magnitude estimation.
To plot them, I used matplotlib's `pcolormesh`.
In the Python example, I encapsulated all the values into `FigData` objects to conveniently plot them with customizations on the graphs.
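For anyone landing here, a minimal `pcolormesh` sketch along these lines might look as follows — the variable names and the random stand-in data are my own, not taken from the repo's example file:

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # render off-screen; drop this line for interactive use
import matplotlib.pyplot as plt

# Stand-in data: one frame's frequency-azimuth plane (freq bins x azimuth steps).
rng = np.random.default_rng(0)
freq_azi = rng.random((129, 101))

fig, ax = plt.subplots()
mesh = ax.pcolormesh(freq_azi, shading="auto", cmap="magma")
ax.set_xlabel("azimuth index")
ax.set_ylabel("frequency bin")
fig.colorbar(mesh, ax=ax, label="magnitude")
fig.savefig("freq_azimuth.png")
plt.close(fig)
```

With `shading="auto"`, `pcolormesh` accepts a bare 2D array and infers the cell edges, which keeps the sketch short.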
Thank you, I guess I can close this issue for now.
Hi,
the algorithm itself seems to work, but the stereo (FitzGerald) version of ADRess only outputs mono, not stereo. Is there a simple fix for this?
Thanks!