According to the discussion, the SVD approach is not stable, so we might choose option 1 as the workaround. I can work on it as it looks straightforward. Do we need to add a test for inference mode?
I found another solution in librosa: librosa.util.nnls uses np.linalg.lstsq, which can be replaced by torch.lstsq, so we don't need to re-implement the SGD manually.
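For illustration, here is a minimal sketch of that lstsq idea (not the torchaudio implementation). It assumes a mel filter bank fb of shape (n_mels, n_freq) and a mel spectrogram melspec of shape (n_mels, time), and it uses torch.linalg.lstsq, which has since replaced the deprecated torch.lstsq. The clamp is a crude stand-in for librosa's full non-negativity handling:

```python
import torch

def lstsq_inverse_mel(fb: torch.Tensor, melspec: torch.Tensor) -> torch.Tensor:
    # Solve fb @ spec = melspec for spec in the least-squares sense.
    # The system is underdetermined (n_mels < n_freq); the CPU "gelsy"
    # driver handles that by returning a minimum-norm solution.
    spec = torch.linalg.lstsq(fb, melspec, driver="gelsy").solution
    # Crude stand-in for the non-negativity constraint of a true NNLS solve.
    return spec.clamp(min=0.0)
```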
Hi @nateanl, did you port the nnls method?
Hi @spyroot, not yet. I'll work on it after the 0.12 release. Stay tuned!
```
Ratio of relative diff smaller than 1.000000e-01 is 7.614522473886609e-05
Ratio of relative diff smaller than 1.000000e-03 is 0.0
Ratio of relative diff smaller than 1.000000e-05 is 0.0
Ratio of relative diff smaller than 1.000000e-10 is 0.0
Ratio of relative diff padded than 1.000000e-01 is 7.145936251617968e-05
Ratio of relative diff padded than 1.000000e-03 is 0.0
Ratio of relative diff padded than 1.000000e-05 is 0.0
Ratio of relative diff padded than 1.000000e-10 is 0.0
Ratio of relative diff librosa than 1.000000e-01 is 2.3429299744748278e-06  <-- was my target
Ratio of relative diff librosa than 1.000000e-03 is 0.0
Ratio of relative diff librosa than 1.000000e-05 is 0.0
Ratio of relative diff librosa than 1.000000e-10 is 0.0
Ratio of relative diff my impl 1.000000e-01 is 0.0  <-- :) solved
Ratio of relative diff my impl 1.000000e-03 is 0.0
Ratio of relative diff my impl 1.000000e-05 is 0.0
Ratio of relative diff my impl 1.000000e-10 is 0.0
```
Hi @spyroot, do you mean you already implemented the nnls method? Would you like to open a pull request for it? We can help review after the PR is created. Thanks!
Yes, I used LBFGS and tested it on GPU: it's crazy fast, with superb absolute error against the original source. I'll open a pull request next week, but I had to fix two bugs in LBFGS, which I need to commit together.
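To illustrate the general idea (a minimal sketch under my own assumptions, not @spyroot's actual implementation): an NNLS-style solve with torch.optim.LBFGS, parameterizing the spectrogram through softplus so it stays non-negative; fb and melspec are shaped as in the sketch above.

```python
import torch
import torch.nn.functional as F

def nnls_lbfgs(fb: torch.Tensor, melspec: torch.Tensor, max_iter: int = 200) -> torch.Tensor:
    # Parameterize the spectrogram as softplus(z): it stays non-negative and
    # differentiable, avoiding an explicit projection step.
    z = torch.zeros(fb.shape[1], melspec.shape[1], requires_grad=True, device=fb.device)
    optimizer = torch.optim.LBFGS([z], max_iter=max_iter, line_search_fn="strong_wolfe")

    def closure():
        optimizer.zero_grad()
        loss = F.mse_loss(fb @ F.softplus(z), melspec)
        loss.backward()
        return loss

    optimizer.step(closure)
    return F.softplus(z).detach()
```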
I have one question about the current implementation. Torch does a backward pass inside InverseMelScale, so if you do that in the training loop, it is a second backward pass. How is it intended to be used if you want to compute the inverse in the training loop?
> So if you do that in the training loop, it is a second backward pass.
That's right. The issue is that when the module runs in inference mode, it can't use gradients at all, so the optimization inside will fail. Thus we want to find an alternative solution that makes the module work in both training and inference mode.
> How is it intended to be used if you want to compute the inverse in the training loop?
It could be. For example, I might have a GAN that predicts the mel-spectrogram and passes it to InverseMelScale and GriffinLim to get a waveform as the final output. Then we should make sure the gradients go through all modules with no failure. Does that make sense?
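Concretely, that chain could look like the runnable toy below; the generator, the loss, and all shapes are placeholder assumptions, not torchaudio examples:

```python
import torch
import torchaudio

generator = torch.nn.Linear(16, 80 * 100)         # hypothetical mel-spectrogram generator
inverse_mel = torchaudio.transforms.InverseMelScale(n_stft=201, n_mels=80)
griffin_lim = torchaudio.transforms.GriffinLim(n_fft=400)

noise = torch.randn(1, 16)
mel = generator(noise).reshape(1, 80, 100).abs()  # (batch, n_mels, time), non-negative
waveform = griffin_lim(inverse_mel(mel))          # gradients must survive both transforms
loss = waveform.pow(2).mean()                     # placeholder waveform-domain loss
loss.backward()                                   # backprop all the way into the generator
```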
@nateanl thank you very much. The reason I asked is the second case. Imagine we are working in a non-GAN setting: if you call backward() inside a loss function, that breaks things a bit.
Imagine a training pass (pseudocode):

```python
for batch in batches:
    loss = compute_loss(x)  # problematic if compute_loss does the inverse and calls backward() itself
    self.scaler.scale(loss).backward()
    self.scaler.step(optimizer)
    self.scaler.update()
```
It is problematic.
Yes, we probably can formulate it as a GAN, i.e., you do backward() on the main optimizer and then compute the inverse.
Thinking a bit deeper: the inverse computes a solution to Ax = b, and we know that a solution exists. (In that case, by the way, you don't need to deal with complex numbers: you can solve the system for complex inputs and the output is still float.) So you could technically compute the inverse inside the loss, but why would you want to? Presumably you need the inverse for loss computation, yet if the inverse term already has a solution, there is nothing to minimize: a term that is exactly solvable contributes nothing to the optimization formulation.
Do you see my point? For example, in my case I have to do this in compute_loss(), and I am still trying to figure out the most efficient way to do it while avoiding backward():

```python
with torch.no_grad():
    stfs = self.dts_inverse(mel)
```

dts_inverse is my implementation.
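One direct way to avoid the inner backward() altogether is a closed-form least-squares solve via the pseudoinverse; this is a sketch assuming fb is the mel filter bank of shape (n_mels, n_freq):

```python
import torch

fb_pinv = torch.linalg.pinv(fb)      # (n_freq, n_mels), can be precomputed once
stfs = (fb_pinv @ mel).clamp(min=0)  # no inner optimization loop, no backward(),
                                     # and still differentiable w.r.t. mel
```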
@spyroot I see. So in your implementation, dts_inverse doesn't require gradient-based optimization. This helps solve the issue that InverseMelScale can't work in inference mode.
Regarding the usage in training: although we don't intend to optimize InverseMelScale itself, the differentiability of the module is important. Take speech enhancement as an example: some methods optimize the model based on waveforms instead of spectrograms, and they indeed achieve performance gains by doing so. In that case, we don't want the module hard-coded with torch.no_grad(), because that breaks the chain of gradients.
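A tiny illustration of what "breaks the chain of gradients" means:

```python
import torch

x = torch.randn(4, requires_grad=True)
with torch.no_grad():
    y = x * 2            # computed outside the autograd graph
print(y.requires_grad)   # False: no gradient can flow from y back to x
```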
> I found another solution in librosa: librosa.util.nnls uses np.linalg.lstsq, which can be replaced by torch.lstsq, so we don't need to re-implement the SGD manually.

Just wanted to know whether this is implemented, by any chance.
Can this issue be solved by temporarily setting torch.enable_grad at the call site or inside the function?
> Can this issue be solved by temporarily setting torch.enable_grad at the call site or inside the function?
torch.enable_grad works inside torch.no_grad, but torch.inference_mode is stricter: it doesn't record computations in the backward graph at all, so the optimization inside InverseMelScale can't run.
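A quick demonstration of the difference:

```python
import torch

x = torch.randn(3, requires_grad=True)

with torch.no_grad():
    with torch.enable_grad():  # re-enables grad tracking under no_grad
        y = (x * 2).sum()
y.backward()                   # works: the graph was recorded
print(x.grad)

with torch.inference_mode():
    with torch.enable_grad():  # does NOT help: nothing is recorded
        z = (x * 2).sum()
# z.backward() would raise a RuntimeError, because tensors created in
# inference mode are not part of any autograd graph.
```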
Addressed via #3280
Hi, I recently encountered the same issue. Has this problem been resolved? I am currently using torchaudio==2.0.2.
InverseMelScale uses SGD inside, so it does not work when the global context is no_grad or inference_mode; it even fails when requires_grad=False. This gives bad UX for inference. There are a couple of possible workarounds:

1. Set requires_grad=True inside of InverseMelScale
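For context, a minimal sketch of what workaround 1 could look like; the solve below is a hypothetical, simplified stand-in for the module's internal SGD loop, with fb and melspec shaped as in the earlier sketches:

```python
import torch

def solve(fb: torch.Tensor, melspec: torch.Tensor, n_iter: int = 1000, lr: float = 0.1) -> torch.Tensor:
    # Create the optimized tensor with requires_grad=True explicitly and
    # re-enable grad locally, so the solve still works under torch.no_grad()
    # (though, as discussed above, not under torch.inference_mode()).
    spec = torch.zeros(fb.shape[1], melspec.shape[1], requires_grad=True)
    optimizer = torch.optim.SGD([spec], lr=lr)
    with torch.enable_grad():
        for _ in range(n_iter):
            optimizer.zero_grad()
            loss = ((fb @ spec.clamp(min=0) - melspec) ** 2).mean()
            loss.backward()
            optimizer.step()
    return spec.detach().clamp(min=0)
```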