What input are you feeding to the NAM plugin? Is it coming from your audio interface?
> What input are you feeding to the NAM plugin? Is it coming from your audio interface?
There is no input at all here (as you can see in the Gig Performer shot) - just NAM by itself.
Ok, I've been able to reproduce this. Silence seems to produce a constant, positive DC offset. The magnitude seems to vary from model to model - likely due to gain differences.
Yep, this makes sense.
Zero isn't special to the models. When the input to the WaveNet is constant, the output is constant, but what that constant is gets learned (as imperfectly as the rest of the model).
Happy to see this patched with an HPF.
An HPF is certainly an easy fix for the DC offset.
I still find it a bit strange, though. I ran an identity training test (input/output wavs the same). ESR reported by training was .0000 (btw, it would be nice to have more decimal places here).
A null test of the model shows -60 dB RMS; after DC offset correction, -67 dB RMS. I would have thought that if a simple linear offset can reduce the error by 7 dB, that would have been figured out during training. Or am I misunderstanding something?
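For reference, here's a minimal sketch of that kind of null test (the file names are placeholders, not files from this repo; it just measures the residual's RMS before and after removing its mean):

```python
import numpy as np
import soundfile as sf  # assumed available for WAV I/O

def rms_db(x: np.ndarray) -> float:
    """RMS level in dB relative to full scale."""
    return 20.0 * np.log10(np.sqrt(np.mean(x ** 2)) + 1e-12)

# Hypothetical file names, for illustration only; both are assumed to be
# the same length and sample rate.
target, sr = sf.read("target.wav")           # the reamp (or identity) target
prediction, _ = sf.read("model_output.wav")  # the model's rendering of the same input

residual = prediction - target               # null-test residual
print(f"Null RMS:              {rms_db(residual):.1f} dB")
print(f"Null RMS (DC removed): {rms_db(residual - np.mean(residual)):.1f} dB")
```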
An even simpler way to correct this is to check what value the model outputs for zero, and correct the head bias by that amount. I've verified that this works for LSTM models.
It still seems like training should have been able to do a better job of finding the correct head bias value, though, doesn't it?
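A sketch of that head-bias correction, assuming a PyTorch model object and a handle on the output bias (`model` and `head_bias` are placeholders; the real attribute names differ between the LSTM and WaveNet implementations):

```python
import torch

def correct_head_bias(model: torch.nn.Module, head_bias: torch.nn.Parameter,
                      n_samples: int = 48000) -> None:
    """Shift the output bias so that silent input maps to (near-)zero output."""
    silence = torch.zeros(1, n_samples)    # a block of digital silence
    with torch.no_grad():
        dc = model(silence).mean().item()  # the constant the model emits for zero input
        head_bias -= dc                    # cancel it at the output
```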
I too use a plugin after NAM to remove DC offset; every capture produces a different level of DC offset depending on gain. DC offset in an audio signal is not healthy: it can create distortion in the following plugins, loud pops, and many more problems. Better to fix this at the core.
Why doesn't this important bug have a title? "DC Offset"
I want to chime in with a quick observation on my side.
As far as I have tried, having the head bias ON most of the time led to a negative effect on the model's ESR during training. For that reason I tend to turn OFF the head bias. (Maybe this is only me? Has anyone seen anything similar?) I see here that it can also have an undesirable effect at inference time.
Maybe it's not so necessary to have the head bias after all?
> As far as I have tried, having the head bias ON most of the time led to a negative effect on the model's ESR during training.
Are you talking about LSTM or WaveNet? My quick test with no head bias in WaveNet resulted in a slightly higher ESR.
Ah, I'm talking about LSTM.
I wonder if maybe the training pre-emphasis filter is causing this?
If so, it might make sense to store the coefficient used so that it can also be applied at playback.
@KaisKermani
> As far as I have tried, having the head bias ON most of the time led to a negative effect on the model's ESR during training. For that reason I tend to turn OFF the head bias. (Maybe this is only me? Has anyone seen anything similar?)
I've not actually checked. I'm not sure I can think of a good reason why it'd be better one way or the other. One of those things where you might as well find out by trying.
How much of a difference, out of curiosity?
@mikeoliphant
> it might make sense to store the coefficient used so that it can also be applied at playback.
The pre-emphasis filter is applied to both the predictions and targets. Think of it as up-weighting certain properties of the prediction loss and de-emphasizing others.
Concretely, the pre-emphasis filter takes roughly the difference between consecutive samples, so it's like telling the model to get the slope of the predictions right rather than their values. So I can see how the PEF would degrade the ability to predict zero with zero input.
But you wouldn't apply it while making predictions in the plugin.
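For anyone following along, here's a minimal sketch of such a pre-emphasis filter (the 0.85 coefficient is the default mentioned further down in this thread; this is an illustration, not the exact code in the nam package):

```python
import torch

def pre_emphasis(x: torch.Tensor, coef: float = 0.85) -> torch.Tensor:
    """First-order pre-emphasis: y[n] = x[n] - coef * x[n-1].

    With coef near 1 this is roughly a first difference, so a loss computed on
    pre-emphasized signals weights slopes (high frequencies) more heavily than
    absolute values, and a pure DC offset is attenuated by a factor of (1 - coef).
    """
    return x[..., 1:] - coef * x[..., :-1]
```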
I'm primarily just trying to understand the root cause of the DC offset to inform what the best way to handle it is.
If it is just a zero offset, then it seems like the best compensation is to apply the relevant offset at playback - that's cheaper and less destructive than an HPF. If there is more of an actual high pass baked into training, then it would make sense to replicate that at playback.
In either case, even though I don't think it causes any significant audible issues (particularly after a typical IR), I think it is important to address - probably at the core level (rather than in the plugin). It "smells" bad, gives the perception that playback is noisier than it actually is, and messes with any null testing that isn't explicitly taking DC offset into account.
> If there is more of an actual high pass baked into training, then it would make sense to replicate that at playback.
There is a sort of HPF baked into training when the pre-emphasis filter is used, but it's applied to both signals--training is trying to enforce "HPF(NAM) = HPF(Target)".
If you "remove" both HPFs (or just don't apply it), then you get the real target (and the "real" NAM model), which is what you want.
Hopefully that makes sense?
> Hopefully that makes sense?
Not quite, no. It seems to me that if you take spectrum out of both signal and response during training, you should do the same at playback (potentially both before and after prediction). Otherwise the model is producing output under conditions that are different than it was trained for.
> out of both signal and response
The HPF is applied to the output of the NAM model and the reamp output.
Or in pseudocode:
```python
def pre_emphasized_mse_loss(x, y):
    # x is input audio e.g. from v2_0_0.wav
    # y is the corresponding reamp
    f = model(x)      # Input is untouched!
    f_pef = pef(f)    # Apply pre-emphasis filter to the output of the model
    y_pef = pef(y)    # And to the targets
    return mse(f_pef, y_pef)  # NOT the loss between f and y like usual.
```
Here's the code: https://github.com/sdatkinson/neural-amp-modeler/blob/3cf65f56ddd1762e026d53e0adc54753ddc4dae0/nam/models/base.py#L342 To reiterate, the input is untouched before going into the model; the predictions are already computed by the time the PEF is applied to both signals.
"Pre" in "pre-emphasis" doesn't refer to doing something to the signal before it hits the model. (I don't know why the authors called it that--perhaps because it emphasizes certain characteristics of the predictions before ("pre") the loss calculation.) If it did, then I agree with you that you'd want to do it in the plugin as well... But I'd have made it part of the model (a "layer") if that were the case 🙂
Or in other words, this is not what happens:
```python
def pre_emphasized_mse_loss(x, y):
    x_pef = pef(x)
    f = model(x_pef)  # Yikes! No!
    return mse(f, y)  # nor mse(f, pef(y)). That wouldn't make sense either.
```
"Pre" in "pre-emphasis" doesn't refer to doing something to the signal before it hits the model.
Got it - that's what I originally assumed. I got temporarily confused when you said it was applied to "both" signals and thought you meant training and capture (rather than model output and capture).
The fact remains, though, that applying an HPF to model output and target before the error calculation reduces the ability to learn to match low frequencies. If it eliminates any DC offset the model produces (which I suspect it does), then there is no information there to be learned. In particular, any training of head bias weights at the output will be ineffective.
Note that I'm not suggesting that doing the pre-emphasis is bad (if anything, NAM is conservative here compared to what, say, ToneX seems to be doing) - just trying to fully understand the impact it is having.
> applying an HPF to model output and target before the error calculation reduces the ability to learn to match low frequencies.
This is true. I did it because learning high frequencies seems to be more challenging for the LSTM when using the vanilla MSE loss.
> If it eliminates any DC offset the model produces (which I suspect it does), then there is no information there to be learned.
It doesn't fully eliminate it unless the coefficient is 1 (the value I picked as default is 0.85). Also, this is only one term in the loss function; the ordinary MSE is still there. The "weight" parameter is a coefficient that multiplies the PEF loss--1.0 (the default) gives it equal weight with the vanilla MSE loss (which also has a weight of 1).
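In code terms, roughly this (a sketch of the combined loss, not the exact implementation in nam/models/base.py):

```python
import torch
import torch.nn.functional as F

def combined_loss(prediction: torch.Tensor, target: torch.Tensor,
                  pef_coef: float = 0.85, pef_weight: float = 1.0) -> torch.Tensor:
    """Vanilla MSE plus a weighted MSE on pre-emphasized signals."""
    def pef(x: torch.Tensor) -> torch.Tensor:
        return x[..., 1:] - pef_coef * x[..., :-1]

    return F.mse_loss(prediction, target) + pef_weight * F.mse_loss(pef(prediction), pef(target))
```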
At the end of the day, the thing is that if you train only on vanilla MSE, you might get the best ESR, but it also might not be what sounds best subjectively (due to the errors in its predictions potentially focusing on certain frequencies). I didn't do a comprehensive study to tune the parameter values; I just picked what seemed to give good results and shipped it 🙂 I do recall that I felt it was better to include (generally, I try to remove extra features--even if they improve things--if they don't give enough improvement... and yet here it is! 🙂)
If you're still curious though, I'd definitely love to hear what you find if you choose to dive deeper on this 🙂
> If you're still curious though, I'd definitely love to hear what you find if you choose to dive deeper on this 🙂
🙂
> How much of a difference, out of curiosity?
Between head bias ON and head bias OFF in training (LSTM), I get ESR differences between 0.001 and 0.1. It's a big range, yes, but consistently bias OFF is better.
EDIT: Which makes me think that adding a head bias in LSTM maybe makes the training less stable/consistent.
@sdatkinson I think what we need to move forward here is a decision on whether this should be addressed in training (i.e. do something to avoid models that create a DC offset) or at playback (compensate for the DC offset after the fact).
My gut feeling is that it makes sense to compensate at playback. If that is the case, then the decision is whether to handle it in the core code, or push it off on the plugin(s).
Thanks for your patience. I think I'm happy to solve this with a HPF.
The reasons why are:
So as a first pass, something like an HPF at 10 or 20 Hz I think should be perfectly acceptable given the context that NAM is most often used in (i.e. modeling guitar/bass effects).
Running a 1 Hz to 24 kHz flat sweep through the MUtility DC Blocker, it looks like a 10 Hz, -6 dB/oct (first-order) HPF is a good reference point.
I am using NAM to profile compressors, EQs, etc., so I hope the HPF will be set lower than 20 Hz with a 6 dB/oct slope. My issues with DC offset have been solved using an HPF as low as 2 to 5 Hz.
> So as a first pass, something like an HPF at 10 or 20 Hz I think should be perfectly acceptable given the context that NAM is most often used in (i.e. modeling guitar/bass effects)
I am hoping this plugin will grow to cover outboard gear for things like mastering, so please be careful with adding any additional HPF! In my experience it can be very destructive to sound quality as the phase distortion is audible far above the cut-off point.
Not sure how accurate this example is, but the red line shows the phase shift:
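For a rough sense of scale, the phase lead of an ideal first-order HPF can be computed directly; this little sketch assumes a 5 Hz cutoff (the value mentioned in the next comment) and ignores the implementation details of any real filter:

```python
import numpy as np

# Ideal first-order high-pass: H(s) = s / (s + w_c), phase = 90 deg - arctan(f / f_c).
f_c = 5.0                                    # example cutoff in Hz
freqs = np.array([5.0, 10.0, 20.0, 50.0, 100.0, 1000.0])
phase_deg = 90.0 - np.degrees(np.arctan(freqs / f_c))
for f, p in zip(freqs, phase_deg):
    print(f"{f:7.0f} Hz: {p:5.1f} degrees of phase lead")
```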
PR #349 adds a 5Hz first-order HPF as the last DSP block. I'm waiting to hear back on #344 so that I can make sure that's adequately addressed so I can get 0.7.5 out the door, then we'll merge this in next.
But NeuralAmpModelerCore already has it (just merged), so in case you're waiting on an HPF from NAM for your application, you're good to go 🙂
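For reference, a first-order DC blocker of this kind is only a few lines. Here's a Python sketch (the actual C++ in NeuralAmpModelerCore will differ in detail, and the cutoff-to-coefficient mapping below is an approximation):

```python
import numpy as np

def dc_blocker(x: np.ndarray, cutoff_hz: float = 5.0, sample_rate: float = 48000.0) -> np.ndarray:
    """First-order high-pass / DC blocker: y[n] = x[n] - x[n-1] + R * y[n-1]."""
    R = np.exp(-2.0 * np.pi * cutoff_hz / sample_rate)  # pole just inside the unit circle
    y = np.zeros_like(x)
    prev_x, prev_y = 0.0, 0.0
    for n, sample in enumerate(x):
        y[n] = sample - prev_x + R * prev_y
        prev_x, prev_y = sample, y[n]
    return y

# A constant (DC) input decays to zero at the output, while audio-band content passes.
```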
@sdatkinson Please excuse my ignorance, but if we now have an HPF in NeuralAmpModelerCore, why do we need it here? I don't actually know how they relate to each other, but I hope it is not just being added here for some legacy gear samples which will probably be updated anyway as the software advances.
> @sdatkinson Please excuse my ignorance, but if we now have an HPF in NeuralAmpModelerCore, why do we need it here? I don't actually know how they relate to each other, but I hope it is not just being added here for some legacy gear samples which will probably be updated anyway as the software advances.
The core library defines the class; this repo uses it as part of an iPlug2 plugin.
NAM seems to add a DC offset when any capture is loaded with no IR. Using any post-EQ high pass, even at the lowest value and steepest slope, clearly removes the offset, as expected. I'm not an expert on this, so I'm not sure to what extent this affects subsequent plugins or possibly an interface out to a power amp/cab, and it's not hard to fix after the fact, but it does seem like it shouldn't be there.
To Reproduce
Steps to reproduce the behavior:
1. Open a DAW or Gig Performer
2. Add a NAM instance and an RTA of some sort after it
3. Load a few models, and increase the out level if you don't see it - you should see a constant hump in the sub range of the chart
4. Add a high pass at the lowest possible setting or add an IR to NAM, and you should see that hump disappear completely even though your filter doesn't include it
https://github.com/sdatkinson/NeuralAmpModelerPlugin/assets/44332958/b09e650c-ae76-4611-9ebd-169116e5860d