Open GregorR opened 6 years ago
That is so awesome! Of course I will prefer to use yours. I guess it's time to code and get the release going. Do you think that is possible to add support for receiving floats instead of shorts? That would be cleaner too.
Since rnnoise does receive floats, I assume you mean floats of the usual range from -1 to 1 instead of floats from -32768 to 32767 :)
The models are trained on floating point values, of course, but for whatever reason the original author opted to train them on values from -32768 to 32767 (i.e., 16-bit signed integer values). It would be possible to retrain new models for standard float range, but I'd shy away from the idea of having a separate set of models for different formats, as that could create some very confusing results. Multiplying input values by 32768 and then dividing the output values again should be a very fast operation, since it's just a change to the exponent, and should be lossless on all unclipped audio data.
Ah yes, just saw https://github.com/lucianodato/speech-denoiser/blob/ab14ff48ac81f661994660383c4d22395f514b2e/src/sdenoise.c#L212 . Use 32768 (SHRT_MAX+1), not SHRT_MAX, and the operation will be much faster. It's still a bit silly, I agree, but oh well... too many bits and bobs in rnnoise outside of the model itself assume a particular range.
(Different sample rates, on the other hand, should be very doable. The FFT code is generic, and virtually everything relies only on the calculated bands or the FFT, so it's all doable. I'm yet to think of a way to do it without slowing down the code tho. Of course, some of the bands are above the Nyquist frequency of 12kHz audio, but then, if you're denoising 12kHz audio, something has gone dreadfully wrong in your life.)
Right! Training the model for different types sounds like a lot of work. And I agree on your comment about sample rate. I will get back to you with something ASAP.
It's not the amount of work that concerns me. It'd take a week or so, but oh well, c'est la vie. The problem is that because they would be different models, you'd get different results if you fed in identical data with different datatypes, which is just asking for confusion :)
Parameterizable sample rate is done. Doesn't affect the frame size due to some limitations in the FFT used. rnnoise_set_param(st, RNNOISE_PARAM_SAMPLE_RATE, 44100);
. In the end, all that needed to be done was adjusting the band bins properly, as I expected.
In doing so I also eliminated some silly global state, so it is now correct and harmless to do multi-channel (or even multi-track) audio through rnnoise-nu by simply creating a DenoiseState per channel per track. I doubt that it would be faster to make rnnoise-nu itself multi-channel, as it would just have to mix them (or probably their FFTs) to do the analysis and then apply the results independently to each track.
Well it's done. Thank you very much for your suggestion. Let me know if you find something wrong.
A couple minor suggestions:
It shouldn't be necessary to define these parameters, as they're defined by the header.
Just a clarification: Is this parameter being set every frame because it can be changed at any time?
It's not necessary to both rnnoise_create
and rnnoise_init
(and in fact leaks memory). rnnoise_init
is just so that you can allocate the RNNoise object yourself is you so choose; rnnoise_create
inits.
It would be nice to have some way of parameterizing the different models, or even loading a model from a file (if LV2 makes that possible/easy). For what it's worth, the different models are well worth using, and work great on suitable input!
First comment: Ooops! Second comment: Yes it should be capable of changing parameters for every block. Third comment: Ok, done. Fourth comment: Definitively doable. I had to check some plugin that is using files already and mimic its functionality. I can add both options but I will need to explore what you did for that.
FYI, I'm suddenly in communication with the original author of RNNoise, so expect... Idonno, something.
Nice! Go Gregor!
are there any chances of a merge? did you guys speak of that possibility?
Yes. Needs some cleanup, but something's moving upstream. What exactly is TBD :)
I'd like to help if I can :)
Hi,
I've been fussing with RNNoise and made a slightly-incompatible fork that:
(1) Supports multiple neural network models, several of which I've trained (and I'm still training more),
(2) Supports a simple ASCII file format for future models, and
(3) Parameterizes the maximum attenuation to perform.
I've been scratching my head over whether to learn a whole plugin infrastructure in order to get it working slightly more tastefully, and since you have mixing wet and dry as a todo item (with the goal presumably the same as maximum attenuation, but frankly doing it in the library is a bit cleaner), maybe we can scratch each other's itches.
My rnnoise: https://github.com/GregorR/rnnoise-nu
My rnnoise models (informational, not needed to use the library): https://github.com/GregorR/rnnoise-models
In terms of the library, the changes are small.
rnnoise_create
takes anRNNModel *
as an argument, or NULL to use the default (this was what necessitated breaking compatibility).rnnoise_models
returns a NULL-terminated list of models (meaningless string names) andrnnoise_get_model
gets a model by name. Alternatively,rnnoise_model_from_file
can load a model from a file, to be freed byrnnoise_model_free
. Finally,rnnoise_set_param
lets you set the (solitary) configurable parameter.(An aside: My knowledge of LV2 is limited to... well, nothing, but I was surprised to find while poking at your library that there doesn't seem to be any mention of number of channels. Does LV2 just send each channel through a different instance of the plugin, or is this plugin single-channel-specific?)