Consider rnnoise-nu - Githubissues

GregorR commented 6 years ago

Hi,

I've been fussing with RNNoise and made a slightly-incompatible fork that:

(1) Supports multiple neural network models, several of which I've trained (and I'm still training more),

(2) Supports a simple ASCII file format for future models, and

(3) Parameterizes the maximum attenuation to perform.

I've been scratching my head over whether to learn a whole plugin infrastructure in order to get it working slightly more tastefully, and since you have mixing wet and dry as a todo item (with the goal presumably the same as maximum attenuation, but frankly doing it in the library is a bit cleaner), maybe we can scratch each other's itches.

My rnnoise: https://github.com/GregorR/rnnoise-nu

My rnnoise models (informational, not needed to use the library): https://github.com/GregorR/rnnoise-models

In terms of the library, the changes are small. rnnoise_create takes an RNNModel * as an argument, or NULL to use the default (this was what necessitated breaking compatibility). rnnoise_models returns a NULL-terminated list of models (meaningless string names) and rnnoise_get_model gets a model by name. Alternatively, rnnoise_model_from_file can load a model from a file, to be freed by rnnoise_model_free. Finally, rnnoise_set_param lets you set the (solitary) configurable parameter.

(An aside: My knowledge of LV2 is limited to... well, nothing, but I was surprised to find while poking at your library that there doesn't seem to be any mention of number of channels. Does LV2 just send each channel through a different instance of the plugin, or is this plugin single-channel-specific?)

lucianodato commented 6 years ago

That is so awesome! Of course I will prefer to use yours. I guess it's time to code and get the release going. Do you think that is possible to add support for receiving floats instead of shorts? That would be cleaner too.

GregorR commented 6 years ago

Since rnnoise does receive floats, I assume you mean floats of the usual range from -1 to 1 instead of floats from -32768 to 32767 :)

The models are trained on floating point values, of course, but for whatever reason the original author opted to train them on values from -32768 to 32767 (i.e., 16-bit signed integer values). It would be possible to retrain new models for standard float range, but I'd shy away from the idea of having a separate set of models for different formats, as that could create some very confusing results. Multiplying input values by 32768 and then dividing the output values again should be a very fast operation, since it's just a change to the exponent, and should be lossless on all unclipped audio data.

GregorR commented 6 years ago

Ah yes, just saw https://github.com/lucianodato/speech-denoiser/blob/ab14ff48ac81f661994660383c4d22395f514b2e/src/sdenoise.c#L212 . Use 32768 (SHRT_MAX+1), not SHRT_MAX, and the operation will be much faster. It's still a bit silly, I agree, but oh well... too many bits and bobs in rnnoise outside of the model itself assume a particular range.

GregorR commented 6 years ago

(Different sample rates, on the other hand, should be very doable. The FFT code is generic, and virtually everything relies only on the calculated bands or the FFT, so it's all doable. I'm yet to think of a way to do it without slowing down the code tho. Of course, some of the bands are above the Nyquist frequency of 12kHz audio, but then, if you're denoising 12kHz audio, something has gone dreadfully wrong in your life.)

lucianodato commented 6 years ago

Right! Training the model for different types sounds like a lot of work. And I agree on your comment about sample rate. I will get back to you with something ASAP.

GregorR commented 6 years ago

It's not the amount of work that concerns me. It'd take a week or so, but oh well, c'est la vie. The problem is that because they would be different models, you'd get different results if you fed in identical data with different datatypes, which is just asking for confusion :)

GregorR commented 6 years ago

Parameterizable sample rate is done. Doesn't affect the frame size due to some limitations in the FFT used. rnnoise_set_param(st, RNNOISE_PARAM_SAMPLE_RATE, 44100);. In the end, all that needed to be done was adjusting the band bins properly, as I expected.

In doing so I also eliminated some silly global state, so it is now correct and harmless to do multi-channel (or even multi-track) audio through rnnoise-nu by simply creating a DenoiseState per channel per track. I doubt that it would be faster to make rnnoise-nu itself multi-channel, as it would just have to mix them (or probably their FFTs) to do the analysis and then apply the results independently to each track.

lucianodato commented 6 years ago

Well it's done. Thank you very much for your suggestion. Let me know if you find something wrong.

GregorR commented 6 years ago

A couple minor suggestions:

https://github.com/lucianodato/speech-denoiser/blob/57ed88160901d47dda0ae10dbb7fefa560f821f2/src/sdenoise.c#L41

It shouldn't be necessary to define these parameters, as they're defined by the header.

https://github.com/lucianodato/speech-denoiser/blob/57ed88160901d47dda0ae10dbb7fefa560f821f2/src/sdenoise.c#L221

Just a clarification: Is this parameter being set every frame because it can be changed at any time?

https://github.com/lucianodato/speech-denoiser/blob/57ed88160901d47dda0ae10dbb7fefa560f821f2/src/sdenoise.c#L113

It's not necessary to both rnnoise_create and rnnoise_init (and in fact leaks memory). rnnoise_init is just so that you can allocate the RNNoise object yourself is you so choose; rnnoise_create inits.

https://github.com/lucianodato/speech-denoiser/blob/57ed88160901d47dda0ae10dbb7fefa560f821f2/src/sdenoise.c#L112

It would be nice to have some way of parameterizing the different models, or even loading a model from a file (if LV2 makes that possible/easy). For what it's worth, the different models are well worth using, and work great on suitable input!

lucianodato commented 6 years ago

First comment: Ooops! Second comment: Yes it should be capable of changing parameters for every block. Third comment: Ok, done. Fourth comment: Definitively doable. I had to check some plugin that is using files already and mimic its functionality. I can add both options but I will need to explore what you did for that.

GregorR commented 6 years ago

FYI, I'm suddenly in communication with the original author of RNNoise, so expect... Idonno, something.

lucianodato commented 6 years ago

Nice! Go Gregor!

lucianodato commented 6 years ago

are there any chances of a merge? did you guys speak of that possibility?

GregorR commented 6 years ago

Yes. Needs some cleanup, but something's moving upstream. What exactly is TBD :)

StephanMeijer commented 4 years ago

I'd like to help if I can :)

lucianodato / speech-denoiser

Consider rnnoise-nu #11