xiph / rnnoise

Recurrent neural network for audio noise reduction
BSD 3-Clause "New" or "Revised" License

Support for ffmpeg arnndn filter? #162

Open sarek9 opened 3 years ago

sarek9 commented 3 years ago

Thanks for a really cool filter! Got lots of potential!

I've run a few trainings on my own material and have dumped the weights using

python dump_rnn.py weights.hdf5 ..\src\rnn_data.c filter.rnnn orig

But when I try to use the models in ffmpeg as

ffmpeg -i in.wav -af "arnndn=m=filter.rnnn" out.wav

I always get

Error initializing filter 'arnndn' with args 'm=filter.rnnn'
Error reinitializing filters!
Failed to inject frame into filter network: Invalid argument
Error while processing the decoded data for stream #0:0

Did I miss something or if not, what are the requirements of getting the models to work with ffmpeg? My models look a lot like those at https://github.com/GregorR/rnnoise-models but I'm still getting this error...

richardpl commented 3 years ago

does it start with "rnnoise-nu model file version 1" ?

sarek9 commented 3 years ago

Yes, am I missing something? You can download one of the models from https://drive.google.com/file/d/1Vc2vw-TF7gCAPwdYvdyS74iwNw1VatMW/view?usp=sharing

richardpl commented 3 years ago

All models have 87526 words (as counted by vim). Your file has hundreds more, and the ffmpeg parser does not skip excess numbers, so it errors on a negative number because it checks that the number of entries is positive. Also, the file has float values where it should not have them.
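
The word count and the stray float values mentioned here can be checked before handing a file to ffmpeg. A minimal sketch, assuming the model is a plain-text file of whitespace-separated tokens; `inspect_model_text` is a hypothetical helper, not part of rnnoise or ffmpeg:

```python
HEADER = "rnnoise-nu model file version 1"  # first line quoted earlier in this thread


def inspect_model_text(text):
    """Return (has_header, word_count, float_tokens) for a model file's text.

    Assumption: "words" are whitespace-separated tokens, as vim counts them.
    """
    lines = text.splitlines()
    has_header = bool(lines) and lines[0].strip() == HEADER
    tokens = text.split()
    # Flag tokens such as "7.0" that look like floats rather than integers.
    float_tokens = [t for t in tokens if "." in t]
    return has_header, len(tokens), float_tokens


# Made-up sample text, not taken from a real model file.
sample = "rnnoise-nu model file version 1\n24 24 0\n-3 7.0 12\n"
print(inspect_model_text(sample))
```

For a real file, the text could be read with `open(path).read()` and the word count compared against the 87526 figure given above.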

sarek9 commented 3 years ago

I also noticed the floats, but converting them to ints didn't change the outcome. I don't quite follow you on the negative numbers, though, as this model (by Gregor Richards) https://drive.google.com/file/d/13KNjCkm6snQmpDE-E-sCTulRGQs3zbYx/view?usp=sharing also has negative numbers but still works with ffmpeg. Does the word count matter? (I haven't checked the ffmpeg code.)

Does this mean that the rnnoise dump utility ultimately doesn't create compatible model files for ffmpeg?

richardpl commented 3 years ago

Maybe the utility adds random data at the end of a line. But the first numbers say how big the array is in each dimension, and the linked file definitely has excess entries. Does your model work with the rnnoise code of this repo?

sarek9 commented 3 years ago

Interesting, so the numbers at the beginning of each section describe the size of each dimension. I haven't checked if it works with the rnnoise utility. I'll have to get back to you on that, but since it follows the direction of 1ch RAW PCM s16le 48kHz I'd expect it to work.

sarek9 commented 3 years ago

OK so I've tested my model with the rnnoise code of this repo and it works. It produces a valid RAW PCM file with noise reduced (as much as one can expect from 5 epochs).

richardpl commented 3 years ago

So you have .c file of model? Can you share it?

sarek9 commented 3 years ago

You can find my latest .c file here.

It seems as if there's something added to the GRU-layers as per the sections below.

24 24 0
3600 should be 3528 (3*24 words added)

90 48 2
20160 should be 20016 (3*48 words added)

114 96 0
61056 should be 60768 (3*96 words added)
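
The "should be" figures above can be reproduced with a few lines of arithmetic. A small sketch, assuming each GRU section stores three gates' worth of input weights, recurrent weights and biases; the helper name `gru_entries` is made up for illustration:

```python
def gru_entries(nb_inputs, nb_neurons):
    """Number of values expected in one GRU section, excluding its 3-number header."""
    input_weights = 3 * nb_inputs * nb_neurons
    recurrent_weights = 3 * nb_neurons * nb_neurons
    bias = 3 * nb_neurons
    return input_weights + recurrent_weights + bias


# (nb_inputs, nb_neurons, entries found in the exported file), from the list above
sections = [(24, 24, 3600), (90, 48, 20160), (114, 96, 61056)]
for nb_inputs, nb_neurons, found in sections:
    expected = gru_entries(nb_inputs, nb_neurons)
    surplus = found - expected
    print(nb_inputs, nb_neurons, expected, surplus, surplus == 3 * nb_neurons)
```

In all three sections the surplus equals 3 * nb_neurons, i.e. one extra bias-sized block per GRU layer.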

richardpl commented 3 years ago

I just added a skip to the next line in the arnndn filter code. Feel free to try it and report whether it sounds correct with your model file(s). You might still want to convert the doubles to ints, because doubles are not part of the standard model format.

sarek9 commented 3 years ago

Thank you! I'll check the next build and write back here once I've tested it. Strange that the rnnoise utility only sets certain sections to float in the RNN file.

sarek9 commented 3 years ago

I've built ffmpeg from master (verified your commit was the last one) but it still won't accept the model file I built using rnnoise. Just to sum things up (maybe this is a bug in rnnoise, maybe not):

This model file built by Gregor Richards works.

This model built by me using the rnnoise dump utility (weights to model) doesn't work (I've removed the generated floats).

This is the .c file created by the rnnoise utility (which matches the model I've built, also with the generated floats removed).

There seem to be 3 extra entries in the matrix for the GRU layers (although they look like bias vectors at first, it doesn't seem like they are). It would be interesting to hear what @jmvalin or @GregorR thinks!

richardpl commented 3 years ago

Do not remove the floats, just convert them to ints, i.e. remove the .0 part. I will check if my changes work with DOS line endings.
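
The conversion described here can be scripted. A minimal sketch, assuming the floats in the exported file are whole numbers written with a trailing ".0"; `floats_to_ints` is a hypothetical helper and leaves any other token untouched:

```python
import re

# Matches a whole token such as "12.0" or "-3.00"; tokens like "1.5" are left alone.
_WHOLE_FLOAT = re.compile(r"(?<!\S)(-?\d+)\.0+(?!\S)")


def floats_to_ints(text):
    """Strip the ".0" part from whole-number float tokens, keeping line breaks."""
    return _WHOLE_FLOAT.sub(r"\1", text)


# Made-up sample lines, not taken from a real model file.
print(floats_to_ints("24 24 0\n-3.0 7.0 12 1.5\n"))
```

Because the substitution works on the text as a whole, the number of tokens and the line structure stay exactly as they were.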

richardpl commented 3 years ago

It works even with DOS line endings, so I don't know what the problem is now...

sarek9 commented 3 years ago

Sorry, was unclear... meant I converted the floats to ints...

richardpl commented 3 years ago

Please upload that new file somewhere.

sarek9 commented 3 years ago

Which file? My last uploads are mentioned above.

richardpl commented 3 years ago

Well, the file above: I downloaded it and compared it with the previous version, and it's different and also missing several lines/data.

sarek9 commented 3 years ago

Aha, well it was just a newer model from a different training but still exported using rnnoise, so it should have the same basic structure. Are you saying it doesn't?

richardpl commented 3 years ago

It is missing newlines, so the export to the .rnnn file is buggy.

sarek9 commented 3 years ago

I will look further into this tomorrow but I'm curious to know what reference implementation was used for the arnndn filter to start with? Which were the models tested?

richardpl commented 3 years ago

All of Gregor's models, including the one from this repo, work just fine.

sarek9 commented 3 years ago

OK, so I've had a deeper look into this and it turns out it's the bias vectors for the GRU layers after all. Sometimes the bias vectors are larger than 3*neurons (in my case, twice as large). The offending line is this one in ffmpeg's libavfilter/af_arnndn.c:

#define INPUT_GRU(name) do { \
    INPUT_VAL(name->nb_inputs); \
    INPUT_VAL(name->nb_neurons); \
    ret->name ## _size = name->nb_neurons; \
    INPUT_ACTIVATION(name->activation); \
    NEW_LINE(); \
    INPUT_ARRAY3(name->input_weights, name->nb_inputs, name->nb_neurons, 3); \
    NEW_LINE(); \
    INPUT_ARRAY3(name->recurrent_weights, name->nb_neurons, name->nb_neurons, 3); \
    NEW_LINE(); \
    INPUT_ARRAY(name->bias, name->nb_neurons * 3); /* <-- bias vectors can be larger than 3 * nb_neurons */ \
    NEW_LINE(); \
    } while (0)

I guess the rnnoise-nu model file format is a bit flawed since it should include bias vector lengths but in any case, maybe it's a good idea to add some code to scan until EOL (with some limits) as a work-around for the v1 file format?

richardpl commented 3 years ago

That is what NEW_LINE does. But the other file you provided is even more broken. Note that the extra bias entries are not used by the filter code or by rnnoise at all.
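
The read-then-skip behaviour described here can be illustrated outside of C. A rough Python sketch, assuming each array sits on its own line as the INPUT_GRU macro quoted above suggests; it ignores any reordering that INPUT_ARRAY3 may perform, and `read_gru_section` is a hypothetical name, not an ffmpeg or rnnoise function:

```python
def read_gru_section(lines):
    """Read one GRU section from an iterator of text lines.

    Each array is cut to the size implied by the header; anything after that
    on the same line is ignored, which mimics skipping to the next line.
    """
    nb_inputs, nb_neurons, activation = (int(v) for v in next(lines).split()[:3])

    def take(count):
        values = [int(v) for v in next(lines).split()]
        if len(values) < count:
            raise ValueError(f"expected {count} values, found {len(values)}")
        return values[:count]  # surplus entries on the line are dropped

    return {
        "activation": activation,
        "input_weights": take(3 * nb_inputs * nb_neurons),
        "recurrent_weights": take(3 * nb_neurons * nb_neurons),
        "bias": take(3 * nb_neurons),
    }


# Tiny made-up section: 1 input, 1 neuron, with 3 surplus bias entries.
demo = iter(["1 1 0", "1 2 3", "4 5 6", "7 8 9 10 11 12"])
section = read_gru_section(demo)
print(section["bias"])
```

With this approach a bias line that is twice as long as expected still parses, and only the first 3 * nb_neurons values are kept.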

sarek9 commented 3 years ago

So only bias vectors up to 3*neurons are used by ffmpeg? If that's the case then I guess the dump utility in rnnoise is flawed to not limit output to this.

richardpl commented 3 years ago

Look at the rnnoise code in this repo; the ffmpeg arnndn filter code is derived from it.

sarek9 commented 3 years ago

I'm certainly not an expert on Keras but why would one want to skip certain bias vectors? Wouldn't it make sense to correct as much as possible in the chain of layers? Wouldn't this cause gradient descent to require more epochs (or worse)?

Metal-HTPC commented 1 year ago

I have a hard time getting it to run in ffmpeg. It seems that the last version was updated on Sep 2, 2018 (https://github.com/GregorR/rnnoise-models). Maybe it doesn't work with the latest ffmpeg? I am pretty new when it comes to using models. What I tried so far was

-af arnndn="E:\rnnoise-models-master\somnolent-hogwash-2018-09-01/sh.rnnn" -f s16le -ac 1 -ar 48000 out.raw
-af "arnndn="E:\rnnoise-models-master\somnolent-hogwash-2018-09-01/sh.rnnn" -acodec pcm_s16le -ar 48000 -f WAV%1%.wav
-af "arnndn=m='E:\rnnoise-models/somnolent-hogwash-2018-09-01/sh.rnnn'" -acodec pcm_s16le -ar 48000 -f WAV %1%.wav

I also tried exporting it raw with -f s16le -ac 1 -ar 48000 out.raw

but it all pretty much leads to the following error

Successfully opened the file.
Stream mapping:
Stream #0:0 -> #0:0 (pcm_s16le (native) -> pcm_s16le (native))
Press [q] to stop, [?] for help
[aost#0:0/pcm_s16le @ 0000028f074400c0] cur_dts is invalid [init:0 i_done:0 finish:0] (this is harmless if it occurs once at the start per stream)
[AVFilterGraph @ 0000028f0702e5c0] Setting 'model' to value 'E'
[AVFilterGraph @ 0000028f0702e5c0] Setting 'mix' to value 'rnnoise-models-mastersomnolent-hogwash-2018-09-01/sh.rnnn'
detected 12 logical cores
[Parsed_arnndn_0 @ 0000028f07482e40] [Eval @ 000000e8cd5fed40] Undefined constant or missing '(' in 'rnnoise-models-mastersomnolent-hogwash-2018-09-01/sh.rnnn'
[Parsed_arnndn_0 @ 0000028f07482e40] Unable to parse option value "rnnoise-models-mastersomnolent-hogwash-2018-09-01/sh.rnnn"
Error applying option 'mix' to filter 'arnndn': Invalid argument
Error reinitializing filters!
Failed to inject frame into filter network: Invalid argument
Error while processing the decoded data for stream #0:0
[AVIOContext @ 0000028f07481c00] Statistics: 0 bytes written, 0 seeks, 0 writeouts
Terminating demuxer thread 0
[AVIOContext @ 0000028f07023a00] Statistics: 327714 bytes read, 3 seeks
Conversion failed!

What am I doing wrong? Thanks in advance for any help.

richardpl commented 1 year ago

Why are you discussing an ffmpeg filter in an unrelated project?

Metal-HTPC commented 1 year ago

I thought that this is the related project as it was called "Support for ffmpeg arnndn filter?" What would be the right one then?

richardpl commented 1 year ago

For ffmpeg help, ask on the FFmpeg mailing list, reddit, stackexchange, discord etc. instead of spamming an unrelated github project. You need to learn how to escape ':' for your filter command to work.
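
The log earlier in this thread shows the path being split at the drive-letter colon ('model' is set to 'E') and the backslashes disappearing. A sketch of building the filter argument, assuming FFmpeg's documented filtergraph escaping, where ':' separates options unless preceded by a backslash and single quotes protect the value at the filtergraph level. `arnndn_filter_arg` is a hypothetical helper, and the result has not been verified against every ffmpeg version or shell:

```python
def arnndn_filter_arg(model_path):
    """Build an -af value for arnndn from a Windows-style model path."""
    # Forward slashes avoid backslashes being consumed as escape characters.
    path = model_path.replace("\\", "/")
    # Escape ':' so it is not read as an option separator.
    path = path.replace(":", "\\:")
    # Single quotes keep the backslash intact at the filtergraph level.
    return f"arnndn=m='{path}'"


print(arnndn_filter_arg(r"E:\rnnoise-models\somnolent-hogwash-2018-09-01\sh.rnnn"))
```

Passing the result as one element of an argument list (for example to `subprocess.run(["ffmpeg", "-i", "in.wav", "-af", arg, "out.wav"])`) sidesteps the extra layer of shell quoting.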