xiph / rnnoise

Recurrent neural network for audio noise reduction
BSD 3-Clause "New" or "Revised" License
4.02k stars 888 forks source link

Releasing RNNoise version 0.2 #221

Open jmvalin opened 5 months ago

jmvalin commented 5 months ago

This new release brings many improvements, including improved training, SSE4.1 and AVX2 optimizations, and run-time CPU detection (RTCD).

The distributed models are now trained using only publicly available datasets.

petterreinholdtsen commented 5 months ago

[Jean-Marc Valin]

This new release brings many improvements, including improved training, SSE4.1 and AVX2 optimizations, and run-time CPU detection (RTCD).

The distributed models are now trained using only publicly available datasets.

This sound very good. How can I learn how to do the training myself? That knowledge would be a great argument for getting this into the Debian main distribution.

<URL: https://bugs.debian.org/980839 >

-- Happy hacking Petter Reinholdtsen

DarthPJB commented 5 months ago

This release expects the build process to execute a arbitrary-shellscript to download an arbitrary file as part of the build process; outside of cmake, in a potentially privileged environment.

If I can find the time, I will PR fix for your Cmake or alike; I'm sorry I'm not doing it now - because it's very exciting to me this project is moving again.

But as it stands, this is a pretty big hole for practical secure CI-pipelines and reliability in isolated or secure environments. In short - do not do this.

THAT SAID:

I'm honestly glad this project is getting some much needed attention; but please remember that it's been relied upon for various uses, on various platforms for a very long time now.

If your not considering the need to build this as a standalone Risc-V library for a non-linux operating-system; You should be. It may need to be built without internet access. And on windows XP; to support cable TV stations using it with Gstreamer.

And many, many more that I'm aware of.

Keep up the good work, do good things, and have pride - just don't forget the use-cases beyond your own. ;)

petterreinholdtsen commented 5 months ago

[John Bargman]

This release expects the build process to execute a arbitrary-shellscript to download an arbitrary file as part of the build process; outside of cmake, in a potentially privileged environment.

This sound like a prospective xz disaster, indeed. Where is the code causing this? At least the Debian build servers do not all have Internet access during build, so it would fail at least some architectures.

-- Happy hacking Petter Reinholdtsen

jmvalin commented 5 months ago

So there's two things here: one is the download script and the other is the actual content being fetched. In terms of the script itself, it's fairly simple to audit and see that it's not doing anything bad. It's also in git and signed, so I don't really see a trust issue compared to trusting the rest of the source code.

When it comes the the actual model data being fetched, I can see how it's not as safe as the rest of the Git content (sha1 and signature). One possible way to solve that would be the equivalent to what was just added to Opus: https://github.com/xiph/opus/commit/9faf6f071

The idea Is that Git will hold the sha256 of the downloaded content and thus it can check that the checksums actually match before using it. That way compromising the server will just result in a build failure.

DarthPJB commented 5 months ago

The idea Is that Git will hold the sha256 of the downloaded content and thus it can check that the checksums actually match before using it. That way compromising the server will just result in a build failure.

This is without question an acceptable solution - I come from a version of linux called NixOS; our package manager does this for all external downloads as part of the build step; ( which is how I came to be here: https://github.com/NixOS/nixpkgs/issues/304433 )

The shellscript itself is an acceptable compromise if an external download is absolutely required; in my personal ideal external data would be fetched by git submodule (providing a solid revision control), or the dependency treated as such.

It seems the model for RNNoise has always been a dependency;

petterreinholdtsen commented 5 months ago

[Jean-Marc Valin]

So there's two things here: one is the download script and the other is the actual content being fetched. In terms of the script itself, it's fairly simple to audit and see that it's not doing anything bad. It's also in git and signed, so I don't really see a trust issue compared to trusting the rest of the source code.

Most Linux distributions block package building from downloading stuff from the Internet, to ensure reproducibility, consistency and completeness, and I would strongly suggest to not base the RNNoise build on such practice for rnnoise.

-- Happy hacking Petter Reinholdtsen

jmvalin commented 5 months ago

@petterreinholdtsen Do you have alternatives to suggest for distributing a 20 MB binary file? In version 0.1, the model was much smaller (and it still wasn't great) and didn't include the binary model (which is the preferred form for making modifications in that case).

petterreinholdtsen commented 5 months ago

[Jean-Marc Valin]

@petterreinholdtsen Do you have alternatives to suggest for distributing a 20 MB binary file? In version 0.1, the model was much smaller (and it still wasn't great) and didn't include the binary model (which is the preferred form for making modifications in that case).

I would suggest to include it in the git repository and release tarballs.

-- Happy hacking Petter Reinholdtsen

tdaede commented 5 months ago

All package managers I know of have the ability to specify multiple source packages - I'd expect them to specify the model data as one of the sources. The nixpkgs approach seems good.

Having a hash of the data in git should be enough for revision control to ensure that it has the same version of the model as the code was designed for. Submodules remain tricky to use, especially because the model is large and most people would want a shallow clone of it.

jmvalin commented 4 months ago

OK, the download process should be safer now as the download script will verify the sha256 before using the downloaded binary file.

petterreinholdtsen commented 4 months ago

[Thomas Daede]

Having a hash of the data in git should be enough for revision control to ensure that it has the same version of the model as the code was designed for.

Note, for release tarballs, such approach will break the deserted island test of free software. Those bringing the tarball and all its build dependencies with you to a computer on a deserted island with no Internet connectivity would not be able to build it as the build require Internet connectivity and the abililty to download the model without interferrence.

I heard a variant of the deserted island a few years ago, where the people gathering together at AfricaSource, a free software development confernece in the middle of a savannah, with no Internet connectivity, used a collecting of Debian DVDs as their source of build dependencies.

-- Happy hacking Petter Reinholdtsen

DarthPJB commented 4 months ago

[Thomas Daede] [SNIP] Note, for release tarballs, such approach will break the deserted island test of free software. [SNIP] -- Happy hacking Petter Reinholdtsen

I have to agree with this statement - although one would hope that the dependency will find it's way onto a debian disk too ;)

jmvalin commented 4 months ago

Unless I missed something, the RNNoise 0.2 passes the desert island test fine. The download mechanism only applies to Git.

petterreinholdtsen commented 4 months ago

[Jean-Marc Valin]

Unless I missed something, the RNNoise 0.2 passes the desert island test fine. The download mechanism only applies to Git.

Sound good. From where can I download a tarball with the model included, to verify no download from Internet take place during build?

-- Happy hacking Petter Reinholdtsen

tdaede commented 4 months ago

On the Releases page: https://github.com/xiph/rnnoise/releases/download/v0.2/rnnoise-0.2.tar.gz