xiph / opus

Modern audio compression for the internet.
https://opus-codec.org/
Other
2.23k stars 597 forks source link

Please provide an option to not depend on downloading model data #337

Open bbhtt opened 4 months ago

bbhtt commented 4 months ago

We build opus from git source and we'd like to not switch to tarballs or download the model tarball. Given the recent xz situation downloading tarballs with no trace on git is complicated and seems like a security issue.

Having an option to stop depending on that seems like it'd be great.

With the current meson configuration:

-Dfloat-api=true 
-Dasm=disabled 
-Dhardening=true 
-Dcustom-modes=true 
-Denable-deep-plc=true 
-Denable-dred=true 
-Denable-osce=true 
-Ddocs=enabled 
-Dextra-programs=disabled 
-Dtests=disabled 
-Dintrinsics=enabled 
-Drtcd=enabled

meson seems to fail on meson.build:636:24: ERROR: File dnn/fargan_data.h does not exist. and is not even aware of the download model script living in autogen.

Whereas the download model script is passing an unknown commit https://github.com/xiph/opus/blob/ab4e83598e7fc8b2ce82dc633a0fc0c452b629aa/autogen.sh#L12C24-L12C31

What is the source of that commit? Can this model data be found on git somewhere? Can this be implemented as a git submodule?

Looking through the git history seems at some point there was a submodule, which was reverted in favour of downloading the model tarball.

Thanks!

bbhtt commented 4 months ago

Hello, also why does the model tarball include ~130 MB of binary model files which the opus tarballs do not?

Also some of the headers are quite larger in model tarball than in opus tarballs

Model tarball:

Screenshot from 2024-04-09 23-46-38

Opus tarball:

Screenshot from 2024-04-09 23-47-55

jmvalin commented 4 months ago

In the model tarball downloaded from git, the models include all float weights (e.g. for debugging), whereas for the Opus release tarball, the float data was removed to make the release as small as possible.

bbhtt commented 4 months ago

What do you mean by downloading from git? The script is downloading from here https://media.xiph.org/opus/models/

Also it makes really hard to know which model tarball to update to, because the expected tarball commit is in the script only. If you are building in an environment without network access you need to know which filename to download beforehand.

the models include all float weights (e.g. for debugging), whereas for the Opus release tarball, the float data was removed to make the release as small as possible.

Is there some kind of docs on how to do this? Also knowing the source of the model data would be great too. How is the model tarball produced?

Thanks

jmvalin commented 4 months ago

You can run scripts/shrink_model.sh to remove the debug float data from the extracted models. From there, you can just run "make dist" to build a tarball that has just what's needed.

heftig commented 4 months ago

I also ran into this issue. Could you add the shrunken model data to the opus git repository, please, so that they're covered by your signed tags and opus can be built without any further downloads? The model tarballs are not an option as they're entirely unverified (and also very large).

Erick555 commented 4 months ago

Maybe you can also provide an option to disable this new feature.

jmvalin commented 4 months ago

The newer models are just too big for git. That being said, the main branch has been updated to verify them so they cannot be compromised on the download server.

nanonyme commented 4 months ago

Looks like the process is now more or less documented. Would it be possible to still have this checksum a separate file so it wouldn't need to be parsed from a shell script?

jmvalin commented 4 months ago

I'm not against, but I don't have the skills to update the Windows version of the script. @mklingb any thoughts on that?

nanonyme commented 4 months ago

The batch file would probably only turn simpler by having the checksum in a separate file rather than having it parse the shell script.