matrix-org / mjolnir

A moderation tool for Matrix
Apache License 2.0
Consider providing a non-NSFW model variant of Mjolnir to reduce image size #527

Open Half-Shot opened 2 months ago

Half-Shot commented 2 months ago

The NSFW model dependency is huge, and if you're not using it, you're probably not loving pulling in a GB of model. It would be nice if there were a way to have two image variants to reduce the size.

Gnuxie commented 2 months ago

There is also the dependency on https://github.com/tensorflow/tfjs, which clocks in at around 300MB. I'm unsure if there's a way to manage that, though.

MichaelSasser commented 1 month ago

I'm not a fan of programs that download things they need at runtime, at runtime. It adds uncertainty about whether the download succeeds or fails, which could be avoided completely. You would also have to manage the lifetime of the model (and dependency) when users try to cache it between versions in a volume/mount.

I'm not familiar with Node/JS, but it seems to have its own version of the feature/optional/extra dependencies I know from other languages, called optional dependencies. The idea would be to make the dependency optional and handle the case where the import fails in code, so that it ends up acting like a feature switch. Running npm install --omit=optional would then skip the dependency, and the NSFW protection would be disabled. You could then build two images, one with and one without that feature and the model, and users could choose between them by pulling one container image tag or the other.
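
As a rough illustration of the "feature switch" idea above, here is a minimal TypeScript sketch. Every name in it (`NsfwModel`, `Features`, `tryLoad`, `buildFeatures`) is hypothetical and not taken from Mjolnir's code. The loader is passed in as a function so the sketch stays self-contained; in a real build it would presumably wrap the `require` or `import` of the model package, which throws when the package was left out at install time.

```typescript
// Hypothetical shape of the optional feature. Not Mjolnir's real API.
type NsfwModel = { classify(image: Uint8Array): Promise<unknown> };

interface Features {
  // null means the optional dependency was not available.
  nsfwProtection: NsfwModel | null;
}

// Run a loader and turn a failure into null instead of a crash.
function tryLoad<T>(load: () => T): T | null {
  try {
    return load();
  } catch {
    // Assumption: any failure here means "dependency not installed".
    // A stricter version would inspect the error before swallowing it.
    return null;
  }
}

// Build the feature set; the NSFW protection is simply switched off
// when its loader fails.
function buildFeatures(loadModel: () => NsfwModel): Features {
  return { nsfwProtection: tryLoad(loadModel) };
}

// "Slim" image: the loader throws, as a failed module import would.
const slim = buildFeatures(() => {
  throw new Error("Cannot find module");
});

// "Full" image: the loader returns a stand-in model object.
const full = buildFeatures(() => ({ classify: async () => [] }));
```

With this shape, the rest of the bot would check `features.nsfwProtection` for null before using it, so the same code could run in both image variants.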

Half-Shot commented 1 month ago

Node.js optional dependencies aren't the right tool for this. From the first line of the docs:

If a dependency can be used, but you would like npm to proceed if it cannot be found or fails to install, then you may put it in the optionalDependencies object

So all it really does is allow a build to go ahead if your network drops out or your platform isn't supported. It's usually rubbish. There is the option of using https://docs.npmjs.com/cli/v10/configuring-npm/package-json#peerdependencies though, which is more the right kind of tool.

I do agree with your point: having a "base" Docker image and one with the models would probably work best. I'll rename the issue.