otsaloma / gaupol

Editor for text-based subtitle files
https://otsaloma.io/gaupol/
GNU General Public License v3.0

Make chardet optional or use charset-normalizer instead #222

Closed: nijel closed this 4 months ago

nijel commented 4 months ago

While memory profiling Weblate, I noticed that over 2 MB is consumed by the chardet module, which we don't depend on directly.
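One rough way to reproduce a per-module number like this is to trace Python-level allocations around a fresh import with the standard library's tracemalloc. This is an assumption about method, not necessarily how Weblate was profiled, and the helper name import_footprint is hypothetical. It only counts memory still allocated after the import, and modules that were already loaded (including shared dependencies) are not counted again.

```python
import importlib
import sys
import tracemalloc


def import_footprint(module_name):
    """Return bytes still allocated after importing module_name.

    Returns 0 if the module was already imported, or None if it cannot be
    imported. Hypothetical helper for illustration only.
    """
    if module_name in sys.modules:
        return 0  # already loaded: importing again allocates nothing new
    tracemalloc.start()
    try:
        before, _peak = tracemalloc.get_traced_memory()
        importlib.import_module(module_name)
        after, _peak = tracemalloc.get_traced_memory()
    except ImportError:
        return None  # module (e.g. an optional dependency) is not installed
    finally:
        tracemalloc.stop()
    return after - before


if __name__ == "__main__":
    print("chardet:", import_footprint("chardet"), "bytes")
```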

The only reverse dependency for chardet is gaupol in our case. Everybody else seems to have switched to charset-normalizer, a maintained, faster alternative with a lower memory footprint.

I'm willing to contribute a pull request, but first I'd need to know which direction you prefer. I can see two approaches: making chardet an optional dependency, or switching to charset-normalizer instead.

otsaloma commented 4 months ago

chardet is not required in aeidon/gaupol; it's imported only within the aeidon.encodings.detect function, guarded by an if aeidon.util.chardet_available(): check. I think I put it in setup-aeidon.py's install_requires for convenience. Do I understand correctly that there are no opt-out type dependencies, i.e. ones that would be installed by default but that you could opt out of with some syntax?
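The guarded-import pattern described above can be sketched roughly as follows. This is a hypothetical re-creation: the names chardet_available and detect mirror the ones mentioned, but their bodies here, and the choice to take raw bytes rather than a file path, are assumptions and not aeidon's actual code. The point is that chardet is imported only when detection is actually requested, so it costs nothing if absent or unused.

```python
import importlib.util


def chardet_available():
    """Return True if the optional chardet package can be imported."""
    # find_spec locates the package without importing it.
    return importlib.util.find_spec("chardet") is not None


def detect(data):
    """Guess the encoding of raw subtitle bytes.

    Returns an encoding name, or None if chardet is not installed or
    cannot decide. Hypothetical signature for illustration.
    """
    if not chardet_available():
        return None
    import chardet  # deferred: import cost is paid only when detecting
    return chardet.detect(data)["encoding"]
```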

I think I was a bit too eager to make dependencies optional back when writing these. Encoding auto-detection is something that probably 95% of users want, since they download random subtitle files from the internet and can't really know the encoding.

nijel commented 4 months ago

If you do pip install aeidon, you end up with chardet installed. You can uninstall it manually, but pip will then complain about unmet dependencies.
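The complaint comes from the installed package metadata still declaring chardet as a requirement. A simplified, hypothetical approximation of that check (not pip check's actual implementation) can be written with the standard library's importlib.metadata. The helper name unmet_requirements is made up for illustration; it skips extras but treats any other environment marker as always applying.

```python
import re
from importlib.metadata import PackageNotFoundError, distribution, requires


def unmet_requirements(dist_name):
    """List declared requirements of dist_name that are not installed.

    Returns None if dist_name itself is not installed. Simplified sketch:
    version specifiers are ignored, extras are skipped, and other markers
    (e.g. python_version) are treated as always applying.
    """
    try:
        declared = requires(dist_name) or []
    except PackageNotFoundError:
        return None
    missing = []
    for req in declared:
        if re.search(r"extra\s*==", req):
            continue  # optional extra, only installed on request
        match = re.match(r"[A-Za-z0-9._-]+", req)
        if match is None:
            continue
        name = match.group(0)
        try:
            distribution(name)
        except PackageNotFoundError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    # After "pip uninstall chardet", this would be expected to report it.
    print(unmet_requirements("aeidon"))
```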

Anyway, I've created https://github.com/otsaloma/gaupol/pull/223 to migrate to charset-normalizer; please review it if this is something you want.
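For comparison, a guarded detection helper built on charset-normalizer might look like the sketch below. It is not the code from the pull request; detect_encoding is a hypothetical name, and it uses charset-normalizer's from_bytes(...).best() API, falling back to None when the package is missing or no candidate encoding fits.

```python
def detect_encoding(data):
    """Guess the encoding of raw subtitle bytes with charset-normalizer.

    Returns a Python codec name, or None if charset-normalizer is not
    installed or finds no plausible encoding. Hypothetical helper.
    """
    try:
        from charset_normalizer import from_bytes
    except ImportError:
        return None  # optional dependency missing: skip auto-detection
    best = from_bytes(data).best()  # most plausible match, or None
    return None if best is None else best.encoding
```

Note that the encoding names returned may be spelled differently from chardet's (Python codec style such as utf_8), which matters if callers compare names as strings. charset-normalizer also provides a chardet-style detect() function returning a dict, which may make a drop-in swap simpler.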