hplt-project / OpusCleaner

OpusCleaner is a web interface that helps you select, clean and schedule your data for training machine translation models.
https://pypi.org/project/opuscleaner/

Fails to install requirements-all.txt on Python 3.10 #145

Open bhaddow opened 9 months ago

bhaddow commented 9 months ago

Installation fails on Python 3.10 because the specified version of OpusFilter (2.6.0) requires fast-mosestokenizer, and there is no version of that package for 3.10.

I do not know if a later version of OpusFilter would be compatible.

jindrahelcl commented 9 months ago

FYI I use it with Python 3.8 for now, that's one version which doesn't seem to break anything

bhaddow commented 9 months ago

This also fails for me, because it tries to install bicleaner 0.17.3, which requests a broken hunspell.

bhaddow commented 9 months ago

If I comment out laserembeddings then it succeeds, and this appears to be just for ja and zh, so it's fine for me.
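
For anyone who wants to script this workaround rather than edit the file by hand, here is a minimal sketch. It assumes laserembeddings appears on its own line in the requirements file; the helper name `comment_out_requirement` is made up for illustration and is not part of OpusCleaner.

```python
import re


def comment_out_requirement(requirements_text: str, package: str) -> str:
    """Return the requirements text with every line for `package` commented out."""
    # Match the package name at the start of a line, followed by a version
    # specifier, extras, an environment marker, a comment, whitespace, or the
    # end of the line. This avoids matching longer names that merely share
    # the same prefix.
    pattern = re.compile(
        r"^\s*" + re.escape(package) + r"\s*($|[=<>!~\[;#\s])",
        re.IGNORECASE,
    )
    lines = []
    for line in requirements_text.splitlines():
        if pattern.match(line):
            lines.append("# " + line)  # disable this requirement
        else:
            lines.append(line)  # keep everything else untouched
    return "\n".join(lines) + "\n"
```

Reading requirements-all.txt, passing its contents through `comment_out_requirement(text, "laserembeddings")`, and writing the result to a copy keeps the original file intact, so the change is easy to revert.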

jindrahelcl commented 9 months ago

oh, you're right, I remember.. I only have bicleaner-hardrules in this env.

jelmervdl commented 9 months ago

It would be so easy to add a GitHub action to test the installation of requirements-all.txt on a couple of different platforms and Python versions. It would probably be a lot harder to make them all work 😅
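
A rough sketch of what such an install check could run, written as a plain Python script so it is not tied to any CI system. It only tests the interpreter that runs it, so a CI matrix would have to invoke it once per platform and Python version. The function names are made up for illustration, and nothing like this is confirmed to exist in the repository.

```python
import subprocess
import sys
import tempfile
import venv
from pathlib import Path


def venv_python(env_dir: Path) -> Path:
    """Path to the interpreter inside a venv (the layout differs on Windows)."""
    if sys.platform == "win32":
        return env_dir / "Scripts" / "python.exe"
    return env_dir / "bin" / "python"


def pip_install_command(env_dir: Path, requirements: Path) -> list:
    """Build the pip command that installs `requirements` into the venv."""
    return [str(venv_python(env_dir)), "-m", "pip", "install", "-r", str(requirements)]


def smoke_test_install(requirements: Path) -> bool:
    """Create a throwaway venv and report whether the requirements install."""
    with tempfile.TemporaryDirectory() as tmp:
        env_dir = Path(tmp) / "env"
        venv.create(env_dir, with_pip=True)  # fresh environment with pip available
        result = subprocess.run(pip_install_command(env_dir, requirements))
        return result.returncode == 0
```

Calling `smoke_test_install(Path("requirements-all.txt"))` and exiting non-zero when it returns `False` would be enough for a CI job to flag a broken dependency pin.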

OpusFilter was indeed the one that dragged a hard-to-compile fast-mosestokenizer into the dependency tree. They've released a version 3.0.0 now: https://pypi.org/project/opusfilter/#history

… and I hope that they now rely on their fork of fast-mosestokenizer that does have wheels for Python 3.10 and newer: https://pypi.org/project/opus-fast-mosestokenizer/0.0.8.5/#files
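
Whether a release ships a wheel for a given interpreter can be read off the wheel filenames on its files page, since the Python tag is the third field from the end. A small sketch, assuming you have copied the filenames into a list yourself; the helper names are hypothetical, and the filenames in the comments are made-up examples rather than the real files of any package.

```python
def wheel_python_tags(filename: str) -> set:
    """Return the Python tags encoded in a wheel filename.

    Wheel names follow
    {distribution}-{version}(-{build})?-{python}-{abi}-{platform}.whl,
    and the python tag may be compressed, for example "py2.py3".
    """
    if not filename.endswith(".whl"):
        raise ValueError("not a wheel filename: " + filename)
    parts = filename[: -len(".whl")].split("-")
    if len(parts) not in (5, 6):
        raise ValueError("unexpected wheel filename layout: " + filename)
    return set(parts[-3].split("."))


def has_wheel_for(filenames, accepted_tags) -> bool:
    """True if any wheel in `filenames` carries one of `accepted_tags`.

    Non-wheel files such as source tarballs are skipped.
    """
    accepted = set(accepted_tags)
    return any(
        wheel_python_tags(name) & accepted
        for name in filenames
        if name.endswith(".whl")
    )


# Made-up example: a release with only a CPython 3.8 wheel and an sdist
# has nothing installable as a wheel on CPython 3.10.
# has_wheel_for(
#     ["example_pkg-1.0-cp38-cp38-manylinux2014_x86_64.whl", "example_pkg-1.0.tar.gz"],
#     {"cp310", "py3"},
# )
```

This ignores the ABI and platform tags, so a match only says that some wheel targets that Python version, not that it fits your OS and architecture.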

There are quite a few commits between 2.6.0 and 3.0.0, not sure if there are any breaking changes. But by the looks of it, it might just work?

Upgrading OpusFilter and seeing whether that fixes it would be the first thing to try.

mbanon commented 6 months ago

I'm also having this issue >_<

mbanon commented 5 months ago

Hi, I fixed the version issues in my fork, and now everything seems to install and all filters are working. In case anyone is interested: https://github.com/hplt-project/OpusCleaner/commit/dec5f6d61d841e49b87b18aca9fc8279ecb76795 It also required a couple of minor changes.