Open simjanos-dev opened 3 months ago
> I think pinning the version when installing the packages in DockerfilePython should solve this, but I won't be able to write a fix until tomorrow or the day after.
I use the v13.0 latest image for personal use, and it has Spacy 3.7.5. I would be surprised if we went from 3.4 to 3.7.5 just by not pinning a version number.
Pinning it to an older version would solve this, but not sure if we should use an older spacy version. Also other installable packages use 3.7.0 spacy version based on their url. I think this change could also mess up the model folder for people who already have installed models.
I think maybe we should also host these files on linguacafe github if possible.
Thank you so much for your help with it! Also please take your time, it is not urgent.
If it's something we cannot fix reasonably simply, maybe we could solve it by replacing it with Stanza, if Turkish is available there.
Well, I have tried the simple solution of just updating the link, and installing Turkish does indeed downgrade spacy, which triggers #323. I'm actually ashamed I didn't notice this before, it is quite big.
I have a "hotfix" that enables you to install Turkish at the expense of breaking every other language, but we need a better solution. For the time being this should be announced as a known issue so people know it happens, know if it already affected them, and can decide whether to use Turkish anyway or not.
> Pinning it to an older version would solve this, but not sure if we should use an older spacy version. Also other installable packages use 3.7.0 spacy version based on their url
We could, in theory, downgrade those packages so that they are compatible with the older spacy, but I don't like the idea, and it will 100% break every other extra package, which is probably worse.
> I think maybe we should also host these files on linguacafe github if possible
I thought about that at some point but was unsure given their size. At this point it is probably worth giving it a second chance.
> we could solve it by replacing it with Stanza if Turkish is available
Actually not a bad idea, but I need to go back to the open PR first. Cobbling something together that fits our use case is far easier than making something fit for upstream, and it will also be useful for the prior point. However, it will still take some time.
As a side comment, lxml[html_clean] had a breaking change which I already addressed, in case you try a dev build and it fails for you.
https://huggingface.co/turkish-nlp-suite/tr_core_news_md/tree/main
The Turkish language model was renamed. It also seems to have a Spacy version requirement of >=3.4.2,<3.5.0, which was present even before the name change. The tokenizer.py script dies when I try to install after changing the url, and it messes up the model directory with a spacy 3.4 version. It also stops functioning after an attempted Turkish install and restarting the script.
@sergiolaverde0 I don't know yet how to fix this issue. I tagged you in case you are interested and have an idea.
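To make the conflict concrete, here is a rough sketch of why the >=3.4.2,<3.5.0 requirement cannot be satisfied by the Spacy 3.7.5 in the current image. The function names are hypothetical and this is not code from tokenizer.py or from pip. It only handles plain numeric versions and simple comparison operators, not the full set of version specifier rules.

```python
import re

# Comparison operators supported by this simplified checker.
_OPS = {
    ">=": lambda a, b: a >= b,
    "<=": lambda a, b: a <= b,
    "==": lambda a, b: a == b,
    ">": lambda a, b: a > b,
    "<": lambda a, b: a < b,
}


def parse_version(text, width=4):
    """Turn '3.7.5' into (3, 7, 5, 0). Only plain numeric versions are handled."""
    parts = [int(part) for part in text.strip().split(".")]
    # Pad with zeros so that '3.5' and '3.5.0' compare as equal.
    parts += [0] * (width - len(parts))
    return tuple(parts)


def satisfies(version, requirement):
    """Return True if `version` meets every comma-separated clause in `requirement`."""
    installed = parse_version(version)
    for clause in requirement.split(","):
        match = re.fullmatch(r"\s*(>=|<=|==|>|<)\s*([\d.]+)\s*", clause)
        if match is None:
            raise ValueError(f"unsupported clause: {clause!r}")
        op, bound = match.groups()
        if not _OPS[op](installed, parse_version(bound)):
            return False
    return True


# Requirement string taken from the Turkish model, as quoted in this issue.
TURKISH_MODEL_REQUIREMENT = ">=3.4.2,<3.5.0"

print(satisfies("3.7.5", TURKISH_MODEL_REQUIREMENT))  # version in the current image
print(satisfies("3.4.4", TURKISH_MODEL_REQUIREMENT))  # a 3.4.x version
```

A check along these lines could run before a model install and refuse to proceed when the result is False, instead of letting the install downgrade spacy. Whether that fits the existing install flow is an open question.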