pemistahl / lingua-py

The most accurate natural language detection library for Python, suitable for short text and mixed-language text
Apache License 2.0
1.02k stars 43 forks

Add v2+ support for Alpine Linux by providing `musllinux` wheels #213

Closed murrple-1 closed 3 months ago

murrple-1 commented 6 months ago

I'm showing the commands below in a Docker context, as that is where this makes the most sense.

If one runs:

> docker run --rm -it python:3.12 bash
>> pip install lingua-language-detector
>> pip freeze

One will see that lingua-language-detector==2.0.2 is installed (at time of writing).

However, if one runs:

> docker run --rm -it python:3.12-alpine ash
>> pip install lingua-language-detector
>> pip freeze

One will see that lingua-language-detector==1.3.4 is installed (at time of writing).

I assume this is because there isn't a .whl file currently generated in the v2 branch that supports Alpine.

It would be nice, both from a performance and feature standpoint and from an "Alpine is pretty ubiquitous in Docker deployments" standpoint, if an Alpine-compatible version of Lingua could be built and deployed to PyPI from the v2 branch. This probably affects the lingua-rs project in some capacity too.

pemistahl commented 6 months ago

Hi Murray, thanks for your request.

First of all, I'm a bit confused because both of your code snippets are identical. I'm assuming a copy-paste error here. Am I right?

The wheel files of my library support certain combinations of OS and CPU architecture, but there is no differentiation between specific Linux distributions, at least as far as I can tell. I'm no Linux expert. For Linux, I currently support the architectures x86, x86_64 and aarch64, which I believe are the most common ones. Would it be necessary to create wheels for armv7l or ppc64le in order to properly support Alpine Linux? Perhaps you can tell me by looking at the PyO3 Maturin action docs. I use this action to produce the wheel files.

murrple-1 commented 6 months ago

First, yes, that was a copy-paste error; I have edited the original post.

I will take a look to see if I can discern what the problem is (using the docs you provided and beyond).

As an off-the-cuff guess at the problem (which I hope doesn't lead me or you down a wrong path), I wonder if it's because of how Alpine uses musl libc instead of glibc as its base. Maybe Rust doesn't link against other libcs automatically?
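As a rough way to check which libc a given container uses, a minimal sketch with the standard library's `platform.libc_ver()` follows. The helper name `describe_libc` is made up for illustration. It also assumes that `libc_ver()` reports glibc on Debian-based images and typically returns empty strings on musl-based ones such as Alpine, which may differ between Python versions.

```python
import platform


def describe_libc() -> str:
    """Return a short label for the C library the interpreter appears to use.

    Hypothetical helper for illustration only. platform.libc_ver() is mainly
    able to recognise glibc; on other systems it usually returns empty strings.
    """
    name, version = platform.libc_ver()
    if name:
        # e.g. "glibc" plus a version string on Debian-based images
        return f"{name} {version}".strip()
    # Assumption: an empty result on Linux often means a non-glibc libc such as musl
    return "unknown (not detected as glibc)"


if __name__ == "__main__":
    print(describe_libc())
```

Running this in both the `python:3.12` and `python:3.12-alpine` containers should give different labels if the guess about musl is right.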

murrple-1 commented 6 months ago

So, my turn to say I'm slightly out of my depth, as I don't fully grok the GitHub Actions used in this repo (I can't find where or how Maturin is used, anyway), but yes, it looks like I might be on the money.

The docs you sent me link to this example of a project that builds both manylinux and musllinux wheels. Specifically, here and here. I don't mind attempting a PR or helping beyond that, but as I say, I can't find the build/upload step in this repo, so I'm guessing you're doing it manually?

pemistahl commented 6 months ago

It seems you are right. Currently, I build only manylinux wheels but Alpine Linux obviously needs musllinux wheels. This issue confirms that. There is additional information in the Python packaging user guide about it.
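To make the manylinux/musllinux distinction concrete, here is a small sketch that reads the platform tags out of a wheel filename and maps them to a libc family. It relies only on the standard wheel naming scheme (`{distribution}-{version}(-{build})?-{python}-{abi}-{platform}.whl`). The function names and the example filenames are illustrative assumptions, not copied from PyPI, and this is not how pip itself selects wheels.

```python
from __future__ import annotations


def platform_tags(wheel_filename: str) -> list[str]:
    """Extract the platform tags from a wheel filename.

    The last dash-separated field holds the platform tags; several tags
    may be joined with dots (compressed tag sets).
    """
    if not wheel_filename.endswith(".whl"):
        raise ValueError(f"not a wheel filename: {wheel_filename!r}")
    parts = wheel_filename[: -len(".whl")].split("-")
    # 5 fields without a build tag, 6 fields with one
    if len(parts) not in (5, 6):
        raise ValueError(f"unexpected wheel filename layout: {wheel_filename!r}")
    return parts[-1].split(".")


def linux_libc_families(wheel_filename: str) -> set[str]:
    """Map a wheel's platform tags to the libc families they target."""
    families = set()
    for tag in platform_tags(wheel_filename):
        if tag.startswith("manylinux"):
            families.add("glibc")  # usable on glibc-based distros such as Debian
        elif tag.startswith("musllinux"):
            families.add("musl")  # usable on musl-based distros such as Alpine
    return families


if __name__ == "__main__":
    # Illustrative filenames only (assumed, not taken from the PyPI file list)
    for name in (
        "lingua_language_detector-2.0.2-cp38-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl",
        "lingua_language_detector-2.0.2-cp38-abi3-musllinux_1_2_x86_64.whl",
    ):
        print(name, "->", sorted(linux_libc_families(name)))
```

If a release ships only wheels in the glibc family, pip on Alpine finds none that match and moves on to an older version it can install, which fits the 1.3.4 result reported above.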

For the next release, 2.1.0, I will provide additional musllinux wheels. Thank you again for making me aware of it. The Linux universe is simply too large to oversee in its entirety.

pemistahl commented 3 months ago

Musllinux wheels are now available for download on PyPI for the existing Lingua release 2.0.2. They will also be available for new releases.