segment-any-text / wtpsplit

Toolkit to segment text into sentences or other semantic units in a robust, efficient and adaptable way.
MIT License
715 stars 41 forks source link

Publish 0.3.x python wheels for Linux/non-macOS platforms #14

Closed hobofan closed 4 years ago

hobofan commented 4 years ago

After running into #13, I tried to use the Python bindings instead. It worked, but I noticed that installed version 0.2.2 (saw it didn't match up with the documentation in the README).

After digging into it a little bit, I saw that 0.2.2 was the last release with a platform-agnostic wheel available. All 0.3.x wheels seem to be built specifically for macOS, and are not installable on my Linux/Ubuntu machine.

I'm wondering if there are some easy adjustments that could be made to make publishing wheels for all platforms again possible (or at least Linux/Ubuntu :innocent: )?

bminixhofer commented 4 years ago

Great catch! Was an oversight on my part, I'll see if I can do a quick release to fix it. In the meantime, you can use the latest wheels built from master (they are built for manylinux1).

hobofan commented 4 years ago

Ah, great, didn't see those :)

I think maturin is packing them under a different package name than you are intending to, though? After installing the wheel, no module nnsplit is available, and it looks like it installs them as nnsplit-python.

bminixhofer commented 4 years ago

It's complicated. You can't easily change the wheel name with maturin (see https://github.com/PyO3/maturin/issues/313), and I can't use "nnsplit" because that would clash with the nnsplit core Rust module.

In the release script I use a hacky workaround: https://github.com/bminixhofer/nnsplit/blob/3baaa15be23f68fd83237eefdc910b1a0b4de6c2/release.sh#L25-L31

but I didn't yet bother for the nightly wheels. import nnsplit should work though, only the name is wrong.

hobofan commented 4 years ago

import nnsplit should work though, only the name is wrong.

Just to clarify: Are you saying it should work with the workaround that's not yet in the nightly wheels, or should it already be working with the current nightly wheels? Because right now I get (after a successful install of the wheel):

Traceback (most recent call last):
  File "main.py", line 2, in <module>
    from nnsplit import NNSplit
ModuleNotFoundError: No module named 'nnsplit'
bminixhofer commented 4 years ago

It should already work. Are you sure you're using the same Python interpreter? I just tried this on my machine with a new venv:

bminixhofer@pop-os:~/Documents$ python -m venv nnsplit_venv
bminixhofer@pop-os:~/Documents$ . nnsplit_venv/bin/activate
(nnsplit_venv) bminixhofer@pop-os:~/Documents$ pip install https://github.com/bminixhofer/nnsplit/releases/download/latest/nnsplit_python-0.3.3_post-cp36-cp36m-manylinux1_x86_64.whl
Collecting nnsplit-python==0.3.3-post from https://github.com/bminixhofer/nnsplit/releases/download/latest/nnsplit_python-0.3.3_post-cp36-cp36m-manylinux1_x86_64.whl
  Downloading https://github.com/bminixhofer/nnsplit/releases/download/latest/nnsplit_python-0.3.3_post-cp36-cp36m-manylinux1_x86_64.whl (1.9MB)
    100% |████████████████████████████████| 1.9MB 621kB/s 
Collecting torch<2,>=1.4 (from nnsplit-python==0.3.3-post)
  Cache entry deserialization failed, entry ignored
  Cache entry deserialization failed, entry ignored
  Downloading https://files.pythonhosted.org/packages/38/53/914885a93a44b96c0dd1c36f36ff10afe341f091230aad68f7228d61db1e/torch-1.6.0-cp36-cp36m-manylinux1_x86_64.whl (748.8MB)
    100% |████████████████████████████████| 748.8MB 2.6kB/s 
Collecting numpy (from torch<2,>=1.4->nnsplit-python==0.3.3-post)
  Downloading https://files.pythonhosted.org/packages/b8/e5/a64ef44a85397ba3c377f6be9c02f3cb3e18023f8c89850dd319e7945521/numpy-1.19.2-cp36-cp36m-manylinux1_x86_64.whl (13.4MB)
    100% |████████████████████████████████| 13.4MB 145kB/s 
Collecting future (from torch<2,>=1.4->nnsplit-python==0.3.3-post)
Installing collected packages: numpy, future, torch, nnsplit-python
Successfully installed future-0.18.2 nnsplit-python-0.3.3.post0 numpy-1.19.2 torch-1.6.0
(nnsplit_venv) bminixhofer@pop-os:~/Documents$ python
Python 3.6.9 (default, Apr 18 2020, 01:56:04) 
[GCC 8.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import nnsplit
>>> nnsplit
<module 'nnsplit' from '/home/bminixhofer/Documents/nnsplit_venv/lib/python3.6/site-packages/nnsplit.cpython-36m-x86_64-linux-gnu.so'>
>>> exit()
(nnsplit_venv) bminixhofer@pop-os:~/Documents$ pip freeze | grep nnsplit
nnsplit-python==0.3.3.post0
(nnsplit_venv) bminixhofer@pop-os:~/Documents$ pip freeze
future==0.18.2
nnsplit-python==0.3.3.post0
numpy==1.19.2
pkg-resources==0.0.0
torch==1.6.0
(nnsplit_venv) bminixhofer@pop-os:~/Documents$ 
hobofan commented 4 years ago

You are right. Looks like I wasn't using the my poetry venv :roll_eyes: Works well now! :)

bminixhofer commented 4 years ago

Great! Let me know if you have any more issues.

bminixhofer commented 4 years ago

This is a bit more complicated than I thought, I'll have to move the release script to CI. But will be done soon.

bminixhofer commented 4 years ago

After lots of blood, sweat and tears this is now working :wink:. Windows, Linux and MacOS wheels are built, tested and released in CI as of v0.4.9. Only caveat is that the Python version is x-post0 for any version x. That was the only way to avoid version clashes while building the Python bindings if they're called nnsplit instead of nnsplit-python. Would be solved by being able to change the wheel name from maturin, tracked by https://github.com/PyO3/maturin/issues/313.

FYI, I'll remove the nightly build now, it's mostly obsolete now and just spams notifications.