tree-sitter / py-tree-sitter

Python bindings to the Tree-sitter parsing library
https://tree-sitter.github.io/py-tree-sitter/
MIT License
728 stars 90 forks source link

Fails to load language on 32 bit #242

Closed arthurzam closed 2 weeks ago

arthurzam commented 2 weeks ago

I've managed to hit it on x86, arm, and ppc (all 32 bits environments). I've caught it using python 3.11.9 and python 3.12.3.

When trying to load a language, the language pointer is too big for signed int 32 bit, resulting in failures to load the language:

>>> import tree_sitter_bash
>>> tree_sitter_bash.language()
4143476320
>>> tree_sitter_bash.language()
4143476320
>>> from tree_sitter import Language
>>> Language(tree_sitter_bash.language())
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
OverflowError: Python int too large to convert to C ssize_t
>>> hex(4143476320)
'0xf6f86e60'

I suspect we want to use here unsigned int, or maybe limit it in another way.

Versions:


To setup an environment for 32 bit, you can do it in normal x86_64 environment, and use systemd-nspawn --personality=x86

ObserverOfTime commented 2 weeks ago

Fixed in 85e49483b6d2b49953568d95cfa03e539bfb8179. May or may not work on 32bit Windows.

arthurzam commented 2 weeks ago

@ObserverOfTime thank you very much for fast handle. I've applied the fix on Gentoo and it works.

Now we have a small test failure because of this (only 32 bit again):

==================================================================================================== FAILURES =====================================================================================================
_____________________________________________________________________________________________ TestLanguage.test_hash ______________________________________________________________________________________________

self = <tests.test_language.TestLanguage testMethod=test_hash>

    def test_hash(self):
        for name in ["html", "javascript", "json", "python", "rust"]:
            with self.subTest(language=name):
                ptr = getattr(self, name)
                lang = Language(ptr)
>               self.assertEqual(hash(lang), ptr)
E               AssertionError: -163389856 != 4131577440

lang       = <Language id=4131577440, version=14>
name       = 'html'
ptr        = 4131577440
self       = <tests.test_language.TestLanguage testMethod=test_hash>

tests/test_language.py:82: AssertionError
============================================================================================= short test summary info =============================================================================================
FAILED tests/test_language.py::TestLanguage::test_hash - AssertionError: -163389856 != 4131577440
========================================================================================== 1 failed, 58 passed in 0.26s ===========================================================================================
ObserverOfTime commented 2 weeks ago

Good thing I didn't immediately release a patch.

arthurzam commented 2 weeks ago

Good thing I didn't immediately release a patch.

Yeah, for gentoo it is very easy to carry patches and test it, so I've tried to test it as fast as I could to give feedback back :) Once again thank you for handling it.

P.S. I started to look if I can setup a github actions CI for x86, maybe through docker emulation, so we can have the CI verify it for us for the future.

ObserverOfTime commented 2 weeks ago

It's doable through cibuildwheel and used to be a thing but I'd rather not increase the CI execution time for some niche architectures unless there's enough demand.

ObserverOfTime commented 2 weeks ago

Unfortunately the hash method can't be made unsigned, but the test is fixed by d694939. I'll release 0.22.3 now.