jpype-project / jpype

JPype is cross language bridge to allow Python programs full access to Java class libraries.
http://www.jpype.org
Apache License 2.0
1.12k stars 183 forks source link

fix: Python 3.12 support; remove usage of wstr and use updated PyLong ob_digit location #1148

Closed Christopher-Chianelli closed 11 months ago

Christopher-Chianelli commented 1 year ago

Investigating a segmentation fault with PyJP_GetAttrDescriptor

C  [libpython3.12.so.1.0+0x1f2882]  PyDict_GetItem+0x22
C  [_jpype.cpython-312-x86_64-linux-gnu.so+0x8df43]  PyJP_GetAttrDescriptor+0x73
C  [_jpype.cpython-312-x86_64-linux-gnu.so+0x87758]  PyJPClass_setattro+0x98
Christopher-Chianelli commented 1 year ago

Fixes #1147 when it actually works, but this make the code compilable at least.

cclauss commented 1 year ago

Please add the Python 3.12 continuous integration tests from #1153 so we can monitor progress.

codecov[bot] commented 1 year ago

Codecov Report

All modified and coverable lines are covered by tests :white_check_mark:

Comparison is base (7363ce8) 87.84% compared to head (f9ade3d) 87.82%.

Additional details and impacted files ```diff @@ Coverage Diff @@ ## master #1148 +/- ## ========================================== - Coverage 87.84% 87.82% -0.02% ========================================== Files 112 112 Lines 10276 10276 Branches 4032 4032 ========================================== - Hits 9027 9025 -2 - Misses 698 699 +1 - Partials 551 552 +1 ```

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

Christopher-Chianelli commented 1 year ago

There are quite a few non-trivial changes to the internals of CPython 3.12 that JPype uses, which unfortunately makes supporting CPython 3.12 far from trivial. Currently trying to fix the segmentation fault on testGetStaticByInstance. I am currently working on this issue in my free time, so progress will be slow; don't expect it anytime soon unless someone else does it.

Thrameos commented 1 year ago

I am trying to cycle back to it after not having a computer for last 3 months.

marscher commented 12 months ago

I am trying to cycle back to it after not having a computer for last 3 months.

What the heck. That's a long time I have to say. Did you take some sabbath? :+1:

marscher commented 11 months ago

@Christopher-Chianelli I advice you to use pytest-xdist and invoke pytest -n2 to enable all tests to run, even if there are crashes. I already pushed this to CI.

Here is an overview of all failed tests: https://dev.azure.com/jpype-project/jpype/_build/results?buildId=1129&view=ms.vss-test-web.build-test-results-tab

Thrameos commented 11 months ago

@marscher Sadly my work laptop that held everything that I work with JPype on took a dive mid July. I ordered one straight out of my work catalog using funding set to expire on Oct 1 figuring that I would be up and ready to go in August. But as it was a "second computer" requiring an extra round of approvals and the vendor lowered the item price which triggered another round of signatures, then the funding expired so I had to restart the process again, and when it finally did show up the computer support needed two weeks to get it ready for use, then it was discovered the vendor had substituted the hardware for something not approved so no use at work. (Then of course with new fiscal year my available time was at zilch until December.) NEVER underestimate the mind boggling time it can take the US government can take to order a laptop.

So here is the current state of Python3.12.

1) They completely reworked the PyLong memory model packed with all the edge cases of string. And as they still have not added a "PyLong_NewWithSubtype" that means I have to replicate the whole set of private functions required to make that work. The main difference is that they aren't misusing the size field quite as much which was in units of digits but now is the bytes in the representation minus the header that got added for all those representation types.
2) The PyString which was already a bear was changed to get rid of wstr representation. 3) They finished gutting the PyObject_NewGC in a way that now precludes the ability to use alloc to add extra memory. They do have a new replacement.

I have puzzled out the first. The second compiles but needs lots of testing. The third is the nastiest as until I manage to get it exactly right I will get get random memory corruption problems (which makes locating the source of the error very painfully slow.)

Thrameos commented 11 months ago

@encukou Bad news.... the API for extension of objects added in 3.12 appears to be a significant miss in terms of what is required for JPype.

There are two reasons for this.

1) The JPClassType is used as a Meta class which derives from PyType and contains a significant amount of non-trivial code. The new method gets called from within the module which passes a special flag which indicates that it is a safe operation. If the user attempts to extend a Java class from within Python it fails with an error (which is for safety because you can't create a new Java class in Python). 2) Second it adds to the basicsize of the object which defeats the point of the hidden reserved space. If you add to the basic size and then derive a new type say PyException then you will get the classic problem that Python objects can only be single inherited. JPype was using true multiple inheritance. That is it derives from multiple Python types that have layouts which would otherwise be in complete conflict because the reserve space is uses was a floating offset modeled after dictoffset. If we add to the basicsize of the object rather than having the concept of a user reserve that floats just like dict and weak references, then you can't use this for multiple inhertance.

Unfortunately this places JPype in a seriously bad spot for 3.12. The existing "tp_alloc" hack is currently breaking because something wrote to the space after the allocation. And the API intended to be the replacement is short of what is required for universal multiple inheritance.

Is the GC now using the memory after the object or is there a bug someplace that could write into that location?

Thrameos commented 11 months ago

Hmmm. The plot thickens.

I search through and found that the Python garbage collection is not touching the memory after the allocation of the object. So whatever is messing with our memory reserve is not part of the GC system. That means that either a class in Python proper is corruptly writing into the area outsize of the expected space or something in one of classes (perhaps the attempt to replicate PyLong). Python always has a set of extra bytes at the end due to "itemsize+1" code, so something may be pushing into that left over space in an undetected fashion. I just need to find out which class touched it and build a workaround and we are back up an running.

Thrameos commented 11 months ago

Okay I believe I have tracked this one to its source. Nothing was writing to my reserve space. Instead something clobbered all methods needs to locate the where the end of the structure is.

The issue is that PyLong is acting like a VAR_Object but it no longer functions as one. Thus it allocates a var object with size 1 then resets the size field to a bit flag preventing us from figuring out how much memory it takes which is the only way to know how much memory we need to get to our reserve.

This seems like a horrible abuse for the API to create something with a VAR type and then clobber the ob_size field to save some bytes which makes locating the the original number of bytes that were used in the allocation impossible. I will have to come up with a kludge to deal with this sort of abusive behavior by fixing the size of the object to something that I can get to.

Thrameos commented 11 months ago

After many hours (okay not that many, but it felt like forever) I have found a kludge around PyLong. It is nasty because ob_size has its meaning change during operation from ob_size to lv_tags with different meanings. The user data replacement API appears to be a dud because it doesn't provide multiple inheritance of dissimilar bases. Unfortunately that still leaves 3 errors in the tests.

1) Something has changed in the buffer API such that our types are no longer being recognized as having the buffer API implemented. 2) Some random tests are failing on the range of bytes meaning that I still don't have the layout for the integers perfect. 3) I have to find some way to get the same code that works on 3.12 to compile with earlier versions of Python where this level of kludges are not needed.

@encukou I looked deeper on the type expansion. I don't think that it would be hard to make the API work. The trick would be to instead of adding to the tp_basesize on a negative request, to simply tally the number of "end" bytes that the user is requesting and give a "user_offset" much like dict_offset to place the item safely in memory structure where it can be retrieved. Then add the memory reserve to the allocation like JPype does. The same API calls could retrieve it directly.

Though if you wanted to be more general, the better solution would be to allow for multiple negative basesize values and have the type contain an offset so that it can check the table based on the multiple inheritance. Though that again gets into the prospect of conflicting baseclasses one inherits from multiple classes using negative offsets. So still a very hard problem.

encukou commented 11 months ago

Right, a var object is free to use ob_size however it wants. A discussion here won't change that. You could make a topic at https://discuss.python.org/c/30. (But things are very unlikely to change -- the bits are stored here in order to make int faster.)

The new API in 3.12 deals with types like PyType, but not types like PyLong.

Looks like the solution (as with many of these issues) will be to implement Mark's Grand Unified Python Object Layout, which has true multiple inheritance, clarity in who owns what, and optimization possibilities.

Thrameos commented 11 months ago

@encukou I think the ob_size is mostly a red herring. The issue is that there is still not a way to request size for a memory reserve that doesn't cause classes that inherit from it to interfere. (or even know that there is memory associated with the base class). The only reason that I pay any attention to ob_size field is that there isn't a reserve memory for baseclass and retrieve memory from baseclass method which doesn't change the memory layout.

I looked over the grand layout plan and it seems like a mostly breaking lift, though perhaps I am misreading. I will give it a large read when I have had more sleep.

I think there is a very simple implementation for multiple inheritance base on what you included. If a class wants to add opaque memory then it requests negative base size. The negative sizes go into the reserve request which will become an address offset like dict. The offset for that memory based on the mro position. There is a slot __cast__ for looking up memory offsets. In the default implementation the fast mro lookup when memory is requested it checks to see if the position of the class is in the mro matches what is expected (O(1)), then returns the extra offset location. For a class with mixed baseclasses which multiple opaque types the slot has to consult a hashtable (like Java interface casting) or use the address offset from the base class to split tables (like C++ multiple inheritance). Note that the slot is in the derived class. That way classes that have simple orders but then end up being mixed in with others later. The memory management is always the responsibility of the most derived class.

The result is it get the opaque data from a data structure would call void* PyObject_CastToData(PyObject* obj, PyTypeObject* type) . How the memory is organized behind the scene would be up to Python.

The only difficulty with this implementation is that you end use with a table of offsets the size of the mro in classes that use the opaque type. OTOH It could also save a huge amount of memory in the type system as a type could inherit a new slot set (Type vs ArithmeticType vs MapType) so that every slot does not need to add to the heap memory profile. You just call the cast on the type object for the slot interface that is being requested and it gives the slot table. Of course then you have burden that types become mixins, which may not be the direction of Python. And you have to decide if mixings are virtual or not (C++ hell). The system I describe only supports pure virtual mixins.

Thrameos commented 11 months ago

I just realized I dont understand something. If ob_size is freely allowed to be reused, the how does negative dictoffset work? I modelled our reserve memory off dictoffset so if my implementation busts then wouldnt dictoffset also be busted on PyLong? Need to investigate.

Thrameos commented 11 months ago

@encukou Eeeek! I just found the same bug that was haunting me in the Python 3.12 code. Assuming I am reading this right if you create a class derived from int (ie class Foo(int): pass) and then try to use the dictionary it will call this code....

PyObject **
_PyObject_ComputedDictPointer(PyObject *obj)
{
    PyTypeObject *tp = Py_TYPE(obj);
    assert((tp->tp_flags & Py_TPFLAGS_MANAGED_DICT) == 0);

    Py_ssize_t dictoffset = tp->tp_dictoffset;
    if (dictoffset == 0) {
        return NULL;
    }

    if (dictoffset < 0) {
        assert(dictoffset != -1);

        Py_ssize_t tsize = Py_SIZE(obj);  <====  This wont work because PyLong has blown away ob_size
        if (tsize < 0) {    <== These lines were for the old version of PyLong in which -size was used to mean a negative number 
            tsize = -tsize;
        }
        size_t size = _PyObject_VAR_SIZE(tp, tsize);
        assert(size <= (size_t)PY_SSIZE_T_MAX);
        dictoffset += (Py_ssize_t)size;

        _PyObject_ASSERT(obj, dictoffset > 0);
        _PyObject_ASSERT(obj, dictoffset % SIZEOF_VOID_P == 0);
    }
    return (PyObject **) ((char *)obj + dictoffset);
}

Thus the dictionary position will end up way beyond the end of the object into someone else's memory.

Thrameos commented 11 months ago

I have a working copy of JPype for Python 3.12. Next up it to work again with Python 3.10.

encukou commented 11 months ago

I think there is a very simple implementation for multiple inheritance base on what you included. The only difficulty with this implementation is that [... context omitted ...] you have to decide if mixings are virtual or not

Well, PEPs welcome... But it does seem like a real implementation would be as complex as the grand layout plan (and possibly quite similar to it, in the end).


if you create a class derived from int (ie class Foo(int): pass) and then try to use the dictionary it will call this code

Thanks for digging into that one! I guess it's another case of int optimizations invalidating assumptions :(

Doing this with class Foo(int): pass won't crash, because the Python statement will now use Py_TPFLAGS_MANAGED_DICT rather than tp_dictoffset. But it does crash when subclassing in C with negative tp_dictoffset, and I don't see why that should not be valid.

I filed https://github.com/python/cpython/issues/112743

Thrameos commented 11 months ago

Regarding the PR how limited in scope do you want?

1) Moving data requested by negative basesize in front of the object like MANAGED_DICT. In which case there can only be one, but otherwise no API breakage.

2) Moving var objects to start with offset which allows non-trivial basesize to be mixed with var objects like long and str. Minor breakage in Python, but known amounts in other packages. (May not solve my issue because conflict with exception, but solvable another way.)

3) Adding look up for general mixing of opaque types. Requires additional tables to support lookup in the class structure if a class has more than one opaque types. Requires API that can request which opaque type is for the offset. API change now and potentially allows for fancy things at the cost of complexity.

Which is your preference? I have finally gotten permission to sign the form.

Christopher-Chianelli commented 11 months ago

Closing since #1158 appears to fix the issue

Thrameos commented 11 months ago

@encukou Can you give me some insight as to which PR would be most acceptable in terms of API breakage and rearrangement? To fact that JPype currently requires access to Python private methods as opposed to using public API is currently the largest maintenance issue here, especially if my job decides that we don't have enough funding to support JPype (which it often does). I would like to get a few PR in that improves this situation for the long term.

encukou commented 11 months ago

Sorry, I now realize “PEPs welcome” looks too much like “PRs welcome” :(

I think this'll need a PEP -- a change proposal listing the pros and cons of possible solutions. There are too many details and people involved. And while your proposals look simple, but I don't think they'll be that simple in practice.

  1. What happens if there is more than one? That can't break (otherwise e.g. changing a base class from absolute tp_basicsize to negative would be a breaking change). Would there be a flag to request moving data to the front? Or would this happen automatically?
  2. I'd really like to avoid breakage, even “known amounts”. (I don't speak for CPython as a whole, of course...)
  3. This sounds like the grand unified layout, which still looks like the best solution.
Thrameos commented 11 months ago

Okay so I should flesh out the solution that allows for more than one (true multiple inheritance). This could be a reasonable step toward the grand unified layout as fields and the rest can be added after the multicast.

The basic idea will be negative basesize is used to request opaque memory for an object, and will be retrieved with PyObject_GetClassData(PyObject* obj, PyTypeObject* whichClass), no API breakage is allowed. The memory will all be added in front of the object like managed dict.

I will save detailed discussion until I have a better idea of what is required to pull it off. Unless I create a trial PR it won't be clear where all the cogs need to fit.