pygobject / pgi

[Unmaintained: Use PyGObject instead] GTK+ / GObject Introspection Bindings for PyPy.
GNU Lesser General Public License v2.1

Cairo Segfaults #20

Open OceanWolf opened 9 years ago

OceanWolf commented 9 years ago

Running the matplotlib example, cairo segfaults on my machine (debian-jessie)

Program received signal SIGSEGV, Segmentation fault.
in cairo_save () from /usr/lib/x86_64-linux-gnu/libcairo.so.2
(gdb) backtrace
#0  in cairo_save () from /usr/lib/x86_64-linux-gnu/libcairo.so.2
#1  in ffi_call_unix64 () from /usr/lib/x86_64-linux-gnu/libffi.so.6
#2  in ffi_call () from /usr/lib/x86_64-linux-gnu/libffi.so.6
#3  in ?? () from /usr/lib/python2.7/dist-packages/_cffi_backend.x86_64-linux-gnu.so
#4  in PyObject_Call (kw=<optimized out>, arg=<optimized out>, func=<optimized out>) at ../Objects/abstract.c:2529
#5  do_call (nk=<optimized out>, na=<optimized out>, pp_stack=<optimized out>, func=<optimized out>) at ../Python/ceval.c:4251
#6  call_function (oparg=<optimized out>, pp_stack=<optimized out>) at ../Python/ceval.c:4056
#7  PyEval_EvalFrameEx () at ../Python/ceval.c:2679
#8  in fast_function (nk=<optimized out>, na=<optimized out>, n=<optimized out>, pp_stack=<optimized out>, func=<optimized out>) at ../Python/ceval.c:4119
#9  call_function (oparg=<optimized out>, pp_stack=<optimized out>) at ../Python/ceval.c:4054
#10 PyEval_EvalFrameEx () at ../Python/ceval.c:2679
#11 in PyEval_EvalCodeEx () at ../Python/ceval.c:3265

If you want more of the backtrace, let me know (I have 133 lines of it). Other people who have tried to use it (or similar code) have got:

BUG: PySequence_LengthSystemError: null argument to internal routine

or

.../cairo-1.10.2/src/cairo.c:173: _cairo_error: Assertion `(status != CAIRO_STATUS_SUCCESS && status <= CAIRO_STATUS_LAST_STATUS)' failed

lazka commented 9 years ago

Thanks. Yeah, lots of things missing there.. that's also why the GC gets disabled in the example to make it at least kinda work.

OceanWolf commented 9 years ago

Well, I see a figure window pop up. With gdb it stays; without gdb, the segfault crashes Python, so the window closes almost immediately. With gdb I don't see a plot, just an empty window...

Would using python2.7 instead of pypy affect this?

By the way, let me introduce myself: I collaborate on the matplotlib project, and one of my recent PRs incorporates pgi as an alternative for those who want to use it (internally, I want to use it to test the GTK3 backend). I pinged you on that PR around 9 days ago.

Even if pgi doesn't work for mpl regular usage, I still want to use it as it helps with testing the MPL codebase, but it would feel nice to document something in MPL that kind of works :).

OceanWolf commented 8 years ago

Hmm, I just realised I'm not sure whether "lots of things missing there" referred to pgi (as I had assumed) or to the gdb trace I sent.

Hopefully this week I will figure out how to get a fuller stack trace for you... slowly learning how to use gdb.

If I can help I will.

OceanWolf commented 8 years ago

Right, now hot on the trail of this bug. No idea whether it comes from pgi, matplotlib or cairo[cffi], or some odd combination, because in isolation they all seem to work fine... but I have noticed some peculiarities, or what look like peculiarities... I shall hopefully have this figured out in the next day or two.

OceanWolf commented 8 years ago

@lazka FOUND IT!

Okay, I have found that commenting out these lines in backends/backend_gtk3cairo.py seems to fix it (and the same lines in backend_gtk3agg.py):

if HAS_CAIRO_CFFI:
  ctx = cairo.Context._from_pointer(
    cairo.ffi.cast('cairo_t **', id(ctx) + object.__basicsize__)[0],
    incref=True)

That call succeeds, and the segfault occurs only after this line has executed, so I'm not exactly sure why commenting it out allows it to work... maybe it circumvented some logic in pgi... I noticed that before, calls from pgi would return a <cdata 'struct _cairo *' 0x...> instead of the <cdata 'void *' 0x...> which pgi normally returns.

Next step, as a cairo[cffi] noob, I shall find out why we put those lines in, and if you have any ideas why those lines break the pgi implementation... this should help us decide on whether we need to fix pgi or fix matplotlib or both.
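
For readers unfamiliar with the cast in the snippet above: on CPython, id(obj) is the object's memory address and object.__basicsize__ is the size of the bare object header, so the expression reads the first field stored right after that header (for a pycairo context, presumably its cairo_t pointer). The sketch below shows the same address arithmetic on a plain Python float instead of a pycairo object, because a CPython float stores its C double directly after the header. The helper name read_float_payload is made up for illustration, and the whole trick is only meaningful on CPython.

```python
import ctypes
import platform


def read_float_payload(value):
    """Read the C double stored right after the object header of a CPython float.

    Hypothetical helper: it mirrors the `id(ctx) + object.__basicsize__`
    arithmetic from the matplotlib snippet, but on a float rather than a
    pycairo context.
    """
    if platform.python_implementation() != "CPython":
        # On PyPy, id() is not a memory address, so this arithmetic is invalid.
        raise RuntimeError("address arithmetic on id() only works on CPython")
    # Skip the object header to reach the first field of the float struct.
    address = id(value) + object.__basicsize__
    return ctypes.c_double.from_address(address).value


x = 3.5
payload = read_float_payload(x)
print(payload)
```

Applying the same arithmetic to an object with a different memory layout (for example, a cairocffi context instead of a pycairo one) reads whatever bytes happen to sit at that offset, which would be consistent with the crash described in this issue.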

OceanWolf commented 7 years ago

Update:

So, as I see the problem, pgi auto-converts cairo.Context to cairocffi.Context, whereas with normal cairocffi you have to convert manually, as the matplotlib code above does. So matplotlib tries to convert an already converted context, which causes memory problems and a segfault.

Does that sound plausible as an explanation? Can I test that? Can we change that auto-conversion, or does that undermine the basis of pgi by introducing non-cffi objects? If so, then I will change the matplotlib codebase, either with a simple if USES_PGI == False: convert, or better, if not isinstance(ctx, cairocffi.context.Context): # convert to cffi.
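
A minimal sketch of the isinstance guard proposed above, under stated assumptions: the two Fake classes are stand-ins invented here so the example runs without cairo installed, and ensure_cffi_context is a hypothetical helper name, not matplotlib or pgi API. In real use, cffi_type would be the cairocffi context class and convert would wrap the _from_pointer cast from the snippet earlier in this thread.

```python
class FakeCffiContext:
    """Hypothetical stand-in for a cairocffi context."""

    def __init__(self, source=None):
        self.source = source


class FakePycairoContext:
    """Hypothetical stand-in for a pycairo context."""


def ensure_cffi_context(ctx, cffi_type, convert):
    """Return ctx unchanged if it is already a cffi_type, else convert it."""
    if isinstance(ctx, cffi_type):
        # Already converted (e.g. handed over by pgi): converting again is
        # what is suspected to corrupt memory, so leave it alone.
        return ctx
    return convert(ctx)


converted_inputs = []


def fake_convert(ctx):
    # Record the call so we can see the conversion runs only when needed.
    converted_inputs.append(ctx)
    return FakeCffiContext(source=ctx)


already_cffi = FakeCffiContext()
legacy = FakePycairoContext()

unchanged = ensure_cffi_context(already_cffi, FakeCffiContext, fake_convert)
converted = ensure_cffi_context(legacy, FakeCffiContext, fake_convert)
```

Checking the object's type rather than a USES_PGI flag keeps the guard independent of which binding produced the context.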

lazka commented 7 years ago

yes, pygobject uses pycairo while pgi uses cairocffi. The code in matplotlib assumes it gets a pycairo object.

We could try to use pycairo when running under CPython, other than that, no idea. The code in matplotlib could check if the given object already is a cairocffi one before trying to convert it, but given the limited use of pgi I don't think that it's worth it.

I've successfully used older matplotlib versions, so maybe try with something released ~2014/15

OceanWolf commented 7 years ago

Support for cairocffi (and thus those lines) was added to master on 17th January 2014 and released with version 1.4.0 (rc1 on 13th July and, several RCs later, the official release on 26th August). I see the last change to matplotlib_example.py was on 11th July 2014, so if you never tested it after that, perhaps that explains it. It worked then because we didn't support cffi back then, but pgi magically converted it for us without any change to the mpl codebase.

My feelings on which way to use:

I don't think the limited use of pgi now makes much of a difference; instead, I find the future use more pertinent. Do you see pgi becoming stable and more usable, and which way do you want to take it? In terms of compatibility, I would expect at least a list of places where a conscious decision has been made to deviate from the goal of 100% compatibility, e.g. that gtk widgets now return cairocffi contexts instead of cairo contexts.

lazka commented 7 years ago

> pgi using cairocffi instead of pycairo means a break in BC, i.e. if you intend for pgi to work exactly the same and interchangeably from pygobject, then this does not meet that goal. It says on the readme...

Casting a private Python object field to some pointer can hardly be called valid use of an API. pycairo is a CPython extension and can't be used with PyPy, so there is not much choice there..

> I don't think the limited use of pgi now makes much of a difference, instead I find the future use more pertinent. Do you see pgi becoming stable and more usable and which way do you want to take it?

pgi is currently pretty low on my priority list and I only use it for doc generation atm. I've focused on pygobject instead and also ported some speed improvements from pgi to pygobject in the meantime.

refi64 commented 7 years ago

@lazka Would you take PRs?

OceanWolf commented 7 years ago

> Casting a private Python object field to some pointer can hardly be called valid use of an API

Fair point

> I only use it for doc generation atm

Right now I have no reason not to use gobject, but I like pgi as it feels less messy, and hence I find it perfect to use for running matplotlib tests that run gtk. In the future, once it becomes more stable, I don't see any reason why it couldn't replace pygobject.

Current testing shows it works fine with mpl with that modification. Commenting out the lines that disable the gc also works fine on all combinations of the cairo vs agg backends and py3 vs py2, apart from the cairo backend with py2, in which case I get a segmentation fault. I will now add this fix to my pgi mpl PR.