pygobject / pgi

[Unmaintained: Use PyGObject instead] GTK+ / GObject Introspection Bindings for PyPy.
GNU Lesser General Public License v2.1
74 stars 16 forks source link

Bug in code generation: PopplerRectangle** [out] arg mistreated #4

Closed pwaller closed 9 years ago

pwaller commented 11 years ago

Code to reproduce:

(assumes you have gir1.2-poppler-0.18 available.)

from pgi.repository import Poppler as P

doc = P.Document.new_from_file('file:///path/to/a/.pdf', '')
p = doc.get_page(0)
ok, l = p.get_text_layout()

Actual output: l is a list of integers, which don't seem to be valid mapped memory addresses

Expected: l is a list of pointers to P.Rectangle which gets freed when l is garbage collected.

I'm currently trying to understand what it would take to fix this and implement a fix. I've gotten as far as finding how the code is generated for p.get_text_layout.

If you don't hear from me in the near future assume I gave up.

pwaller commented 11 years ago

It looks like this was already on the cards. However, I cannot see any way to represent an array of Poppler.Rectangle, currently. I suppose I could build a list of Poppler.Rectangle using a from_address on each one, since I know where they are. It feels a little wasteful though.

class _Structure(BaseStructure):
    @classmethod
    def from_address(cls, address):
        obj = cls.__new__(cls)
        obj._obj = address
        return obj
pwaller commented 10 years ago

This situation is improved with recent changes to pgi, we now get a PopplerRectangle, correctly, however when you use one of them, this happens:

In [41]: l = p1.get_text_layout()

In [42]: len(l.rectangles)
Out[42]: 1211

In [43]: l.rectangles[0]
Out[43]: <Rectangle structure at 0x1ca1850 (PopplerRectangle at 0x4075d4089a027525)>

In [44]: r = l.rectangles[0]

In [45]: r
Out[45]: <Rectangle structure at 0x1ca1850 (PopplerRectangle at 0x4075d4089a027525)>

In [46]: r.
r.copy  r.free  r.new   r.x1    r.x2    r.y1    r.y2    

In [46]: r.x1
Segmentation fault
pwaller commented 10 years ago

I've had a bit more insight (thanks @drj11!). The <Rectangle structure at 0x1ca1850 (PopplerRectangle at 0x4075d4089a027525)> is treating the rectangle's first double value as a pointer, i.e, it's doing an unnecessary dereference. unpack("d", pack("Q", 0x4075d4089a027525)) -> (349.2521,), which is the first value in the struct.

Now I'm just not sure if it is a problem with the pgi code generation, or the Poppler GIR package.

lazka commented 10 years ago

Looks relevant: https://bugs.freedesktop.org/show_bug.cgi?id=77790

btw. you can use p.get_text_layout._code.pprint() to see the generated code for the function.

pwaller commented 10 years ago

Nice find with the freedesktop bug.

That pprint output is very pretty! :)

I still can't tell if the problem is with the contents of the GIR itself or PGI and GI's handling of it, which seems equally possible.

The GIR XML definition is produced below.

The parameter is really an out PopplerRectangle *[len(array)] but it is being treated by PGI as a PopplerRectangle **[len(array)].

The emitted code for processing the out parameter is as so:

    for t13 in t12[:t3.value]:
        if t13:
            t15 = e2.object.__new__(e14)
            t15._obj = t13
        else:
            t15 = None
        t16.append(t15)

In this case t13 would need to be a pointer to an individual Rectangle, but currently holds the value at the Rectangle.

Currently the return type is a list of PopplerRectangle pointers, but I was thinking I would prefer if it were a ctypes array of PopplerRectangle (of the correct length) which when it has no references remaining pointing at it is g_free'd.

Do you have any thoughts on this?

      <method name="get_text_layout"
              c:identifier="poppler_page_get_text_layout"
              version="0.16">
        <doc xml:space="preserve">Obtains the layout of the text as a list of #PopplerRectangle
This array must be freed with g_free () when done.

The position in the array represents an offset in the text returned by
poppler_page_get_text()</doc>
        <return-value transfer-ownership="none">
          <doc xml:space="preserve">%TRUE if the page contains text, %FALSE otherwise</doc>
          <type name="gboolean" c:type="gboolean"/>
        </return-value>
        <parameters>
          <instance-parameter name="page" transfer-ownership="none">
            <doc xml:space="preserve">A #PopplerPage</doc>
            <type name="Page" c:type="PopplerPage*"/>
          </instance-parameter>
          <parameter name="rectangles"
                     direction="out"
                     caller-allocates="0"
                     transfer-ownership="container">
            <doc xml:space="preserve">return location for an array of #PopplerRectangle</doc>
            <array length="1" zero-terminated="0" c:type="PopplerRectangle**">
              <type name="Rectangle" c:type="PopplerRectangle*"/>
            </array>
          </parameter>
          <parameter name="n_rectangles"
                     direction="out"
                     caller-allocates="0"
                     transfer-ownership="full">
            <doc xml:space="preserve">length of returned array</doc>
            <type name="guint" c:type="guint*"/>
          </parameter>
        </parameters>
      </method>
pwaller commented 9 years ago

Tidying up my personal issues list, so closing this. Please create a new issue if you're still interested in tracking it.