jnwatson / py-lmdb

Universal Python binding for the LMDB 'Lightning' Database
http://lmdb.readthedocs.io/
Other
624 stars 102 forks source link

Fails to build with Python 3.13 due to removal of PyObject_AsReadBuffer #362

Open AdamWill opened 2 weeks ago

AdamWill commented 2 weeks ago

Affected Operating Systems

All

Affected py-lmdb Version

All

py-lmdb Installation Method

Irrelevant

Using bundled or distribution-provided LMDB library?

Irrelevant

Distribution name and LMDB library version

Irrelevant

Machine "free -m" output

Irrelevant

Other important machine info

Irrelevant

Describe Your Problem

py-lmdb does not build against Python 3.13 because it uses PyObject_AsReadBuffer, which was removed in Python 3.13. The recommended replacement is PyObject_GetBuffer and PyBuffer_Release.

Errors/exceptions Encountered

    lmdb/cpython.c: In function ‘val_from_buffer’:
    lmdb/cpython.c:586:12: error: implicit declaration of function ‘PyObject_AsReadBuffer’; did you mean ‘PyObject_GetBuffer’? [-Wimplicit-function-declaration]
      586 |     return PyObject_AsReadBuffer(buf,
          |            ^~~~~~~~~~~~~~~~~~~~~
          |            PyObject_GetBuffer

Describe What You Expected To Happen

Successful build.

Describe What Happened Instead

Failed build.

AdamWill commented 2 weeks ago

So I can nearly fix this, but I'm struggling with the lifecycle of the view buffer PyObject_GetBuffer uses. This patch works, but obviously leaks those views:

diff --git a/lmdb/cpython.c b/lmdb/cpython.c
index ef30447..78838b9 100644
--- a/lmdb/cpython.c
+++ b/lmdb/cpython.c
@@ -583,9 +583,15 @@ val_from_buffer(MDB_val *val, PyObject *buf)
         type_error("Won't implicitly convert Unicode to bytes; use .encode()");
         return -1;
     }
-    return PyObject_AsReadBuffer(buf,
-        (const void **) &val->mv_data,
-        (Py_ssize_t *) &val->mv_size);
+    Py_buffer view;
+    int ret;
+    ret = PyObject_GetBuffer(buf, &view, PyBUF_SIMPLE);
+    if(ret == 0) {
+        val->mv_data = view.buf;
+        val->mv_size = view.len;
+    }
+    //PyBuffer_Release (&view);
+    return ret;
 }

 /* ------------------- */

If you uncomment the PyBuffer_Release, it crashes, because we aren't actually reading the data here, just setting a pointer to it.

I can't figure out a good way to handle this, because there are a lot of things that call val_from_buffer and they tend not to read the result right away, there's a lot of nesting going on. e.g. parse_args calls parse_arg which calls val_from_buffer, and dozens of things call parse_args and use the data at varying points afterwards...so what? Do we have every single one of those pass in a view and free it when they're done, and pass that view all the way down to val_from_buffer? That feels very icky, but I don't know what would be better. Note I am not any kind of C expert at all, maybe this is obvious to someone else.

jnwatson commented 2 weeks ago

Thanks for the heads up about 3.13. I haven't looked at that yet.

I'm afraid val_from_buffer and its callers will have to be restructured. The first approach I'd try is for the new val_from_buffer to takes a Py_buffer ** parameter that the created buffer is passed back into. Then for every caller of val_from_buffer there'd need to be a val_from_buffer_free that calls PyBuffer_Release on it if it isn't null.

This is a hard API break, and it probably means that the version that implements this will be the first version that doesn't support Python 2.7.

You're welcome to submit a PR that implements this. If not, I'll work on it after I have my current 3.12 branch working.

AdamWill commented 2 weeks ago

Thanks!

That's more or less what I was talking about at the end of my message, I think. But - if I'm understanding things correctly - it's difficult because it doesn't seem like we're always done using the data even within the scope of the immediate caller of val_from_buffer. Like in the parse_arg case - parse_arg calls val_from_buffer, but it doesn't really use the result itself, the thing that uses the result is two or more links further up the chain. So you wind up having to pass the Py_buffer an awfully long way to get it from the place it will ultimately need to be freed all the way to val_from_buffer.

That seemed like a lot of work that would be very easy to get wrong for me, since I don't know the codebase and I'm not at all a C hacker, so I passed on trying to write a PR, sorry!

jnwatson commented 2 weeks ago

NP at all. I appreciate the report.