Open AdamWill opened 2 weeks ago
So I can nearly fix this, but I'm struggling with the lifecycle of the view buffer PyObject_GetBuffer
uses. This patch works, but obviously leaks those views:
diff --git a/lmdb/cpython.c b/lmdb/cpython.c
index ef30447..78838b9 100644
--- a/lmdb/cpython.c
+++ b/lmdb/cpython.c
@@ -583,9 +583,15 @@ val_from_buffer(MDB_val *val, PyObject *buf)
type_error("Won't implicitly convert Unicode to bytes; use .encode()");
return -1;
}
- return PyObject_AsReadBuffer(buf,
- (const void **) &val->mv_data,
- (Py_ssize_t *) &val->mv_size);
+ Py_buffer view;
+ int ret;
+ ret = PyObject_GetBuffer(buf, &view, PyBUF_SIMPLE);
+ if(ret == 0) {
+ val->mv_data = view.buf;
+ val->mv_size = view.len;
+ }
+ //PyBuffer_Release (&view);
+ return ret;
}
/* ------------------- */
If you uncomment the PyBuffer_Release
, it crashes, because we aren't actually reading the data here, just setting a pointer to it.
I can't figure out a good way to handle this, because there are a lot of things that call val_from_buffer
and they tend not to read the result right away, there's a lot of nesting going on. e.g. parse_args
calls parse_arg
which calls val_from_buffer
, and dozens of things call parse_args
and use the data at varying points afterwards...so what? Do we have every single one of those pass in a view
and free it when they're done, and pass that view
all the way down to val_from_buffer
? That feels very icky, but I don't know what would be better. Note I am not any kind of C expert at all, maybe this is obvious to someone else.
Thanks for the heads up about 3.13. I haven't looked at that yet.
I'm afraid val_from_buffer
and its callers will have to be restructured. The first approach I'd try is for the new val_from_buffer to takes a Py_buffer **
parameter that the created buffer is passed back into. Then for every caller of val_from_buffer
there'd need to be a val_from_buffer_free
that calls PyBuffer_Release
on it if it isn't null.
This is a hard API break, and it probably means that the version that implements this will be the first version that doesn't support Python 2.7.
You're welcome to submit a PR that implements this. If not, I'll work on it after I have my current 3.12 branch working.
Thanks!
That's more or less what I was talking about at the end of my message, I think. But - if I'm understanding things correctly - it's difficult because it doesn't seem like we're always done using the data even within the scope of the immediate caller of val_from_buffer
. Like in the parse_arg
case - parse_arg
calls val_from_buffer
, but it doesn't really use the result itself, the thing that uses the result is two or more links further up the chain. So you wind up having to pass the Py_buffer
an awfully long way to get it from the place it will ultimately need to be freed all the way to val_from_buffer
.
That seemed like a lot of work that would be very easy to get wrong for me, since I don't know the codebase and I'm not at all a C hacker, so I passed on trying to write a PR, sorry!
NP at all. I appreciate the report.
Affected Operating Systems
All
Affected py-lmdb Version
All
py-lmdb Installation Method
Irrelevant
Using bundled or distribution-provided LMDB library?
Irrelevant
Distribution name and LMDB library version
Irrelevant
Machine "free -m" output
Irrelevant
Other important machine info
Irrelevant
Describe Your Problem
py-lmdb does not build against Python 3.13 because it uses
PyObject_AsReadBuffer
, which was removed in Python 3.13. The recommended replacement isPyObject_GetBuffer
andPyBuffer_Release
.Errors/exceptions Encountered
Describe What You Expected To Happen
Successful build.
Describe What Happened Instead
Failed build.