rogerbinns / apsw

Another Python SQLite wrapper
https://rogerbinns.github.io/apsw/
Other
733 stars 97 forks source link

Be able to create a connection from externally supplied sqlite3 pointer #79

Closed rogerbinns closed 2 years ago

rogerbinns commented 10 years ago

From rogerbinns on October 05, 2009 14:33:35

It should be possible to create a Connection from an externally supplied sqlite3*.

This does bring up issues like knowing when the pointer is no longer valid. See also related issue #28 .

Original issue: http://code.google.com/p/apsw/issues/detail?id=79

rogerbinns commented 10 years ago

From cw10025 on December 03, 2010 06:42:12

I needed to pass in a sqlite3* pointer so I could work with an already opened database. I do this by setting a global variable with a pre-arranged name of "sqlite3_db_handle" with the pointer value. This is workable assuming multiple invocations create a new ThreadState. Attached find my modification of apsw-3.7.3. r1 connection.c

Attachment: connection.c

rogerbinns commented 10 years ago

From rogerbinns on December 03, 2010 08:53:31

There are two issues here. One is how to deal with an outside sqlite3 pointer and the second is how to pass it in.

Dealing with how to pass it in is easy and the patch is not a good way of doing it. Instead I'd just make the routine accept a string filename as is currently done, or a integer object wrapping a pointer.

The first problem of how to deal with an outside pointer is far more difficult. Python C has an object model and object lifetime tracking mechanism. Something coming from outside can just be pulled away at any moment without any notification or control. Additionally any pointer from outside will not be using proper tracking and control over threading which APSW goes to great pains to do correctly.

In other words an external pointer can be easily accepted but is far more likely to result in crashes, corruption, deadlocks or other hard to diagnose issues.

rogerbinns commented 10 years ago

From cw10025 on December 03, 2010 10:02:40

If I don't change the memory that the outside pointer points to until the sub-interpreter is done with it, I don't see the problem. In other words, the "notification and control" is me not changing it in my outside code. Fragile, I know, but assuming I don't change or deleted it until the interpreter instance is done and destroyed, what's the problem?

rogerbinns commented 10 years ago

From rogerbinns on December 03, 2010 10:31:38

You are entirely correct in that if the outside supplier does not use the pointer in any way until Python is finished then there won't be any problems. There is however no way to know that is the case, and the consequences would be very ugly. And if you are going through all that much trouble, why not just have APSW open the database file itself?

You also mentioned sub-interpreters. APSW will only function correctly if you have one Python interpreter in the process. If you have more than one it cannot tell them apart and horrible things will happen. This is a general problem with the C API of Python for using the GIL.

rogerbinns commented 10 years ago

From cw10025 on December 03, 2010 13:03:07

If I use two separate connection handles, won't I run into problems of the schema and/or data being out-of-date due to caching? Also if I begin a transaction in one handle it won't be valid in the other handle, correct? Unless I should use 2PC (that was a joke).

rogerbinns commented 10 years ago

From rogerbinns on December 03, 2010 13:48:29

SQLite is a database engine designed and implemented for concurrent usage between multiple processes on the same machine and even multiple connections within the same process.

You will not confuse it, nor have validity issues, nor corrupt or anything similar as it is fully ACID. See these for full details: http://www.sqlite.org/lockingv3.html http://www.sqlite.org/wal.html If you are somehow trying to share the same information between multiple different places in your process then using SQLite is a truly bizarre approach.

rogerbinns commented 10 years ago

From cw10025 on December 03, 2010 14:30:49

Here is the internal code for "BEGIN TRANSACTION":

sqlite3VdbeAddOp2(v, OP_AutoCommit, 0, 0);

See? it operates on a per-connection basis, so you can't span a transactions across connections. This is the case with any database product I've ever worked with. I am not doing anything bizarre, trust me. See this for more info: http://www.sqlite.org/lang_transaction.html

rogerbinns commented 10 years ago

From cw10025 on December 03, 2010 15:31:02

I think I know what the issue is. I am trying to implement a custom function that will run arbitrary Python code; this works fine. Now, I want to interact with the database inside that custom function, thus my interest in your APSW. Two bad scenarios are occurring. If I pass in the sqlite* then I get a memory error.
If I re-open the database, as you suggest, then my function will run twice, each time with a different parser context, even though I only invoke it once.

This issue, I believe, is that when the shell (sqlite3) is invoked, it loads the libsqlite3 library, then when I load the extension linked with APSW (and thus, also linked with sqlite3.o) it re-initializes the data segment that was in effect after the first time (when libsqlite3 was loaded). Thus any globals would be re-initialized.

So it initially appeared to work, but really, this approach won't work without being able to build APSW without linking with sqlite.o (amalgamation compiled object).

rogerbinns commented 10 years ago

From cw10025 on December 03, 2010 15:36:23

BTW, trying to re-use an already open database connection is implemented in the pysqlite module - actually it accepts a connection object from APSW: http://docs.pysqlite.googlecode.com/hg/sqlite3.html#combining-apsw-and-pysqlite

rogerbinns commented 10 years ago

From rogerbinns on December 03, 2010 15:59:52

You are combining several different things in this discussion. In reference to comment #9 the pysqlite author has actually recently removed that functionality. Even though both APSW and pysqlite use the same Python C object model he started to get concerned.

You haven't specified what programming language your function is. APSW supports SQLite's function mechanism and that code can then do anything including using ctypes to access outside non-Python code, or call code available to Python such as via the Python C api.

The SQLite shell included with SQLite isn't a particularly good thing to use. Think of it more as a debugging aid written by the SQLite developers. APSW does include a separate shell implementation with many of SQLite shell's bugs fixed, more friendly to use, and easily extensible.

The recommended build instructions for APSW bury the amalgamation within so that there are no platform dependencies or potential problems. It will even co-exist with another SQLite in the same process. As an example this happens often on Mac where coredata and some other modules will use an older platform SQLite while APSW can separately use its newer SQLite. They will have their own separate implementations of the SQLite binary code and various data structures will have different layouts as the SQLite authors updated things over time. In this scenario it would be disastrous to share a sqlite3 pointer and why it would appear that "globals are re-initialized".

In circumstances where you want the same code used then you need to do the compilation ensuring linkage against the same SQLite libraries and do not embed SQLite in APSW using the amalgamation.

I suggest you explain from the beginning what you are trying to achieve (not how you are trying to achieve it) and what programming languages the various pieces you already have are written in. I'll be able to help a lot more.

rogerbinns commented 2 years ago

This never really resolved itself. Closing