graehl / pygr

Automatically exported from code.google.com/p/pygr

Move from shelve to sqlite3 storage? #52

Open GoogleCodeExporter opened 8 years ago

GoogleCodeExporter commented 8 years ago
shelve has caused us problems over and over, due to incompatible formats,
poor scalability, etc.  Now Python 2.6 deprecates bsddb and Python 3 removes
it from the standard library, which breaks shelve support.  That means shelve
will no longer be able to read data from earlier versions of Python, where
shelve uses bsddb storage.  It seems like the only way to end the pain is
to stop relying on shelve.

On the positive side, sqlite3 is now included in the standard library (as
of Python 2.5).  Hopefully it provides decent scalability, in addition to
the much greater capabilities of an SQL database.  Ideally, this would
permit us to standardize most storage as SQL databases, accessed using our
standard classes.  This could simplify a lot of code.  This also suggests
that our code wouldn't care where the storage is: in an sqlite3 file vs. in
a MySQL server or some other relational database.  That opens up a lot of
possibilities for moving data around flexibly.
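
To make the idea concrete, here is a minimal sketch of a dict-like store
backed by sqlite3, written for current Python 3. The class name SQLiteDict,
the table name "shelf" and its two-column layout are assumptions made for
illustration only; none of this is existing pygr code, and values are simply
pickled the way shelve does it.

```python
import pickle
import sqlite3
from collections.abc import MutableMapping


class SQLiteDict(MutableMapping):
    """Hypothetical dict-like store on top of sqlite3 (not part of pygr)."""

    def __init__(self, path=":memory:"):
        self.conn = sqlite3.connect(path)
        # One table of string keys and pickled values (assumed layout).
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS shelf "
            "(key TEXT PRIMARY KEY, value BLOB NOT NULL)"
        )
        self.conn.commit()

    def __getitem__(self, key):
        row = self.conn.execute(
            "SELECT value FROM shelf WHERE key = ?", (key,)
        ).fetchone()
        if row is None:
            raise KeyError(key)
        return pickle.loads(row[0])

    def __setitem__(self, key, value):
        blob = sqlite3.Binary(pickle.dumps(value, pickle.HIGHEST_PROTOCOL))
        self.conn.execute(
            "INSERT OR REPLACE INTO shelf (key, value) VALUES (?, ?)",
            (key, blob),
        )
        self.conn.commit()

    def __delitem__(self, key):
        cur = self.conn.execute("DELETE FROM shelf WHERE key = ?", (key,))
        if cur.rowcount == 0:
            raise KeyError(key)
        self.conn.commit()

    def __iter__(self):
        for (key,) in self.conn.execute("SELECT key FROM shelf"):
            yield key

    def __len__(self):
        return self.conn.execute("SELECT COUNT(*) FROM shelf").fetchone()[0]

    def close(self):
        self.conn.close()
```

Because the class only speaks SQL through a DB-API connection, the same
approach could in principle be pointed at another relational database,
although placeholder syntax and column types differ between drivers.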

Questions:
- is sqlite3 genuinely scalable??  Someone should test the hell out of it.
 The poor scalability of default shelve came as a nasty surprise, which I
wouldn't like to repeat on another Standard Library storage component.

- this raises annoying platform issues, since sqlite3 only comes standard
in Python 2.5 and later.  A lot of people still use 2.3 or 2.4...  Is it
easy to install pysqlite / sqlite3 on those platforms?

- of course, we're stuck with providing backwards compatibility with shelve
data.  Presumably that means we retain our old capabilities, but only add
new capabilities to the SQL based components: i.e. old data, older feature
set; new data, new & improved feature set.
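
For the backwards-compatibility point, one option is a one-off copy of old
shelve data into an sqlite3 file. The sketch below assumes a hypothetical
helper copy_shelf_to_sqlite and a "shelf" table with key and value columns;
neither exists in pygr. It can only work under a Python installation that can
still open the old shelve file, i.e. one where the original dbm backend is
available.

```python
import pickle
import shelve
import sqlite3


def copy_shelf_to_sqlite(shelf_path, sqlite_path):
    """Copy every item of a shelve file into an sqlite3 table.

    Hypothetical helper, not part of pygr. Returns the number of items copied.
    """
    conn = sqlite3.connect(sqlite_path)
    shelf = shelve.open(shelf_path, flag="r")  # read-only: never touch old data
    try:
        conn.execute(
            "CREATE TABLE IF NOT EXISTS shelf "
            "(key TEXT PRIMARY KEY, value BLOB NOT NULL)"
        )
        keys = list(shelf.keys())
        # Re-pickle each value so the new file does not depend on bsddb.
        rows = (
            (key, sqlite3.Binary(pickle.dumps(shelf[key], pickle.HIGHEST_PROTOCOL)))
            for key in keys
        )
        conn.executemany(
            "INSERT OR REPLACE INTO shelf (key, value) VALUES (?, ?)", rows
        )
        conn.commit()
        return len(keys)
    finally:
        shelf.close()
        conn.close()
```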

Original issue reported on code.google.com by cjlee...@gmail.com on 12 Dec 2008 at 10:27

GoogleCodeExporter commented 8 years ago

Original comment by cjlee...@gmail.com on 12 Dec 2008 at 10:27

GoogleCodeExporter commented 8 years ago
SQLite is usually pretty scalable. I, and other people, have used it for
gigabytes of data. It can even, on a limited scale, access the same database
from different processes in parallel.

Another interesting database for persistence is Apache's CouchDB. It is even
easier to use than SQLite and extremely scalable, although it requires a
server instance and a connection to that instance.

If you only need temporary storage, you might want to just create temporary
files, pickled via the standard pickler, or written as JSON or YAML. This
adds the extra work of keeping tabs on these files, but avoids the
compatibility issues with shelve.
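
A rough sketch of the temporary-file approach, using JSON from the standard
library. The function names dump_to_tempfile and load_and_remove are made up
for this example; the caller has to remember the returned path, which is the
"keeping tabs" cost mentioned above.

```python
import json
import os
import tempfile


def dump_to_tempfile(obj):
    """Write obj as JSON to a new temporary file and return its path."""
    fd, path = tempfile.mkstemp(suffix=".json")
    with os.fdopen(fd, "w") as handle:
        json.dump(obj, handle)
    return path


def load_and_remove(path):
    """Read the JSON file back, then delete it."""
    with open(path) as handle:
        obj = json.load(handle)
    os.remove(path)
    return obj
```

JSON only covers basic types (dicts, lists, strings, numbers), so arbitrary
Python objects would still need pickle.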

Original comment by AndreasK...@gmail.com on 28 Dec 2008 at 3:41

GoogleCodeExporter commented 8 years ago
Andreas, see 
http://groups.google.com/group/pygr-dev/browse_thread/thread/cd06c5a9f7107881

for discussion.

Original comment by the.good...@gmail.com on 4 Jan 2009 at 11:05

GoogleCodeExporter commented 8 years ago
See this thread

http://groups.google.com/group/pygr-dev/browse_thread/thread/80e32dbccae70bcc#

for another discussion indirectly related to this subject. In short: at
present Pygr is virtually unusable on Mac OS X with stock Python unless
special steps are taken to make it work. Since implementing the sqlite
back-end ought to take care of the dbm problem without having to wait for
either Apple or Python devs to fix their respective bugs, we have decided to
escalate the present issue - the necessary code is to be included in
pygr-0.8.1.

Original comment by mare...@gmail.com on 25 Aug 2009 at 12:30

GoogleCodeExporter commented 8 years ago
I've taken the sqlite3 SQLHash code from http://bugs.python.org/issue3783 and
converted it to be Python 2.x compatible; I tested it out on another project
and it works fine.

Note, you need to use 'dbsqlite.open_shelf' to open a shelf with it; and the
mode characters don't work properly.

I attach it here to avoid duplication of effort!  Down the road I can try
integrating it into pygr.

Original comment by the.good...@gmail.com on 30 Aug 2009 at 3:14

Attachments: