Open ianozsvald opened 9 years ago
You missed my lightning talk at PyData, didn't you? :)
It's true the motivation should be mentioned in the docs -- I'll update it. Let me rename this ticket and keep it open, lest I forget.
No, whichever PyData you're referring to :-) Video link?
Nudge... I'm genuinely curious. I started to use shelve for a side project and was surprised to see it make 3 files (which is a bit annoying when backing stuff up during early dev) - maybe sqlite3 keeps everything in 1 file? I'm just going to keep on guessing...
Ah sorry, missed your comment.
This was PyData Italy in Florence -- the lightning talks (I had two) were not recorded AFAIK :(
Quick answer: shelve & co: too complex, becomes slow with larger data, cannot store large objects... https://docs.python.org/2/library/shelve.html#restrictions
Your dataset library I didn't know, looks interesting. It's not a replacement for dict though -- the syntax is different. If you're OK with committing to a special API, why not just go with MongoDB, which has excellent Python support?
sqlitedict is for those cases where you're using dict, find out it consumes too much RAM, and so switch to a DB backend without having to rewrite everything. Most common scenario: super simple dict-like caching, backed by sqlite. Done with the cache? Just delete a single sqlite file on disk, no mucking around with tables and index structures.
It doesn't need any external DB (sqlite3 is an in-process DB, with built-in support in Python). It adds a few extra features on top, such as allowing multithreaded access (the sqlite3 adaptor in Python would normally just fail), but that's basically it. A single file, simple Python module with a bunch of unit tests, hack-friendly. It doesn't try to cover all DB functionality, going for simplicity instead, on purpose.
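To make the "dict-like, backed by a single sqlite file" idea concrete, here is a minimal sketch using only the standard library. It is not sqlitedict's actual implementation: the class name `TinySqliteDict`, the `kv` table layout and the use of pickle for values are all assumptions made for illustration, and it ignores multithreaded access entirely.

```python
import os
import pickle
import sqlite3
import tempfile
from collections.abc import MutableMapping


class TinySqliteDict(MutableMapping):
    """Hypothetical illustration only; NOT sqlitedict's real code."""

    def __init__(self, path):
        # One sqlite file on disk holds everything.
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS kv (key TEXT PRIMARY KEY, value BLOB)"
        )
        self.conn.commit()

    def __getitem__(self, key):
        row = self.conn.execute(
            "SELECT value FROM kv WHERE key = ?", (key,)
        ).fetchone()
        if row is None:
            raise KeyError(key)
        return pickle.loads(row[0])

    def __setitem__(self, key, value):
        # Values are pickled so arbitrary Python objects can be stored.
        self.conn.execute(
            "REPLACE INTO kv (key, value) VALUES (?, ?)",
            (key, pickle.dumps(value)),
        )
        self.conn.commit()

    def __delitem__(self, key):
        if key not in self:
            raise KeyError(key)
        self.conn.execute("DELETE FROM kv WHERE key = ?", (key,))
        self.conn.commit()

    def __iter__(self):
        for (key,) in self.conn.execute("SELECT key FROM kv"):
            yield key

    def __len__(self):
        return self.conn.execute("SELECT COUNT(*) FROM kv").fetchone()[0]

    def close(self):
        self.conn.close()


# Usage: write through the dict interface, then reopen the same file.
workdir = tempfile.mkdtemp()
path = os.path.join(workdir, "cache.sqlite")

cache = TinySqliteDict(path)
cache["answer"] = {"value": 42}
cache.close()

reopened = TinySqliteDict(path)
restored = reopened["answer"]
reopened.close()

files = os.listdir(workdir)  # the cache lives in the one sqlite file
```

Code written against a plain dict keeps working with the same item access syntax, and throwing the cache away amounts to deleting `cache.sqlite`.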
Fair points. I'm not sure shelve really has a place any more (I'm experimenting with it but... can't really see the point). dataset can sit on sqlite3, but generally I agree with you - if you've got Mongo already, probably one should stick with that. The obvious bonus of shelve is that it is built-in and well tested, and probably most people don't have lots of data or large data to store in it. Cheers!
No problem.
Btw can you say a little more about that "shelve produces 3 files" thing? Where/what/how/why? I'll mention that in the docs too, when enumerating all the subtle ways that shelve annoys people :)
shelve writes a .bak, a .dir and a .dat file. I don't know what they do (the .bak and .dir are hundreds of bytes).
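One plausible source of these files, assuming shelve fell back to the pure-Python dbm.dumb backend (the thread does not say which backend was in use), is that dbm.dumb keeps the data in a .dat file and its index in a .dir file, and may leave a .bak copy of the previous index. The sketch below forces that backend explicitly so the files can be inspected. The `demo` base name is arbitrary.

```python
import dbm.dumb
import os
import shelve
import tempfile

workdir = tempfile.mkdtemp()
base = os.path.join(workdir, "demo")

# Assumption: force dbm.dumb rather than letting shelve pick a backend,
# since other dbm backends use different file layouts.
shelf = shelve.Shelf(dbm.dumb.open(base, "c"))
shelf["key"] = [1, 2, 3]
shelf.close()

# List whatever ended up on disk next to the base name.
created = sorted(os.listdir(workdir))
print(created)
```

On a system where a different dbm backend is available, `shelve.open` may produce a different set of files, so the result above only describes the dbm.dumb case.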
@piskvorky: and if you are already using MongoDB and want a dict-like interface, there is mongodict. :-)
@turicas nice!
This isn't a bug, just a question that maybe could be addressed in the README.
Python ships with shelve (https://docs.python.org/3/library/shelve.html), a persistent, dictionary-like object backed by an on-disk db. How does sqlitedict improve things? Perhaps shelve's storage engine is old and less efficient (larger files? slower access?) than using sqlite3?
dataset (https://dataset.readthedocs.org/en/latest/) goes a step further to enable (but not require) SQL-like queries on a dict-like structure, backed by any SQLAlchemy db, with exports to JSON and CSV. The goal is to keep the simplicity of data handling that CSVs provide, without the downsides of CSVs (at least - that's how I understand it!).
Any thoughts on how sqlitedict fits into the picture would be well received.