piskvorky / sqlitedict

Persistent dict, backed by sqlite3 and pickle, multithread-safe.
Apache License 2.0

docs: add section on motivation #32

Open ianozsvald opened 9 years ago

ianozsvald commented 9 years ago

This isn't a bug, just a question that maybe could be addressed in the README.

Python ships with shelve (https://docs.python.org/3/library/shelve.html), a persistent, dictionary-like object backed by an on-disk database. How does sqlitedict improve on it? Perhaps shelve's storage engine is old and less efficient (larger files? slower access?) than sqlite3?

https://dataset.readthedocs.org/en/latest/ goes a step further to enable (but not require) SQL-like queries on a dict-like structure, backed by any SQLAlchemy db, with exports to JSON and CSV. The goal is to keep the simplicity of data handling that CSVs provide, without the downsides of CSVs (at least - that's how I understand it!).

Any thoughts on how sqlitedict fits into the picture would be well received.

piskvorky commented 9 years ago

You missed my lightning talk at PyData, didn't you? :)

It's true the motivation should be mentioned in the docs -- I'll update it. Let me rename this ticket and keep it open, lest I forget.

ianozsvald commented 9 years ago

No, whichever PyData you're referring to :-) Video link?

ianozsvald commented 9 years ago

Nudge...I'm genuinely curious. I started to use shelve for a side project and was surprised to see it make 3 files (which is a bit annoying when backing stuff up during early dev) - maybe sqlite3 keeps everything in 1 file? I'm just going to keep on guessing...

piskvorky commented 9 years ago

Ah sorry, missed your comment.

This was PyData Italy in Florence -- the lightning talks (I had two) were not recorded AFAIK :(

Quick answer: shelve & co are too complex, become slow with larger data, and cannot store large objects... https://docs.python.org/2/library/shelve.html#restrictions

I didn't know your dataset library; it looks interesting. It's not a replacement for dict though -- the syntax is different. If you're OK with committing to a special API, why not just go with MongoDB? It has excellent Python support.

sqlitedict is for those cases where you're using dict, find out it consumes too much RAM, so switch to a DB backend without having to rewrite everything. Most common scenario: super simple dict-like caching, backed by sqlite. Done with the cache? Just delete a single sqlite file on disk, no mucking around with tables and index structures.
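To make the "dict-like cache in a single sqlite file" idea concrete, here is a minimal sketch using only the standard library. It is not sqlitedict's actual code or API: the class name `TinySqliteDict`, the table name `kv`, and the file name `cache.sqlite` are all hypothetical, chosen for illustration, and the sketch ignores threading entirely.

```python
import os
import pickle
import sqlite3
import tempfile
from collections.abc import MutableMapping


class TinySqliteDict(MutableMapping):
    """Hypothetical dict-like wrapper: text keys, pickled values, one sqlite file."""

    def __init__(self, filename):
        self.conn = sqlite3.connect(filename)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS kv (key TEXT PRIMARY KEY, value BLOB)"
        )
        self.conn.commit()

    def __getitem__(self, key):
        row = self.conn.execute(
            "SELECT value FROM kv WHERE key = ?", (key,)
        ).fetchone()
        if row is None:
            raise KeyError(key)
        return pickle.loads(row[0])

    def __setitem__(self, key, value):
        blob = pickle.dumps(value, protocol=pickle.HIGHEST_PROTOCOL)
        self.conn.execute("REPLACE INTO kv (key, value) VALUES (?, ?)", (key, blob))
        self.conn.commit()

    def __delitem__(self, key):
        cur = self.conn.execute("DELETE FROM kv WHERE key = ?", (key,))
        if cur.rowcount == 0:
            raise KeyError(key)
        self.conn.commit()

    def __iter__(self):
        # Materialize the keys first so callers can modify the dict while iterating.
        rows = self.conn.execute("SELECT key FROM kv").fetchall()
        return iter([row[0] for row in rows])

    def __len__(self):
        return self.conn.execute("SELECT COUNT(*) FROM kv").fetchone()[0]

    def close(self):
        self.conn.close()


# Usage: write, close, reopen, read back, then throw the cache away.
path = os.path.join(tempfile.mkdtemp(), "cache.sqlite")
cache = TinySqliteDict(path)
cache["answer"] = {"value": 42, "tags": ["a", "b"]}
cache.close()

reopened = TinySqliteDict(path)
restored = reopened["answer"]
reopened.close()

os.remove(path)  # done with the cache: delete the single file
```

Subclassing `MutableMapping` supplies `get`, `items`, `update` and friends for free, which is what lets existing dict-based code keep working with few changes.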

It doesn't need any external DB (sqlite3 is an in-process DB, with built-in support in Python). It adds a few extra features on top, such as allowing multithreaded access (the sqlite3 adaptor in Python would normally just fail), but that's basically it. A single file, simple Python module with a bunch of unit tests, hack-friendly. It doesn't try to cover all DB functionality, going for simplicity instead, on purpose.
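The multithreading remark refers to the default behavior of the standard-library `sqlite3` module: a connection created in one thread refuses to be used from another. The short sketch below reproduces that failure with stdlib calls only; the `worker` function and `errors` list are illustrative names, and how sqlitedict works around the restriction is not shown here.

```python
import sqlite3
import threading

# Connection created in the main thread, with the default check_same_thread=True.
conn = sqlite3.connect(":memory:")
errors = []


def worker():
    # Using the main thread's connection from a different thread is rejected.
    try:
        conn.execute("SELECT 1")
    except sqlite3.ProgrammingError as exc:
        errors.append(exc)


thread = threading.Thread(target=worker)
thread.start()
thread.join()

for exc in errors:
    print(type(exc).__name__, exc)

conn.close()
```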

ianozsvald commented 9 years ago

Fair points. I'm not sure shelve really has a place any more (I'm experimenting with it but...can't really see the point). dataset can sit on sqlite3 but generally I agree with you - if you've got Mongo already, probably one should stick with that. The obvious bonus of shelve is that it is built-in and well tested and probably most people don't have lots of data or large data to store in it. Cheers!

piskvorky commented 9 years ago

No problem.

Btw can you say a little more about that "shelve produces 3 files" thing? Where/what/how/why? I'll mention that in the docs too, when enumerating all the subtle ways that shelve annoys people :)

ianozsvald commented 9 years ago

shelve writes .bak, .dir and .dat files. I don't know what they do (.bak and .dir are hundreds of bytes).
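Those three extensions match what the standard library's pure-Python `dbm.dumb` backend writes, so one plausible explanation (an assumption, since the platform is not stated in this thread) is that shelve fell back to that backend. The sketch below forces `dbm.dumb` explicitly to reproduce the file layout; the base name `demo` is arbitrary.

```python
import dbm.dumb
import os
import shelve
import tempfile

workdir = tempfile.mkdtemp()
base = os.path.join(workdir, "demo")

# Two write sessions: on commit, dbm.dumb renames the existing .dir index
# to .bak before writing a fresh .dir, while the values live in .dat.
for session in range(2):
    with shelve.Shelf(dbm.dumb.open(base, "c")) as shelf:
        shelf[f"key{session}"] = list(range(10))

created = sorted(os.listdir(workdir))
print(created)
```

With other dbm backends (which one `shelve.open` picks depends on the platform and Python version) the number and names of files can differ.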

turicas commented 9 years ago

@piskvorky: and if you are already using MongoDB and want a dict-like interface, there is mongodict. :-)

piskvorky commented 9 years ago

@turicas nice!