msiemens / tinydb

TinyDB is a lightweight document oriented database optimized for your happiness :)
https://tinydb.readthedocs.org
MIT License
6.75k stars 536 forks source link

Refactoring for smart storages #106

Closed neRok00 closed 8 years ago

neRok00 commented 8 years ago

@msiemens Here is the 'fork' of the discussion regarding storages from issue #105. To address your last storage related comments in that issue;

Another option would be shifting the responsibility for creating the Element objects to the storage which (1) would be a breaking change

I don't see how this would be a breaking change? From a users perspective, they ask the database or table object for an element, and it oblidges. It shouldn't matter if the storage creates the element, or if the table/storage proxy does, because an element is going to be returned either way.

As far as I can remember I considered this but chose to not follow this path as it basically introduces redundant information because we already store tables as {id: {element ...}, id: {element ...}}. Also, this too would be a breaking change as the way to access the element id would be an item access, not an attribute access.

Definitely introduces redundant information, but it would also reduce the amount of objects created in getting the fields into an Element class. So there is a trade off. If storages are responsible for making this decision though, a high performance storage option might choose to accept the redundant information, whilst a basic storage like the current JSONStorage might not.

Regardless, this wouldn't have to be a breaking change, because a property can be added to the element that returns the eid field. This is why I suggested leaving the element creation to the storage class, so it can make an element that has an eid attribute accessible by whatever means necessary.

class Element(dict):
    @property
    def eid(self):
        return self['__eid__']

I don't see how Element, StorageProxy and Table are specific to the JSONStorage. If they are, that's not what I intended. Could you elaborate in which way(s) these classes depend on implementation details of JSONStorage?

Now that you have introduced the concept of smart and dumb storages, I guess those classes are specific to dumb storages. They presume that the storage is only capable of writing entire databases at once. They don't allow the storage to update specific tables or elements in other ways (ie smart ways), because they force the dumb approach.

A quick remark on breaking changes: Some of your proposals would require breaking changes which would affect a big share of users. As the popularity of TinyDB grows (>1000 GitHub stars) I'm becoming more and more conservative in that regard. That's not to say that breaking changes are ruled out but I feel like they should bring improvements to the majority of the userbase to be considered worth the upgrade.

Fair enough, but I don't think any of these changes will affect someone using TinyDB in the basic ways outlined on the project home page, for example.


When I wrote the initial version of TinyDB the big innovation if you will was the query interface.

This isn't the place to discuss it, but I'm still scratching my head over what queries achieve compared to just iterating over the results and comparing them directly with normal python conventions. Take an example from the docs;

Fruit = Query()
db.search(Fruit.type == 'peach')

This implies the search will return different fruits, but I don't think it actually does. Fruit is just a confusing variable name. Regardless, the same thing with plain python would just be;

[e for e in db.all() if e.get('type') == 'peach']

And if Elements were changed so that __getattr__() tried to provide the attribute via the dict key, then it could be simplified further to;

[e for e in db.all() if e.type == 'peach']

Take a look at PonyORM for example. They've gone to great lengths to get their syntax to look like the plain python I just used above. Unlike an SQL based database, your queries aren't actually generating SQL statements behind the scenes, so their benefit is lost on me.

msiemens commented 8 years ago

Another option would be shifting the responsibility for creating the Element objects to the storage which (1) would be a breaking change

I don't see how this would be a breaking change? From a users perspective, they ask the database or table object for an element, and it oblidges. It shouldn't matter if the storage creates the element, or if the table/storage proxy does, because an element is going to be returned either way.

It would be a breaking change in that every storage that has been implemented by users would have to be rewritten to follow the transition to smart storages. Now there's the option of supporting both dumb and smart storages and let the storage decide how it wants to be treated but IMO that's way too complex for something called _Tiny_DB.

Regardless, this wouldn't have to be a breaking change, because a property can be added to the element that returns the eid field.

This depends on whether the data layout that is passed to the Storages is considered part of the public API or not. If I'm not mistaken, there's no clear stance on that in the docs but a change of the data layout was considered a breaking change that triggered version 2.0.

Now that you have introduced the concept of smart and dumb storages, I guess those classes are specific to dumb storages. They presume that the storage is only capable of writing entire databases at once. They don't allow the storage to update specific tables or elements in other ways (ie smart ways), because they force the dumb approach.

You're absolutely right. The whole architecture of TinyDB is based on dumb storages which is a trade-off I decided upon very early in the development of TinyDB. Now, revising this decision after almost 3 years of existence is something I'm very cautious about.

Fair enough, but I don't think any of these changes will affect someone using TinyDB in the basic ways outlined on the project home page, for example.

Again, the way third party Storages are implemented would change which I consider a breaking change.

This isn't the place to discuss it, but I'm still scratching my head over what queries achieve compared to just iterating over the results and comparing them directly with normal python conventions. Take a look at PonyORM for example. They've gone to great lengths to get their syntax to look like the plain python I just used above. Unlike an SQL based database, your queries aren't actually generating SQL statements behind the scenes, so their benefit is lost on me.

In my opinion they are more readable than list comprehensions and have the nice property of being composable with OR/AND and nested queries (for example something like db.search(User.groups.any(Group.name == 'admin'))). But really, in a way the whole purpose of TinyDB is providing a layer of syntax magic over a bunch of Python dicts. If people need more serious guarantees (for example in terms of concurrency or performance) I'd point them to more full-fledged solutions (relational or NoSQL databases).

msiemens commented 8 years ago

Closing as moving to a model with smart storages would be a huge breaking change (essentially a rewrite).

@neRok00 Just an idea I had: what do you think about moving TinyDB's query language to a separate project? That way people who like the query language can use it without subscribing to TinyDB's dumb storage model.

neRok00 commented 8 years ago

It hardly seems worth moving the query language off into its own project, but perhaps a section in the docs to show people they can also do 'queries' with plain old python would be beneficial.