msiemens / tinydb

TinyDB is a lightweight document oriented database optimized for your happiness :)
https://tinydb.readthedocs.org
MIT License
6.84k stars 550 forks source link

Support adding callables to path #445

Closed rewritten closed 2 years ago

rewritten commented 2 years ago

Attempt to solve #423. The idea comes from lenses in general, and more specifically from Elixir's Access module and its kernel companion functions get_in and friends.

query = Query().field.map(callable) == expected
query = Query()['field'].map(callable) == expected

Maybe it would be even better to avoid an extra method altogether and use __getitem__:

query = Query().field[callable] == expected
query = Query()['field'][callable] == expected

WDYT?

rewritten commented 2 years ago

@msiemens do you think this solves the mapping problem? Do you want me to further improve test coverage / documentation / anything?

msiemens commented 2 years ago

Thanks @rewritten! This is an elegant implementation, I like it!

Maybe it would be even better to avoid an extra method altogether and use __getitem__:

I think it would be more consistent with how other parts of the Query interface work to use the .map() function that you've already implemented πŸ™‚


There's another question: How do we make sure that lambdas passed to .map() have a consistent hash? For example:

>>> hash(lambda: 1)
275762436

>>> hash(lambda: 1)
275850424

This would mean that a query that uses map would create multiple entries in the query cache for the same lambda. A solution would be to build the hash from the lambdas code:

>>> hash((lambda: 1).__code__)
6824985394236043508

>>> hash((lambda: 1).__code__)
6824985394236043508

But it gets even more complicated: lambdas can references local variables which are not included in the hash. Thus the query results would change but the query's hash would not and the query cache would return old results:

>>> x = 2
>>> hash((lambda: x).__code__)
581326866784499933

>>> x = 20
>>> hash((lambda: x).__code__)
581326866784499933

This is something I don't have a solution for except to somehow exclude queries that use map from the query cache…

rewritten commented 2 years ago

Interesting. I probably would not worry about not being able to reuse a cached lambda for a query with the "same" lambda. On the contrary, rightly because even the same callable might have different effects depending on the state of the universe, I would explicitly avoid caching any callable at all.

x = 1
fn = lambda y: y + x
q = Query().val.map(fn) == 4

# q will return items with val==3

x = 2

# now q will return other items

I'll look into the caching, but I'm quite sure that you don't want to cache the result of callable-based queries.

rewritten commented 2 years ago

Hey @msiemens I have given a try to mark queries as uncacheable when they contain a .map(callable) part, but it is not ideal, because:

This last commit contains the needed changes, but I find it a bit inelegant. If you like it, I'm not opposed to merging, but I feel that there could be a better solution.

rewritten commented 2 years ago

Hey @msiemens have you had an opportunity to see if this code fits your need?

msiemens commented 2 years ago

@rewritten I will have a closer look at your solution in the next two weeks, I'm just a little busy at the moment with the Christmas season πŸ™‚ I'll check if I find a more elegant solution, but I also think that a pragmatic solution absolutely has its merits!

msiemens commented 2 years ago

Thanks for your work! It's much appreciated πŸ™‚

Regarding checking if the query is cachable or not: I think this implementation is fine. After all, explicit is better than implicit and all that πŸ™ƒ

msiemens commented 2 years ago

I'll try to get a new version of TinyDB ready and published this weekend which will include your changes!

msiemens commented 2 years ago

This is now released in TinyDB v4.6.0 πŸ₯³