msiemens / tinydb

TinyDB is a lightweight document oriented database optimized for your happiness :)
https://tinydb.readthedocs.org
MIT License

bug: MemoryStorage incorrectly keeps references to nested dicts #551

Open helgridly opened 11 months ago

helgridly commented 11 months ago

See below:

from tinydb import TinyDB, Query
from tinydb.storages import MemoryStorage

db = TinyDB(storage=MemoryStorage)
t = db.table("test", cache_size=0)
obj = {"a": "a", "b": "b", "nested": {"c": "c"}}
t.insert(obj)
print("inserted", obj)

obj['nested']['c'] = "X"
obj2 = t.get(Query().a == "a")
print("retrieved", obj2)

Output:

inserted {'a': 'a', 'b': 'b', 'nested': {'c': 'c'}}
retrieved {'a': 'a', 'b': 'b', 'nested': {'c': 'X'}}

I'm guessing that somewhere the nested dict is being saved as a reference, not as a copy. I've ruled out the query cache by setting cache_size=0.

This doesn't happen with normal JSON storage.
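
The difference between the two storages is consistent with how copying works in plain Python. The sketch below does not use TinyDB at all; it only assumes that the in-memory path copies at most the top level of the document, while the JSON path serializes to text and parses it back. Under those assumptions, a nested dict stays shared in the first case and is rebuilt in the second.

```python
import copy
import json

obj = {"a": "a", "nested": {"c": "c"}}

# A shallow copy duplicates only the top-level dict; the nested dict is shared.
shallow = dict(obj)

# A JSON round trip rebuilds every level from text, so nothing is shared.
round_tripped = json.loads(json.dumps(obj))

# copy.deepcopy also duplicates every level, without going through text.
deep = copy.deepcopy(obj)

# Mutate the original after the copies were taken.
obj["nested"]["c"] = "X"

print(shallow["nested"]["c"])        # X (shared nested dict)
print(round_tripped["nested"]["c"])  # c (independent copy)
print(deep["nested"]["c"])           # c (independent copy)
```

Until the behavior changes, passing `copy.deepcopy(obj)` to `insert` instead of `obj` should avoid the sharing from the caller's side.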

msiemens commented 1 month ago

Hey @helgridly, you're absolutely right that TinyDB's MemoryStorage stores a reference to the data you insert, not a copy. The correct solution would be to always deep-copy all data inserted into the database, but I'm somewhat conflicted, as that adds a performance penalty to every write operation. Would it help to describe this behavior in the documentation, or do you think it should be fixed on a deeper level?
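
For illustration, the deep-copy approach could look roughly like the sketch below. `CopyingMemoryStore` is a hypothetical stand-in written for this example and is not part of TinyDB; it only mirrors the `read`/`write` shape of a storage class. It copies on write so later changes to the caller's object do not reach the store, and copies on read so changes to a retrieved document do not reach it either.

```python
import copy


class CopyingMemoryStore:
    """Hypothetical in-memory store that never shares objects with callers."""

    def __init__(self):
        self._memory = None

    def read(self):
        # Hand out a copy so callers cannot mutate the stored data.
        return copy.deepcopy(self._memory)

    def write(self, data):
        # Keep a private copy so later changes to `data` are not visible here.
        self._memory = copy.deepcopy(data)


store = CopyingMemoryStore()
obj = {"a": "a", "nested": {"c": "c"}}
store.write({"test": {"1": obj}})

# Mutating the inserted object no longer affects the stored document.
obj["nested"]["c"] = "X"

# Mutating a retrieved document does not affect the stored one either.
fetched = store.read()
fetched["test"]["1"]["nested"]["c"] = "Y"

print(store.read()["test"]["1"]["nested"]["c"])  # c
```

The cost is one full traversal of the data on every read and write, which is the performance penalty mentioned above. Applying the same two overrides in a subclass of `MemoryStorage` would presumably have the same effect, but that is an untested assumption here.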