msiemens / tinydb

TinyDB is a lightweight document oriented database optimized for your happiness :)
https://tinydb.readthedocs.org
MIT License
6.76k stars, 534 forks

Database corruption when access_mode="a+" and using collection.remove() #506

Closed: oliver-s-lee closed this issue 1 year ago

oliver-s-lee commented 1 year ago

Hi there, nice project!

I think I've come across a bug, which is pretty much described by the title. When a database is opened with access_mode="a+" and a table's remove() method is called, the database file is partly duplicated: the updated contents are appended after the original ones instead of replacing them, so the file ends up holding the old data followed by a copy without the deleted records. This leaves the database unreadable.

Simple test case:

```
>>> import tinydb
>>> db = tinydb.TinyDB("test.db", access_mode = "a+")
>>> db.table("test").insert({"test":"doc"})
1
>>> with open("test.db") as file:
...     print(file.read())
... 
{"test": {"1": {"test": "doc"}}}
>>> db.table("test").remove(doc_ids=[1])
[1]
>>> with open("test.db") as file:
...     print(file.read())
... 
{"test": {"1": {"test": "doc"}}}{"test": {}}
>>> db.table("test").all()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/oliver/.local/lib/python3.10/site-packages/tinydb/table.py", line 233, in all
    return list(iter(self))
  File "/home/oliver/.local/lib/python3.10/site-packages/tinydb/table.py", line 636, in __iter__
    for doc_id, doc in self._read_table().items():
  File "/home/oliver/.local/lib/python3.10/site-packages/tinydb/table.py", line 685, in _read_table
    tables = self._storage.read()
  File "/home/oliver/.local/lib/python3.10/site-packages/tinydb/storages.py", line 125, in read
    return json.load(self._handle)
  File "/usr/lib/python3.10/json/__init__.py", line 293, in load
    return loads(fp.read(),
  File "/usr/lib/python3.10/json/__init__.py", line 346, in loads
    return _default_decoder.decode(s)
  File "/usr/lib/python3.10/json/decoder.py", line 340, in decode
    raise JSONDecodeError("Extra data", s, end)
json.decoder.JSONDecodeError: Extra data: line 1 column 33 (char 32)
```
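A likely explanation, assuming JSONStorage rewrites the whole file on every change by seeking to offset 0, writing the new JSON and then truncating whatever follows the cursor: in append mode the operating system moves every write to the end of the file regardless of the seek, and truncating at the end of the file removes nothing. The sketch below reproduces that pattern with plain file I/O; `rewrite` and `demo` are hypothetical helpers for illustration, not TinyDB code.

```python
import json
import os
import tempfile


def rewrite(handle, data):
    """Hypothetical in-place rewrite: seek to the start, write, then truncate.

    This mirrors the seek/write/truncate pattern assumed above; it is a
    sketch, not TinyDB's actual storage implementation.
    """
    handle.seek(0)
    handle.write(json.dumps(data))
    handle.flush()
    handle.truncate()  # cut off anything after the current position


def demo(mode):
    """Rewrite a fresh temp file twice using `mode` and return its contents."""
    fd, path = tempfile.mkstemp(suffix=".json")
    os.close(fd)
    try:
        with open(path, mode) as handle:
            rewrite(handle, {"test": {"1": {"test": "doc"}}})  # after insert()
            rewrite(handle, {"test": {}})                      # after remove()
        with open(path) as handle:
            return handle.read()
    finally:
        os.remove(path)


print(demo("r+"))  # {"test": {}}
print(demo("a+"))  # {"test": {"1": {"test": "doc"}}}{"test": {}}
```

The first document is exactly 32 characters long, which is consistent with the decoder reporting "Extra data" at char 32 in the traceback.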

I couldn't find much discussion of 'access_mode' in the docs, except in the API reference for JSONStorage, which states: access_mode (str) – mode in which the file is opened (r, r+, w, a, x, b, t, +, U). Perhaps that indicates "a+" isn't an allowed option? If so, it may be useful to raise an exception in the constructor to prevent unexpected corruption further down the road.
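A fail-fast check along the lines suggested above could look roughly like this. `check_access_mode` and `CheckedStorage` are hypothetical names, not part of TinyDB, and rejecting any mode containing "a" (append) or "w" (truncate on open) is an assumption about which modes are unsafe for a storage that rewrites its file in place.

```python
def check_access_mode(access_mode: str) -> None:
    """Hypothetical guard: reject modes unsafe for an in-place-rewriting storage."""
    if "a" in access_mode:
        # Append mode sends every write to the end of the file, so each
        # rewrite leaves the old JSON in front of the new JSON.
        raise ValueError(
            f"access_mode={access_mode!r}: append mode would corrupt the database"
        )
    if "w" in access_mode:
        # "w" truncates the file as soon as it is opened, discarding stored data.
        raise ValueError(
            f"access_mode={access_mode!r}: write mode truncates the database on open"
        )


class CheckedStorage:
    """Hypothetical storage stub that validates access_mode before doing anything."""

    def __init__(self, path: str, access_mode: str = "r+") -> None:
        check_access_mode(access_mode)  # fail fast, before the file is opened
        self.path = path
        self.access_mode = access_mode
```

For example, `CheckedStorage("test.db", access_mode="a+")` raises a ValueError immediately instead of corrupting the file on the first remove().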

Cheers!

msiemens commented 1 year ago

Hey @oliver-s-lee,

thanks for reporting this issue! You are correct that using "a+" will break TinyDB's JSON storage. For now, I've added a warning to the API reference and a Python warning message when an access mode is used that will probably lead to data loss or corruption. Throwing an exception, however, would probably be a breaking change that would necessitate a new major release of TinyDB, so that will have to come at some later point 🙂
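The "warn now, raise later" approach can be sketched with the standard `warnings` module. The allow-list and the `warn_if_risky` helper below are hypothetical and not TinyDB's actual code; the sketch also shows how a caller who wants strict behavior today could turn such a warning into an exception with a warnings filter.

```python
import warnings

# Hypothetical allow-list; the modes TinyDB actually accepts without a
# warning may differ.
SAFE_MODES = ("r", "rb", "r+", "rb+", "r+b")


def warn_if_risky(access_mode: str) -> None:
    """Emit a UserWarning instead of raising, so existing callers keep working."""
    if access_mode not in SAFE_MODES:
        warnings.warn(
            f"access_mode={access_mode!r} may lead to data loss or corruption",
            UserWarning,
            stacklevel=2,
        )


# A caller can opt into failing fast by escalating the warning to an error.
with warnings.catch_warnings():
    warnings.simplefilter("error", UserWarning)
    try:
        warn_if_risky("a+")
    except UserWarning as exc:
        print("rejected:", exc)  # rejected: access_mode='a+' may lead to data loss or corruption
```

Because only a warning is emitted by default, code that currently passes an unusual mode keeps running, which avoids the breaking change a new major release would otherwise be needed for.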