msiemens / tinydb

TinyDB is a lightweight document oriented database optimized for your happiness :)
https://tinydb.readthedocs.org
MIT License
6.84k stars 550 forks source link

Atomic replace (second attempt) #482

Closed richardsheridan closed 1 year ago

richardsheridan commented 2 years ago

Revamp of #468. Fixes #467.

I was in a position to fix this one because during the python-atomicwrites package fiasco I read somewhere that the author basically said people should be using os.replace anyway. Other platform-dependent errors were fixed basically by closing file handles in the right order.

richardsheridan commented 2 years ago

CI problems seem unrelated, see https://github.com/msiemens/tinydb/runs/7544311658?check_suite_focus=true#step:5:231

richardsheridan commented 2 years ago

A peculiar edge case of this implementation is that the storage will happily write to a "closed" handle. This is not triggered in the default case because a read is performed before writing, but it can be triggered by CachingMiddleware:

from tinydb import TinyDB, JSONStorage
from tinydb.middlewares import CachingMiddleware
db=TinyDB("test.json", storage=CachingMiddleware(JSONStorage))
db.insert({"test":1})
db.close()
db.insert({"test":2})
db.storage.flush()
db.storage.close()

I'll add a commit that checks self._handle.closed but finding this makes me wary of other edge cases. Maybe this feature should go in as a new "AtomicJSONStorage" and wait for some deprecation period and major version bump to be swapped in?

VermiIIi0n commented 2 years ago

I don't think this is not triggered in the default case since that read operation is reading from the cache instead of the underlying storage.

How about making CachingMiddleware delete its cache after closing? (Which seems reasonable to me)

So that the next operation will read from the file and raise the exception.

Currently:

class CacheMiddelware(Middleware):
    def read(self):
        if self.cache is None:
            # Empty cache: read from the storage
            self.cache = self.storage.read()

        # Return the cached data
        return self.cache

    def close(self):
        # Flush potentially unwritten data
        self.flush()

        # Let the storage clean up, too
        self.storage.close()

->

    def close(self):
        # Flush potentially unwritten data
        self.flush()
        self.cache = None  # Clear cache

        # Let the storage clean up, too
        self.storage.close()

Another solution:

In my personal TinyDB fork, I've forced all Storage classes to implement a closed property, since it's quite an important property and is often needed.