msiemens / tinydb

TinyDB is a lightweight document oriented database optimized for your happiness :)
https://tinydb.readthedocs.org
MIT License
6.84k stars 550 forks source link

doc_id is preemptively converted to string before being passed to storage class #459

Closed cdwilson closed 2 years ago

cdwilson commented 2 years ago

As someone who is brand new to TinyDB, my naive assumption was that when writing to storage, the doc_id would be passed to the storage class keeping the document_id_class type intact, and it would be the responsibility of each storage class to convert it into a type that is compatible for that storage type.

However, I was surprised to find that the Table class preemptively converts the doc_id to a string before passing the table to the storage class for writing:

https://github.com/msiemens/tinydb/blob/967dbf9197b2943eb9bd91650f90bd9ab497bc12/tinydb/table.py#L727-L736

I'm curious if there is a reason why this conversion needs to happen in the Table._update_table() method, or whether the conversion could be delegated to the storage class instead?

For example, the YAMLStorage class described in Write a Custom Storage supports writing a YAML document directly to storage with integer document IDs. Making the following change to the Table._update_table() method enables the YAMLStorage storage class to store a YAML document with integer document IDs:

         # Perform the table update operation
         updater(table)

-        # Convert the document IDs back to strings.
-        # This is required as some storages (most notably the JSON file format)
-        # don't support IDs other than strings.
-        tables[self.name] = {
-            str(doc_id): doc
-            for doc_id, doc in table.items()
-        }
+        tables[self.name] = table

         # Write the newly updated data back to the storage
         self._storage.write(tables)
import yaml
from tinydb import Query, TinyDB
from tinydb.storages import Storage

class YAMLStorage(Storage):
    def __init__(self, filename):
        self.filename = filename

    def read(self):
        with open(self.filename) as handle:
            try:
                data = yaml.safe_load(handle.read())
                return data
            except yaml.YAMLError:
                return None

    def write(self, data):
        with open(self.filename, "w+") as handle:
            yaml.dump(data, handle)

    def close(self):
        pass

db = TinyDB("db.yml", storage=YAMLStorage)
db.insert({"name": "John", "age": 22})
_default:
  1:
    age: 22
    name: John
MrPigss commented 2 years ago

I agree with @cdwilson ,

You can choose the type of the doc_id by setting doc_id_class, but it get converted to string anyway. Why not let you choose the type of doc_id you actually want to write? Also even when working with the json format, the moment you choose to use something other than the standard lib json u probably can use other types as key as well. For example orjson has a nonstringkey option wich allows it to parse things other than string.

Anyhow, this seems more of a job for the storage than for the Table class.

msiemens commented 2 years ago

Short note: I'm moving this to a discussion as I feel that the discussions board is a better place to, well, discuss this topic 🙂