mkrd / DictDataBase

A python NoSQL dictionary database, with concurrent access and ACID compliance
MIT License

Option not to sort the keys #47

Open noxqs opened 1 year ago

noxqs commented 1 year ago

First, I love your lib. I saw it on Reddit and have replaced my JSON config saver with it. However, for me the order of the keys is important, and the function serialize_data_to_json_bytes in io_unsafe.py always sorts the keys. This hurts. In my opinion it shouldn't do that there: if you want sorted keys, that could/should be done beforehand, keeping the purpose of serializing the data separate from ordering it. Alternatively, you could add an option when creating the instance. Just my two cents.

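For now I patch it with:

import json

# Assumed import paths for the names used below; adjust to your DictDataBase version.
from dictdatabase import io_bytes
from dictdatabase.sessions import SessionFileFull

def serialize_data_to_json_bytes(data: dict) -> bytes:
    # Same as the library function, but without sorting the keys.
    from dictdatabase import config
    if config.use_orjson:
        import orjson
        option = (orjson.OPT_INDENT_2 if config.indent else 0)
        return orjson.dumps(data, option=option)
    else:
        db_dump = json.dumps(data, indent=config.indent, sort_keys=False)
        return db_dump.encode()

def io_write(db_name: str, data: dict):
    # Serialize with the unsorted variant above and write the raw bytes.
    data_bytes = serialize_data_to_json_bytes(data)
    io_bytes.write(db_name, data_bytes)

def write(self):
    super(SessionFileFull, self).write()
    io_write(self.db_name, self.data_handle)

# Monkey-patch the session class so full-file writes keep the key order.
SessionFileFull.write = write

mkrd commented 1 year ago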

Hi, thank you! The reasoning behind always sorting the keys is that the indexer saves the start and end positions of each key-value pair, so that reading and writing single key-value pairs can be done very efficiently. The problem with not sorting the keys is that the order can change arbitrarily, so the entire index file would become invalid on every write operation and would have to be rebuilt each time, which is a pretty expensive operation.

So I'm guessing you have a config file whose key-value pairs are ordered manually, and when you edit it, you want them to be ordered the same way again. In that case, this wouldn't work reliably anyway, since Python dicts do not guarantee the order of keys, so it is only luck if the key-value pairs get serialized in the same order as before.

Or do I misunderstand your use-case?
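
To illustrate the indexing concern above, here is a minimal sketch using only the standard json module. It is not DictDataBase's actual indexer: key_offset is a hypothetical helper that just finds where a key starts in the serialized bytes, as a stand-in for a stored index position.

```python
import json

def key_offset(serialized: bytes, key: str) -> int:
    # Hypothetical helper: byte offset at which the quoted key starts.
    return serialized.index(json.dumps(key).encode())

# The same content, built in two different insertion orders.
first = {"x": 1, "y": 2}
second = {"y": 2, "x": 1}

sorted_first = json.dumps(first, sort_keys=True).encode()
sorted_second = json.dumps(second, sort_keys=True).encode()
unsorted_first = json.dumps(first, sort_keys=False).encode()
unsorted_second = json.dumps(second, sort_keys=False).encode()

# With sorting, both serializations are identical, so stored offsets stay valid.
same_when_sorted = sorted_first == sorted_second

# Without sorting, "x" moves, so an offset recorded earlier would be stale.
offset_moved = key_offset(unsorted_first, "x") != key_offset(unsorted_second, "x")
```

With sorted keys, a given key lands at the same position regardless of how the dict was built, which is what lets a position index survive across writes.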

noxqs commented 1 year ago

Hi, wow, you're fast at replying :-) Yeah, I need to keep the order of the dictionary keys, since it defines the order of columns in an Excel file. Actually, since Python 3.6 dictionary keys maintain their insertion order (guaranteed by the language since 3.7), so I am not too worried about that, especially since I cython/pyinstall and embed the Python version (3.10). So if we use a Python version where this holds, the keys don't have to be sorted and all is good? I have removed the sort in DictDataBase and it seems to work.
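
The ordering behavior described above can be checked with a short sketch, assuming Python 3.7 or newer. The column names are made up for illustration.

```python
import json

# Hypothetical column mapping whose order matters (e.g. spreadsheet columns).
columns = {"name": "A", "date": "B", "amount": "C"}

# sort_keys=False keeps the dict's insertion order in the JSON text,
# and json.loads rebuilds the dict in the order the keys appear.
round_tripped_keys = list(json.loads(json.dumps(columns, sort_keys=False)))

# sort_keys=True reorders the keys alphabetically instead.
sorted_keys = list(json.loads(json.dumps(columns, sort_keys=True)))
```

Here round_tripped_keys keeps the original column order, while sorted_keys comes back alphabetical.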

mkrd commented 1 year ago

Oh good to know! Back when I learned python, it didn’t guarantee the order so I assumed that would still be the case. Since this library doesn’t support python versions below 3.8, it should work as you said.

Can you do a PR? It would only require a new config variable "sort_keys" and passing that variable to the json and orjson dump functions. I would do it myself, but I am a bit short on time since I need to finish my master's thesis right now :)
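
A rough sketch of what such a change might look like follows. It is not the library's actual code: the Config dataclass is a hypothetical stand-in for the dictdatabase config module, and passing the config as a parameter is an assumption made to keep the example self-contained. The orjson flags OPT_INDENT_2 and OPT_SORT_KEYS are real, but orjson is only imported if that branch is taken.

```python
import json
from dataclasses import dataclass
from typing import Optional

@dataclass
class Config:
    # Hypothetical stand-in for the library's config module.
    use_orjson: bool = False
    indent: Optional[int] = 2
    sort_keys: bool = True  # proposed new variable; True keeps the current behavior

def serialize_data_to_json_bytes(data: dict, config: Config) -> bytes:
    if config.use_orjson:
        import orjson  # optional dependency, only needed for this branch
        option = 0
        if config.indent:
            option |= orjson.OPT_INDENT_2
        if config.sort_keys:
            option |= orjson.OPT_SORT_KEYS
        return orjson.dumps(data, option=option)
    # Standard-library path: sort_keys is passed straight through.
    return json.dumps(data, indent=config.indent, sort_keys=config.sort_keys).encode()

# Usage: the same data with sorting switched off and on.
unsorted_bytes = serialize_data_to_json_bytes({"b": 1, "a": 2}, Config(indent=None, sort_keys=False))
sorted_bytes = serialize_data_to_json_bytes({"b": 1, "a": 2}, Config(indent=None, sort_keys=True))
```

Defaulting sort_keys to True would leave existing databases and their index files unaffected, so only users who opt out would give up the stable key positions discussed earlier in the thread.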

mkrd commented 1 year ago

Also, an update to the docs would be required, but that’s also only a few lines of text.