twmht / python-rocksdb

Python bindings for RocksDB
BSD 3-Clause "New" or "Revised" License

multi_get and duplicate keys #82

Open JS-Parent opened 4 years ago

JS-Parent commented 4 years ago

The doc for multi_get mentions that:

keys will not be “de-duplicated”. Duplicate keys will return duplicate values in order. https://python-rocksdb.readthedocs.io/en/latest/api/database.html

But when I use it with duplicated keys I get a dictionary with a single key whose value is not a list with the repeated values:

Example:

import rocksdb

db = rocksdb.DB("/tmp/", rocksdb.Options(create_if_missing=True))
db.put(b'\x00', b'\x00')
d = db.multi_get([b'\x00', b'\x00', b'\x00'])
print(d)
print(type(d[b'\x00']))

outputs:

{b'\x00': b'\x00'}
<class 'bytes'>

iFA88 commented 4 years ago

Hi, the documentation for that function says: "Returns: A dict where the value is either bytes or None if not found."

iFA88 commented 4 years ago

BTW, the notice you quoted is correct. You pass your keys in a list, and the function reads every item on that list from the database. The results are then put into a dict, so an entry is overwritten whenever its key appears again. If you request a single key 1M times, it will be read from the database 1M times and you will get back a dict with only one key.
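A minimal sketch of that overwrite effect in plain Python, with no RocksDB involved. The `store` dict is only an assumed stand-in for the per-key database reads, not the library's actual implementation:

```python
# Plain-Python illustration; `store` is a hypothetical stand-in for the database.
store = {b'\x00': b'\x00'}
keys = [b'\x00', b'\x00', b'\x00']

# One lookup per requested key, duplicates included.
lookups = [(k, store.get(k)) for k in keys]

# Collecting the pairs into a dict keeps only one entry per distinct key.
result = dict(lookups)

print(len(lookups))  # 3
print(result)        # {b'\x00': b'\x00'}
```

Three lookups happen, but the dict can hold only one entry for the repeated key.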

JS-Parent commented 4 years ago

Yes, I understand that the database is queried twice when the list of keys contains the same key twice. However, given that Python dicts cannot store duplicate keys and the documentation states "Duplicate keys will return duplicate values in order", I expected my code snippet to return:

{b'\x00': [b'\x00', b'\x00']}
<class 'list'>

since this seems to be the Pythonic way of storing duplicate entries in a dict. Right now it's confusing: yes, the duplicate values are read, but the Python dict drops them, so the result is effectively de-duplicated while the doc says the opposite. If this is the intended behavior, I would just change the doc to make this more explicit.
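Until the doc or the behavior changes, one possible workaround is to rebuild an ordered list from the returned dict on the caller's side. This is only a sketch: `multi_get_in_order` is a hypothetical helper, not part of python-rocksdb, and `FakeDB` is an assumed stand-in that mimics only the dict return of `multi_get` so the snippet runs without a database:

```python
class FakeDB:
    """Hypothetical stand-in for rocksdb.DB; mimics only multi_get's dict return."""

    def __init__(self, data):
        self._data = dict(data)

    def multi_get(self, keys):
        # Missing keys map to None, as described in the multi_get docs.
        return {k: self._data.get(k) for k in keys}


def multi_get_in_order(db, keys):
    """Return one value per requested key, in request order (hypothetical helper)."""
    # Query each distinct key once; dict.fromkeys preserves first-seen order.
    unique_keys = list(dict.fromkeys(keys))
    found = db.multi_get(unique_keys)
    # Expand back to the original order, repeating values for repeated keys.
    return [found[k] for k in keys]


db = FakeDB({b'\x00': b'\x00'})
values = multi_get_in_order(db, [b'\x00', b'\x01', b'\x00'])
print(values)  # [b'\x00', None, b'\x00']
```

This also avoids reading the same key from the database more than once.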