kriszyp / lmdb-js

Simple, efficient, ultra-fast, scalable data store wrapper for LMDB

UTF-8 code point of zero causes getRange / overwritten key issues #190

Open gittyeric opened 1 year ago

gittyeric commented 1 year ago

First off, this is a freakin fantastic library, great work!

In my project I have a custom radix-255 function that converts the numeric value zero into "\x00". At first I couldn't work out why getRange() was suddenly returning arrays of strings instead of the plain string I had put. I eventually traced it to UTF-8 zero values (U+0000) in some of my keys, which I guess forces a fallback to string-array keys. Worse, some of my keys were silently transformed into different stored keys, so I could not retrieve the full input set. Here's a minimal example:

```ts
import { open } from 'lmdb';
import { expect } from '@jest/globals'; // or your test runner's expect

const testRoot = open('./testdb', { name: 'root' });
const testDB = testRoot.openDB<string, string>({ name: 'mydb' });
testDB.putSync('a,\u0000', 'hello');
for (const { key } of testDB.getRange({})) {
  expect(key).toEqual('a,\u0000'); // fails: key comes back trimmed to just 'a,'
}
```

Note that because get() applies the same one-way key transformation, I can still get this value out directly. However, it could overwrite a second 'a,' key if I were to add one! I assume this is because \u0000 is the string terminator in C strings, but us naive JS developers wouldn't know that :laughing:. I've worked around this. Still, a note in the README that zero is the only unsupported UTF-8 character (I've tested the rest up to 255), or perhaps a runtime validation, would save us noob JS devs a lot of debugging time in the future.
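A runtime validation along the lines suggested above could be as small as the following sketch. `assertNoNullChar` is a hypothetical helper, not part of lmdb-js. It assumes U+0000 is the only problematic character, as the testing described in this report suggests, and simply rejects such string keys before they reach `putSync()` or `get()`.

```typescript
// Hypothetical guard (not part of lmdb-js): reject string keys that contain U+0000,
// since, per this report, such keys may not round-trip and can collide with other keys.
function assertNoNullChar(key: string): string {
  const index = key.indexOf('\u0000');
  if (index !== -1) {
    throw new RangeError(
      `Key contains U+0000 at index ${index}; it may not be stored as written`
    );
  }
  // Return the key unchanged so the guard can wrap calls inline.
  return key;
}
```

It would be used inline, e.g. `testDB.putSync(assertNoNullChar(key), value)`. Failing loudly trades a little convenience for never having two distinct keys silently collapse into one.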

Thanks again for this great library!

kriszyp commented 1 year ago

Thank you for the kind words, and good job catching this! I will add a note to the README, but I am also going to look into fixing this by escaping the null characters (and see if it can be done without a noticeable performance regression).
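Until escaping lands in the library, a caller-side workaround is to escape null characters in string keys before writing them. The sketch below shows one possible scheme, not necessarily the escaping lmdb-js would adopt. `escapeNulls` and `unescapeNulls` are hypothetical helpers. Using U+0001 as the escape character is an assumption that relies on the reporter's observation that every other code point up to 255 round-trips correctly.

```typescript
// Hypothetical helpers (not lmdb-js API): reversibly remove U+0000 from string keys.
// Assumed escape scheme:
//   U+0001 -> U+0001 U+0002   (escape the escape character itself)
//   U+0000 -> U+0001 U+0001
// The escaped string never contains U+0000, and the mapping is one-to-one.

function escapeNulls(key: string): string {
  // Escape U+0001 first so the U+0001s introduced for U+0000 are not escaped twice.
  return key.replace(/\u0001/g, '\u0001\u0002').replace(/\u0000/g, '\u0001\u0001');
}

function unescapeNulls(stored: string): string {
  // Left-to-right, non-overlapping matches: each U+0001 consumes exactly one
  // following character, so escape pairs are decoded without misalignment.
  return stored.replace(/\u0001([\u0001\u0002])/g, (_match, code: string) =>
    code === '\u0001' ? '\u0000' : '\u0001'
  );
}
```

Keys would be written as `escapeNulls(key)`, range bounds passed to `getRange()` would need the same treatment, and keys read back would go through `unescapeNulls()`. The replacement sequences are prefix-free and sort in the same order as the characters they replace. Assuming keys are compared by code point (or UTF-8 byte) order, escaped keys should therefore keep their original relative order, and range queries should still behave as expected.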