kriszyp / lmdb-js

Simple, efficient, ultra-fast, scalable data store wrapper for LMDB
Other
505 stars 41 forks source link

Question about arrays as keys #269

Closed tbaumann closed 8 months ago

tbaumann commented 8 months ago

I need to store entries by domain name in the DB. But sometimes I need to select by subdomain.

If I have foo.bar.example.com and baz.example.com I may sometimes need to select those entries by example.com.

The solution seems obvious. I need to maintain a separate DB with indexes by subdomains for every level.

But there is a sentence in the documentation that made me pause.

If arrays are used as keys, they are ordered by first value in the array, with each subsequent element being a tie-breaker.

Can I somehow use that if I store the domain in reverse as an array?

Quick testing suggests that there is only one key per entry and it only matches on the entire array.

const lmdb = require('lmdb');
let myDB = lmdb.open({
    path: 'test-db',
  dupSort: true,
});

const testdata = [
  'foo.bar.example.com',
  'baz.bar.example.com',
  'www.example.com',
  'gesichtsbesamung.de'
];
testdata.forEach(value => {
  let key = value.split('.').reverse();
  myDB.putSync(key, { domain: value });
  console.log("Insert:", key);
});

console.log("All keys")
myDB.getKeys().forEach((value) => {
  console.log("Key: ", value);
})

myDB.getValues(["com", "example", "bar", "foo"]).forEach(value => {
  console.log("foo.bar.example.com:" , value);
})

myDB.getValues(["com", "example", "bar"]).forEach(value => {
  console.log("Subdomain:" , value);
})

Output

Insert: [ 'com', 'example', 'bar', 'foo' ]
Insert: [ 'com', 'example', 'bar', 'baz' ]
Insert: [ 'com', 'example', 'www' ]
Insert: [ 'de', 'gesichtsbesamung' ]
All keys
Key:  [ 'com', 'example', 'bar', 'baz' ]
Key:  [ 'com', 'example', 'bar', 'foo' ]
Key:  [ 'com', 'example', 'www' ]
Key:  [ 'de', 'gesichtsbesamung' ]
foo.bar.example.com: { domain: 'foo.bar.example.com' }
kriszyp commented 8 months ago

I think what you want to do is probably something like:

const MAX_VALUE = '\uffff'; // can be imported from ordered-binary
myDB.getRange({ start: ["com", "example", "bar"], end: ["com", "example", "bar", MAX_VALUE]}).forEach(entry => {
  console.log("Subdomain:" , entry.value);
})
tbaumann commented 8 months ago

Nice. That's what I needed. Much cleaner than maintaining a separate index. And I assume this still takes advantage of the primary index for performance?

kriszyp commented 8 months ago

I assume this still takes advantage of the primary index for performance?

Yes, for sure. O(log n) access with amortized constant time traversal.

tbaumann commented 8 months ago

Nice, I think I'm beginning to understand what the magic behind that is.

Thanks a lot for the help.

tbaumann commented 8 months ago

const ordered_binary = require('ordered-binary'); const MAX_VALUE = ordered_binary.MAXIMUM_KEY;

That doesn't work. Your '\uffff' above does