kriszyp / lmdb-js

Simple, efficient, ultra-fast, scalable data store wrapper for LMDB
Other
484 stars 39 forks source link

getRoot ,getFirst, getLast and getSize function request #77

Closed anuragvohraec closed 2 years ago

anuragvohraec commented 2 years ago

Hi Team,

I am trying to build a scalable DB on top of this solution. For partitioning my data, I would need to break data at middle of the B+Tree, and distribute them to another node in a cluster for runtime sharding. I skimmed over the docs, but was not able to find these functions getRoot, getFirst, getLast and getSize, which could make this job really easy.

vladar commented 2 years ago

I think you can use myDb.getStats() to get the number of entries in the db and some other stats. It is only mentioned in type definitions but it works:

https://github.com/DoctorEvidence/lmdb-store/blob/835a7d52b588feb3a753e1fa8807e72b67142f44/index.d.ts#L213-L216

Also use getRange({ limit: 1 }) and getRange({ limit: 1, reverse: true }) to get first and last items. For lower-level things you can probably use node-lmdb APIs exposed by this library as well via myDb.env and myDb.db

anuragvohraec commented 2 years ago

I think you can use myDb.getStats() to get the number of entries in the db and some other stats. It is only mentioned in type definitions but it works:

https://github.com/DoctorEvidence/lmdb-store/blob/835a7d52b588feb3a753e1fa8807e72b67142f44/index.d.ts#L213-L216

Also use getRange({ limit: 1 }) and getRange({ limit: 1, reverse: true }) to get first and last items. For lower-level things you can probably use node-lmdb APIs exposed by this library as well via myDb.env and myDb.db

I think that would suffice my requiremet. Thank you.

kriszyp commented 2 years ago

And for the mid-point key, I think you could:

let size = db.getStats().entryCount
Array.from(db.getKeys({ offset: Math.round(size / 2), limit: 1 }))

I am not sure if you intend to do this frequently; using an offset will internally require iteration through the tree. It is all done on the C++ side, so pretty fast, but it is conceivable that a function could be added LMDB to find the true root node key of the binary tree. But if this is just an occasional procedure to setup or rebalance a shard, I assume this should be sufficient.

anuragvohraec commented 2 years ago

The solution will be excellent even with offset traversing, however this:

but it is conceivable that a function could be added LMDB to find the true root node key of the binary tree.

wll be cherry on top. It can certainly be helpful, as the internal B+Tree is any way keeping the link to root, so its just exposing that like the First and Last key.

kriszyp commented 2 years ago

Also, FWIW, using offset with half the entry count of database will be a much more accurate way to get the mid-point key than looking to the root node key, since the root node key is only a rough approximation of the mid-point, dependent on how frequently/recently LMDB has rebalanced the various binary tree nodes (I don't know the exact parameters it uses for triggering rebalancing).