kriszyp / lmdb-js

Simple, efficient, ultra-fast, scalable data store wrapper for LMDB

Sync cache between 2 processes/threads #296

Open angrymouse opened 3 months ago

angrymouse commented 3 months ago

Is it possible to sync caches between 2 processes? So that if one of them does a put, the other will fetch the new version (not the one cached in the other process's memory).

kriszyp commented 3 months ago

Yes, one of the powerful aspects of using memory maps is that there is only one cache, the operating system's disk cache, so when it is updated, all processes see the update. The one hazard to be careful of is that lmdb-js reuses the current read transaction for the current event turn. When a read is performed, a read transaction is created or reset, and subsequent reads use the same read transaction and snapshot until an event queued with setImmediate resets it. Normally this is adequate: it provides a view of the next data snapshot at the next event turn and keeps a consistent view of data within an event turn. However, if you want to force lmdb-js to reset the read transaction to guarantee it is reading the latest state, you can call db.resetReadTxn().
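For readers who want to see the snapshot behavior described above in isolation, here is a minimal sketch. It is a toy model, not lmdb-js's implementation: ToyStore, ToyReader, readLatest and demoReadSnapshot are hypothetical names invented for illustration. Only get() and resetReadTxn() mirror calls mentioned in this thread.

```javascript
// Toy model only: this is NOT how lmdb-js is implemented internally.
// ToyStore, ToyReader, readLatest and demoReadSnapshot are hypothetical names.

class ToyStore {
  constructor() {
    this.committed = new Map(); // latest committed snapshot
  }
  put(key, value) {
    // Copy-on-write: a commit publishes a new map, so pinned snapshots never change.
    const next = new Map(this.committed);
    next.set(key, value);
    this.committed = next;
  }
}

class ToyReader {
  constructor(store) {
    this.store = store;
    this.snapshot = null; // stands in for the current read transaction
  }
  get(key) {
    if (this.snapshot === null) {
      // First read of this event turn: pin the latest committed snapshot
      // and queue a reset for the next turn.
      this.snapshot = this.store.committed;
      setImmediate(() => this.resetReadTxn());
    }
    return this.snapshot.get(key);
  }
  resetReadTxn() {
    this.snapshot = null; // the next get() pins a fresh snapshot
  }
}

// Works with any object exposing resetReadTxn() and get(key).
function readLatest(db, key) {
  db.resetReadTxn();
  return db.get(key);
}

function demoReadSnapshot() {
  const store = new ToyStore();
  const reader = new ToyReader(store);
  store.put('k', 1);
  const a = reader.get('k');         // pins the snapshot containing k = 1
  store.put('k', 2);                 // another writer commits during the same turn
  const b = reader.get('k');         // still served from the pinned snapshot
  const c = readLatest(reader, 'k'); // reset first, so the new commit is visible
  return [a, b, c];
}
```

In this model, demoReadSnapshot() returns [1, 1, 2]: the second read still sees the pinned snapshot, and only the read that resets first sees the new commit. The readLatest helper relies only on resetReadTxn() and get(), so the same pattern should carry over to a real lmdb-js database object, though that is an assumption this sketch does not verify.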

angrymouse commented 3 months ago

So the cache property on db will reuse the same cache even if I open the same db from 2 different processes? It is not local per process? It would be good to mention this in the docs if so.

kriszyp commented 3 months ago

Sorry, I should have realized that you meant the object cache (which the cache property provides). This object cache (a cache of the actual JS objects) is indeed specific to the process (or V8 context/worker). There actually is a way to do process synchronization in this case. You just set it up with:

open({
  cache: {
    validated: true
  },
  // ...other options
})

This validated cache always checks whether the cached version matches the version in the database. If it does, it uses the in-memory cache; otherwise it reloads from the db. I will get this into the docs too.
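The check described above can be pictured with a small model. This is a hedged sketch, not lmdb-js internals: SharedDb, ValidatedCache and demoValidatedCache are hypothetical names, and the per-entry version number is an assumption standing in for whatever version lmdb-js actually compares.

```javascript
// Toy model only: NOT lmdb-js internals. All class and function names are hypothetical.

class SharedDb {
  // Stands in for the memory-mapped database that every process sees.
  constructor() {
    this.entries = new Map();
    this.txnId = 0;
  }
  put(key, value) {
    this.txnId += 1; // each commit gets a new id
    this.entries.set(key, { data: JSON.stringify(value), version: this.txnId });
  }
  getEntry(key) {
    return this.entries.get(key);
  }
}

class ValidatedCache {
  // Stands in for one process's object cache.
  constructor(db) {
    this.db = db;
    this.cache = new Map();
    this.reloads = 0; // counts how often the object had to be decoded again
  }
  get(key) {
    const current = this.db.getEntry(key);
    if (current === undefined) {
      this.cache.delete(key);
      return undefined;
    }
    const cached = this.cache.get(key);
    if (cached !== undefined && cached.version === current.version) {
      return cached.object; // versions match: reuse the in-memory object
    }
    // Missing or stale: reload from the database and remember the version.
    this.reloads += 1;
    const object = JSON.parse(current.data);
    this.cache.set(key, { object, version: current.version });
    return object;
  }
}

function demoValidatedCache() {
  const db = new SharedDb();
  const workerCache = new ValidatedCache(db); // the second process
  db.put('user', { name: 'a' });              // written by the primary
  const first = workerCache.get('user');      // reload #1
  const second = workerCache.get('user');     // cache hit, same object
  db.put('user', { name: 'b' });              // the primary updates the entry
  const third = workerCache.get('user');      // version mismatch, reload #2
  return {
    sameObject: first === second,
    afterUpdate: third.name,
    reloads: workerCache.reloads,
  };
}
```

The design point is that every get() still consults the shared database for the current version, so a cache hit saves only the cost of rebuilding the JS object, not the lookup itself.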

angrymouse commented 3 months ago

What happens when you do db.put() with cache set to {validated:true}? Does db.get() in this case return the old value from the db or the new one from the cache (that has just not been flushed to disk yet)?

DraviaVemal commented 2 months ago

> Sorry, I should have realized that you meant the object cache (which the cache property provides). This object cache (a cache of the actual JS objects) is indeed specific to the process (or V8 context/worker). There actually is a way to do process synchronization in this case. You just set it up with:
>
> open({
>   cache: {
>     validated: true
>   },
>   // ...other options
> })
>
> This validated cache always checks whether the cached version matches the version in the database. If it does, it uses the in-memory cache; otherwise it reloads from the db. I will get this into the docs too.

I tried this approach, but the second cluster worker is still not picking up the updates from the primary thread.

Node version: v18. Package details: "lmdb": "^3.0.12", "cluster": "^0.7.7".

kriszyp commented 2 months ago

> What happens when you do db.put() with cache set to {validated:true}? Does db.get() in this case bring old value from db or new one from cache (but just not flushed to disk)?

With cache set to {validated:true}, db.get() reads the latest value that has been committed to the database, and reloads the data from the database if the transaction id doesn't match that of the object in the cache.

> I tried this approach, but the second cluster worker is still not picking up the updates from the primary thread.

Do you have any steps to reproduce this? I just tested this, and it works properly, picking up the updates from the primary process/thread.