kriszyp / lmdb-js

Simple, efficient, ultra-fast, scalable data store wrapper for LMDB

Random 'No transaction to renew' errors #201

Closed ppedziwiatr closed 1 year ago

ppedziwiatr commented 1 year ago

Hey,

every now and then I'm getting an error:

"errorType": "Runtime.UnhandledPromiseRejection",
    "errorMessage": "Error: Invalid argument: No transaction to renew",
    "trace": [
        "Runtime.UnhandledPromiseRejection: Error: Invalid argument: No transaction to renew",
        "    at process.<anonymous> (file:///var/runtime/index.mjs:1194:17)",
        "    at process.emit (node:events:513:28)",
        "    at emit (node:internal/process/promises:140:20)",
        "    at processPromiseRejections (node:internal/process/promises:274:27)",
        "    at processTicksAndRejections (node:internal/process/task_queues:97:32)"
    ]

The app is running as a simple AWS lambda function, with db files mapped via AWS EFS.

The error probably occurs during this operation: this.db.getRange({ start: `${contractTxId}|${lastPossibleKey}`, reverse: true, limit: 1 }).asArray

What might be the cause of this issue?

kriszyp commented 1 year ago

I don't know what would cause this off the top of my head. Do you have any idea what that line number (index.mjs:1194:17) corresponds to? (It doesn't map to any source code I know of, so I assume it is a built file?)

ppedziwiatr commented 1 year ago

I'm afraid it corresponds to the Node.js runtime - this error in general seems to be thrown outside the 'main' event loop (hence the 'UnhandledPromiseRejection') - the whole lambda code is wrapped in a try-catch.

Unfortunately, I'm unable to reproduce this locally... I guess lmdb simply does not work in such an environment (i.e. AWS Lambda + AWS EFS).

kriszyp commented 1 year ago

Can you log any errors by try/catching this call? lmdb certainly is frequently used on AWS at least, so I would think this should be fixable if I had some idea of the source.
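One way to capture the underlying error, as asked above, is to await the result of the range query inside the try block, so that a rejected promise lands in the catch instead of surfacing as an unhandled rejection. The sketch below is an illustration under assumptions, not the reporter's real code: loadLatestState and its parameters are hypothetical names, db is assumed to be an lmdb-js database opened elsewhere, and it assumes asArray may yield a promise in some cases (awaiting a plain array is harmless).

```javascript
// Hypothetical helper: `db` is assumed to be an lmdb-js database opened elsewhere.
// `logger` is any object with an `error` method (console works).
async function loadLatestState(db, contractTxId, lastPossibleKey, logger = console) {
  try {
    // `await` matters here: if `asArray` yields a promise, its rejection is
    // only caught by this try/catch when the promise is awaited inside it.
    const entries = await db.getRange({
      start: `${contractTxId}|${lastPossibleKey}`,
      reverse: true,
      limit: 1,
    }).asArray;
    return entries.length > 0 ? entries[0] : null;
  } catch (e) {
    // Log message, code, and stack so the failing call can be identified.
    logger.error('getRange failed', { message: e.message, code: e.code, stack: e.stack });
    throw e;
  }
}
```

Calling it as `await loadLatestState(db, contractTxId, lastPossibleKey, logger)` from the handler keeps the rejection inside the handler's own try/catch as well.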

kriszyp commented 1 year ago

My best guess is that maybe there is some flakiness with acquiring locks on the EFS/NFS file system, as has previously been suspected in the discussion here: #100.

ppedziwiatr commented 1 year ago

hmm, you see, the issue here is that the lambda function looks basically like this:

exports.handler = async function (_event, _context) {
  const contractTxId = _event.queryStringParameters.contractTxId;

  try {
    // load data from the lmdb cache - basically:
    /*
    const db = open({
      path: `<EFS_MOUNTED_PATH>`
    });
    const result = this.db.getRange({ start: `${contractTxId}|${lastPossibleKey}`, reverse: true, limit: 1 }).asArray;
    */
  } catch (e) {
    logger.error(`Error while loading contract state`, e);
    return responder.internalServerError(e);
  }
};

Also, we've already configured the lambda that updates the cache to have 'concurrency' set to '1'.

I guess I'll need to change the architecture, I was hoping that it would work with EFS.. :|

ppedziwiatr commented 1 year ago

tbh, IMHO it would be good to note somewhere in the README that lmdb might not work that well in a serverless / NFS environment - I personally checked the README and the issues, but I forgot to check the 'discussions'...

kriszyp commented 1 year ago

I am not sure I follow the lambda function, it looks like all the lmdb function calls are commented out? And are you saying that it is actually open that fails rather than getRange? If there is retry logic that needs to be added to the open sequence, that certainly is possible.
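The retry idea mentioned above could look roughly like the following. This is only a sketch under assumptions: openWithRetry, its options, and the choice to retry on every error are hypothetical and not part of lmdb-js; openFn stands in for a call such as () => open({ path }) and is assumed to be synchronous. Whether retrying actually helps with lock acquisition on EFS/NFS is not established in this thread.

```javascript
// Hypothetical wrapper, not part of lmdb-js.
// `openFn` is assumed to be a synchronous function such as () => open({ path }).
function openWithRetry(openFn, { attempts = 3, delayMs = 100 } = {}) {
  if (attempts < 1) {
    throw new RangeError('attempts must be at least 1');
  }
  let lastError;
  for (let attempt = 1; attempt <= attempts; attempt++) {
    try {
      return openFn();
    } catch (e) {
      lastError = e;
      if (attempt < attempts && delayMs > 0) {
        // Block this thread for a linearly growing delay before the next attempt.
        sleepSync(delayMs * attempt);
      }
    }
  }
  // Every attempt failed: surface the last error to the caller.
  throw lastError;
}

// Synchronous sleep built on Atomics.wait; it blocks the calling thread.
function sleepSync(ms) {
  Atomics.wait(new Int32Array(new SharedArrayBuffer(4)), 0, 0, ms);
}
```

A blocking sleep is used here only because the open call is assumed to be synchronous; in a handler that is already async, a promise-based delay would avoid blocking the event loop.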

ppedziwiatr commented 1 year ago

the commented-out code is just a "description" of what the original lambda is actually doing (I cannot paste the real code here).