jnwatson / py-lmdb

Universal Python binding for the LMDB 'Lightning' Database
http://lmdb.readthedocs.io/

json.JSONDecoder causes Environment.begin to throw ReadersFullError #345

Open automorphis opened 1 year ago

automorphis commented 1 year ago

Affected Operating Systems

Affected py-lmdb Version

1.3.0

py-lmdb Installation Method

sudo pip install lmdb

Using bundled or distribution-provided LMDB library?

Bundled

Distribution name and LMDB library version

0.9.29

Machine "free -m" output

               total        used        free      shared  buff/cache   available                                        
Mem:           12456          88       12287           0          81       12180                                        
Swap:           4096           0        4096                                   

Describe Your Problem

It took me quite some time to isolate this problem and write a minimal reproducible example. The json.JSONDecoder class in the Python standard library does not play well with LMDB, although I cannot understand why.

The json.JSONDecoder class is a little idiosyncratic: you customize decoding by passing your own function to JSONDecoder.__init__ via the object_hook parameter, and that function is then called with every decoded JSON object. Frequently you need object_hook to call the instance-method JSONDecoder.decode, so it makes sense for object_hook to be an instance-method itself.
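For context, here is a minimal sketch (not part of the bug report, class and hook names are just illustrative) of how object_hook is normally wired up as an instance-method:

import json

class PointDecoder(json.JSONDecoder):

    def __init__(self):
        # the hook is a bound method, so the decoder holds a reference to itself
        super().__init__(object_hook=self.obj_hook)

    def obj_hook(self, obj):
        # called once for every decoded JSON object (a dict)
        if set(obj) == {"x", "y"}:
            return (obj["x"], obj["y"])
        return obj

print(PointDecoder().decode('{"x": 1, "y": 2}'))  # prints (1, 2)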

The trouble is, if object_hook points to an instance-method, then calling Environment.begin enough times eventually, and inexplicably, raises ReadersFullError. If object_hook points to a plain function (not an instance-method), no such error is raised.

Example:

import json, lmdb, pathlib

class BadDecoder(json.JSONDecoder):

    def __init__(self, txn):
        self.txn = txn
        # object_hook is a bound method of this instance
        super().__init__(object_hook=self.obj_hook1)

    def obj_hook1(self, obj):
        # an object_hook receives each decoded JSON object (a dict)
        pass

class GoodDecoder(json.JSONDecoder):

    def __init__(self, txn):
        self.txn = txn
        # object_hook is a plain module-level function
        super().__init__(object_hook=obj_hook2)

def obj_hook2(obj):
    pass

if __name__ == "__main__":

    num_queries = 100000 # big enough, usually crashes before this number
    db_path = pathlib.Path.home() / "pylmdb_json_mre"
    db_path.mkdir(exist_ok=True)
    db = lmdb.open(str(db_path))
    i = -1

    try:
        for i in range(num_queries):
            with db.begin() as txn:
                decoder = GoodDecoder(txn)
    except lmdb.ReadersFullError:
        print(i)
        raise

    i = -1

    try:
        for i in range(num_queries):
            with db.begin() as txn:
                decoder = BadDecoder(txn)
    except lmdb.ReadersFullError:
        print(i)
        raise

Errors/exceptions Encountered

1597
Traceback (most recent call last):
  File "/mnt/c/Users/mlane/OneDrive/PycharmProjects/cornifer/scripts/bug_report.py", line 44, in <module>
    with db.begin() as txn:
lmdb.ReadersFullError: mdb_txn_begin: MDB_READERS_FULL: Environment maxreaders limit reached
jnwatson commented 1 year ago

This is an object lifetime problem that exposes a subtle py-lmdb bug. Because object_hook is a bound method, the decoder ends up referencing itself, so it (and the txn it holds) is only reclaimed by the cyclic garbage collector; keeping a reference to txn in BadDecoder therefore causes a delayed finalization of txn.

As a temporary workaround, I found that adding decoder.txn = None after decoder = BadDecoder(txn) fixes it.
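Applied to the second loop in the example above, the workaround looks roughly like this:

try:
    for i in range(num_queries):
        with db.begin() as txn:
            decoder = BadDecoder(txn)
            decoder.txn = None  # drop the reference so txn can be finalized promptly
except lmdb.ReadersFullError:
    print(i)
    raise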

The performant solution is to create the txn context outside the loop.
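In terms of the same example, that means something like:

with db.begin() as txn:  # one read transaction reused for the whole loop
    for i in range(num_queries):
        decoder = BadDecoder(txn)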

I will investigate further.

automorphis commented 1 year ago

I've encountered ReadersFullError in a different context from the one I posted above. The error occurs if I read from and write to a single Environment using more than one process. I didn't even try to isolate the error, but I did manage a workaround, which is basically "turn it off and on again".

If Environment.begin() throws ReadersFullError, then I do a "soft reset" in the process where the error occurred: I call Environment.close() followed immediately by a call to lmdb.open(). In my test cases, soft resets were enough when I ran with only two processes. (A soft reset wasn't needed at all for a single process.)
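Roughly, a soft reset looks like this (the helper name and the surrounding try/except are just illustrative):

def soft_reset(db, db_path):
    # close and immediately reopen the environment in the process that hit the error
    db.close()
    return lmdb.open(str(db_path))

try:
    with db.begin() as txn:
        ...  # reads/writes
except lmdb.ReadersFullError:
    db = soft_reset(db, db_path)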

A soft reset could fail in one of two ways: either lmdb.open() throws ReadersFullError or the first call to Environment.begin() does. If a soft reset fails in any process, I do a "hard reset" (sketched after the list below):

  1. Every (alive) process calls Environment.close(),
  2. Every process waits at a multiprocessing.Event,
  3. Once every alive process is waiting, a single process deletes the lockfile (lock.mdb),
  4. The same process calls lmdb.open (which recreates the lockfile),
  5. The remaining processes call lmdb.open.
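A sketch of that hard reset, with one process designated as the leader; the Barrier is my addition to make "every alive process is waiting" concrete, since the steps above only mention an Event:

import pathlib
import lmdb

def hard_reset(db, db_path, barrier, reopened, is_leader):
    # barrier: multiprocessing.Barrier shared by all alive processes
    # reopened: multiprocessing.Event, set once the lockfile has been recreated
    db.close()                                         # step 1: every process closes
    barrier.wait()                                     # step 2: wait until all are here
    if is_leader:
        (pathlib.Path(db_path) / "lock.mdb").unlink()  # step 3: delete the lockfile
        db = lmdb.open(str(db_path))                   # step 4: reopening recreates it
        reopened.set()
    else:
        reopened.wait()
        db = lmdb.open(str(db_path))                   # step 5: the rest reopen
    return db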