jnwatson / py-lmdb

Universal Python binding for the LMDB 'Lightning' Database
http://lmdb.readthedocs.io/

multiprocessing write issue. #308

Closed eddiecong closed 2 years ago

eddiecong commented 3 years ago

Affected Operating Systems

Affected py-lmdb Version

1.2.1

py-lmdb Installation Method

pip install lmdb

Machine "free -m" output

              total        used        free      shared  buff/cache   available
Mem:       32884364      296828    24153512        1040     8434024    32120860
Swap:             0           0           0

Other important machine info

Ubuntu 18.04

Describe Your Problem

I am trying to use multiprocessing to write into a zarr dataset (LMDB storage), which is backed by an lmdb file (reference link: https://github.com/zarr-developers/zarr-python/blob/4f8cb35ecd9a24f402a3a7a02d2efe177abaf5c8/zarr/storage.py#L1836), but I encounter errors that mostly relate to the cursor in lmdb. Does lmdb support writes from multiple processes, and how can I avoid the cursor errors?

Errors/exceptions Encountered

Traceback (most recent call last):
Assertion 'mp->mp_pgno != pgno' failed in mdb_page_touch():
Assertion 'IS_LEAF(mp)' failed in mdb_cursor_next():

Describe What You Expected To Happen

I expected multiprocessing to write into the lmdb file.

Describe What Happened Instead

The Python process crashed.

Additional

I use ray (package version 1.6.0) for the multiprocessing implementation; I assume it is not the cause of this error. Dummy scripts are in the attached file.

Flags for running the script: python generate_lmdb.py --output_zarr_dir --storage_type="lmdb".

Thanks so much in advance! generate_lmdb.txt

jnwatson commented 3 years ago

Using an environment handle that was opened before a fork inside the forked process can cause this problem. Make sure to open the environment handle after the fork occurs, or use a non-forking multiprocessing mechanism.
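
A minimal sketch of that advice, assuming the standard-library `multiprocessing` module stands in for ray and that py-lmdb is installed: each worker opens its own environment handle after the process has started, and the pool uses the non-forking "spawn" start method. The names `write_chunk`, `make_jobs`, and `main`, the key layout, and the `map_size` value are hypothetical and are not taken from the attached script.

```python
import multiprocessing as mp
import os
import tempfile


def write_chunk(job):
    """Worker: open the LMDB environment here, inside the child process."""
    import lmdb  # assumes `pip install lmdb`; imported where it is used

    path, worker_id, n_items = job
    # The handle is created after the process started, so nothing is
    # inherited from the parent. map_size is an arbitrary example value.
    env = lmdb.open(path, map_size=2**26)
    try:
        # LMDB permits one write transaction at a time, across processes;
        # other writers wait here until the writer lock is released.
        with env.begin(write=True) as txn:
            for i in range(n_items):
                key = f"w{worker_id}-k{i}".encode()
                txn.put(key, str(os.getpid()).encode())
    finally:
        env.close()
    return n_items


def make_jobs(path, n_workers, n_items):
    """Build one (path, worker_id, n_items) job tuple per worker."""
    return [(path, w, n_items) for w in range(n_workers)]


def main():
    path = tempfile.mkdtemp(suffix=".lmdb")
    # "spawn" starts fresh interpreters instead of forking the parent.
    ctx = mp.get_context("spawn")
    with ctx.Pool(4) as pool:
        written = pool.map(write_chunk, make_jobs(path, 4, 100))
    print("items written:", sum(written))
```

To run it, call `main()` from under an `if __name__ == "__main__":` guard; the spawn start method re-imports the module in each child and needs that guard. With ray, the analogous change would presumably be to open the environment inside the remote task or actor rather than passing an already opened handle to it.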