pyproj4 / pyproj

Python interface to PROJ (cartographic projections and coordinate transformations library)
https://pyproj4.github.io/pyproj
MIT License

BUG: sqlite read error with ProcessPoolExecutor #933

Closed nialov closed 3 years ago

nialov commented 3 years ago

This bug seems similar to (or exactly the same as?) #426.

Code Sample, a copy-pastable example if possible

I've created a poetry environment and Python scripts which reproduce the bug on my system. Due to the parallel nature it might not get reproduced on every system (?).

git clone https://github.com/nialov/pyproj-multiprocessing-bug-hunt.git
cd pyproj-multiprocessing-bug-hunt
# Need poetry installed on system
poetry install
# Script with parallel processes and which tries to reproduce bug
poetry run python script_parallel.py
# Sanity check script with sequential processing, which doesn't error.
poetry run python script_sequential.py

Problem description

pyproj 3.2.0 errors when reading its sqlite file in parallel using Python concurrent.futures.ProcessPoolExecutor. I assume any method to create parallel processes in Python will recreate this.

This bug occurred with pyproj 3.2.0 and is not present with pyproj 3.1.0.

Error message:

➜ pr python script_parallel.py
concurrent.futures.process._RemoteTraceback:
"""
Traceback (most recent call last):
  File "/usr/lib/python3.8/concurrent/futures/process.py", line 239, in _process_worker
    r = call_item.fn(*call_item.args, **call_item.kwargs)
  File "/home/nialov/.cache/pypoetry/virtualenvs/pyproj-multiprocessing-bug-hunt-ovaqiMDF-py3.8/lib/python3.8/site-packages/pyproj/crs/crs.py", line 326, in __init__
    self._local.crs = _CRS(self.srs)
  File "pyproj/_crs.pyx", line 2347, in pyproj._crs._CRS.__init__
pyproj.exceptions.CRSError: Invalid projection: EPSG:3067: (Internal Proj Error: proj_create: SQLite error on SELECT auth_name FROM authority_list: database disk image is malformed)
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "script_parallel.py", line 13, in <module>
    print(process.result())
  File "/usr/lib/python3.8/concurrent/futures/_base.py", line 437, in result
    return self.__get_result()
  File "/usr/lib/python3.8/concurrent/futures/_base.py", line 389, in __get_result
    raise self._exception
pyproj.exceptions.CRSError: Invalid projection: EPSG:3067: (Internal Proj Error: proj_create: SQLite error on SELECT auth_name FROM authority_list: database disk image is malformed)

Expected Output

Should work in parallel. Added a sequential example script script_sequential.py as sanity check.

Environment Information

➜ pr pyproj -v
pyproj info:
    pyproj: 3.2.0
      PROJ: 8.1.1
  data dir: /home/nialov/.cache/pypoetry/virtualenvs/pyproj-multiprocessing-bug-hunt-ovaqiMDF-py3.8/lib/python3.8/site-packages/pyproj/proj_dir/share/proj
user_data_dir: /home/nialov/.local/share/proj

System:
    python: 3.8.10 (default, Jun  2 2021, 10:49:15)  [GCC 10.3.0]
executable: /home/nialov/.cache/pypoetry/virtualenvs/pyproj-multiprocessing-bug-hunt-ovaqiMDF-py3.8/bin/python
   machine: Linux-4.19.84-microsoft-standard-x86_64-with-glibc2.32

Python deps:
   certifi: 2021.05.30
       pip: 21.1.3
setuptools: 57.4.0
    Cython: None

Installation method

Installed from pypi onto Ubuntu 20.10.

snowman2 commented 3 years ago

I tested with pyproj 3.2 against PROJ 8.0.1 and PROJ 8.1.1 and the issue only appears with PROJ 8.1.1.

This change is likely the reason: https://github.com/OSGeo/PROJ/pull/2738

rouault commented 3 years ago

ok, I've given a try at https://github.com/nialov/pyproj-multiprocessing-bug-hunt/blob/master/script_parallel.py and my findings are interesting:

Running strace shows different system call patterns. When no error is reproduced, sqlite3 uses pread64(), which is fork() friendly, whereas with the binary build, it doesn't. I suspect the sqlite3 in the binary wheel is built against an old kernel / glibc that doesn't support pread64(), so sqlite3 falls back to seek()+read(). Using a more modern infrastructure for building the binary wheels would probably solve that.
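
To illustrate why seek()+read() is not fork() friendly while pread64() is, here is a minimal standalone sketch. It is not taken from PROJ or SQLite; the function name `demo_shared_offset` and the temporary file are made up for the illustration, and it assumes a Unix system where `os.fork()` is available. A descriptor inherited across fork() shares its file offset with the parent, so a seek in one process moves the position seen by the other, whereas a positional read carries its own offset.

```python
import os
import tempfile


def demo_shared_offset():
    """Return (bytes from read() after a child's lseek, bytes from pread() at offset 0)."""
    # Hypothetical scratch file standing in for a database file.
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b"0123456789")
        path = tmp.name

    fd = os.open(path, os.O_RDONLY)
    try:
        pid = os.fork()
        if pid == 0:
            # Child: move the offset on the inherited descriptor, then exit
            # immediately without running the parent's cleanup code.
            os.lseek(fd, 5, os.SEEK_SET)
            os._exit(0)
        os.waitpid(pid, 0)

        # The parent never called lseek(), yet read() now starts at offset 5
        # because the offset is shared with the child.
        via_read = os.read(fd, 3)
        # pread() takes an explicit offset and ignores the shared position.
        via_pread = os.pread(fd, 3, 0)
        return via_read, via_pread
    finally:
        os.close(fd)
        os.unlink(path)


if __name__ == "__main__":
    print(demo_shared_offset())
```

With several processes seeking and reading concurrently on the same inherited descriptor, a read can land on the wrong page of the file, which would be consistent with the "database disk image is malformed" error above.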

snowman2 commented 3 years ago

One thing I just realized is that @rouault is correct about this difference (ref). Everything works just fine with ThreadPoolExecutor and fails with ProcessPoolExecutor.

snowman2 commented 3 years ago

> I suspect the sqlite3 in the binary wheel is built against an old kernel / glibc that doesn't support pread64(), so sqlite3 falls back to seek()+read(). Using a more modern infrastructure for building the binary wheels would probably solve that.

pyproj wheels currently support manylinux2010 and that comes with these limitations: https://www.python.org/dev/peps/pep-0571/#the-manylinux2010-policy.

I also tried installing with conda using the conda-forge channel and had the same issues. So, the fix would likely need to be applied there.

Do you happen to know what minimum version of kernel / glibc is needed?

snowman2 commented 3 years ago

According to: https://linux.die.net/man/2/pread64 https://launchpad.net/linux/+milestone/2.1.60

Looks like it came ~2016

snowman2 commented 3 years ago

Will need to look into PEP 600 for wheel building for modern wheels.
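
As a rough sketch of what the PEP 600 tags encode: a `manylinux_X_Y` tag states that the wheel needs glibc X.Y or newer, and the legacy names are aliases (manylinux1 for 2.5, manylinux2010 for 2.12, manylinux2014 for 2.17). The helper names `required_glibc` and `glibc_supports` below are hypothetical and only compare version numbers; real installers apply further checks, so treat this as an illustration rather than how pip decides.

```python
import platform
import re

# Legacy aliases as defined in PEP 600.
LEGACY_ALIASES = {
    "manylinux1": "manylinux_2_5",
    "manylinux2010": "manylinux_2_12",
    "manylinux2014": "manylinux_2_17",
}


def required_glibc(platform_tag):
    """Return the (major, minor) glibc version a manylinux platform tag requires."""
    for legacy, modern in LEGACY_ALIASES.items():
        if platform_tag.startswith(legacy + "_"):
            platform_tag = modern + platform_tag[len(legacy):]
            break
    match = re.match(r"manylinux_(\d+)_(\d+)_", platform_tag)
    if match is None:
        raise ValueError(f"not a manylinux tag: {platform_tag!r}")
    return int(match.group(1)), int(match.group(2))


def glibc_supports(platform_tag, glibc_version):
    """Return True if glibc_version (e.g. '2.31') meets the tag's requirement."""
    major, minor = (int(part) for part in glibc_version.split(".")[:2])
    return (major, minor) >= required_glibc(platform_tag)


if __name__ == "__main__":
    libc, version = platform.libc_ver()
    if libc == "glibc":
        print(version, glibc_supports("manylinux_2_24_x86_64", version))
```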

rouault commented 3 years ago

Another finding is that whether SQLite3 uses pread64() depends on which source distribution you use. If you use the sqlite-autoconf-XXXX builds, their configure doesn't include pread64() detection. You have to explicitly pass CFLAGS="-DHAVE_PREAD64 -DHAVE_PWRITE64". Whereas the sqlite-src-XXXXX.zip distribution automatically detects it...

snowman2 commented 3 years ago

> Another finding is that whether SQLite3 uses pread64() depends on which source distribution you use. If you use the sqlite-autoconf-XXXX builds, their configure doesn't include pread64() detection. You have to explicitly pass CFLAGS="-DHAVE_PREAD64 -DHAVE_PWRITE64". Whereas the sqlite-src-XXXXX.zip distribution automatically detects it...

Sounds like this may also impact the OSX wheels. Not sure about the Windows wheels ...

snowman2 commented 3 years ago

manylinux_2_24_x86_64 wheels work without issue and are available on PyPI.

snowman2 commented 3 years ago

conda-forge issue should be resolved as well: https://github.com/conda-forge/proj.4-feedstock/issues/112

snowman2 commented 3 years ago

Thanks @nialov for the report and @rouault for helping to debug & resolve the issue :+1: