Closed nialov closed 3 years ago
I tested with pyproj 3.2 against PROJ 8.0.1 and PROJ 8.1.1 and the issue only appears with PROJ 8.1.1.
This change is likely the reason: https://github.com/OSGeo/PROJ/pull/2738
ok, I've given a try at https://github.com/nialov/pyproj-multiprocessing-bug-hunt/blob/master/script_parallel.py and my findings are interesting:
Running strace shows different system call patterns. When the error is not reproduced, sqlite3 uses pread64(), which is fork()-friendly, whereas the sqlite3 in the binary build does not. I suspect the sqlite3 in the binary wheel was built against an old kernel / glibc that doesn't support pread64(), so sqlite3 falls back to seek()+read(). Building the binary wheels on more modern infrastructure would probably solve that.
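To illustrate why pread64() is described as fork()-friendly, here is a minimal sketch. It is not taken from PROJ or sqlite3; the helper names are made up for the example and it assumes a Unix system where os.fork and os.pread are available. A descriptor inherited across fork() shares one file offset between parent and child, so a seek in one process moves the position for the other, while a positional read never touches that shared offset.

```python
import os
import tempfile

DATA = b"0123456789abcdefghij"


def shared_offset_after_fork():
    """Return the parent's file offset after a forked child seeks on the same fd."""
    fd, path = tempfile.mkstemp()
    try:
        os.write(fd, DATA)
        os.lseek(fd, 0, os.SEEK_SET)
        pid = os.fork()
        if pid == 0:
            # Child: move the offset of the inherited descriptor, then exit
            # immediately so the parent's cleanup code is not run twice.
            os.lseek(fd, 10, os.SEEK_SET)
            os._exit(0)
        os.waitpid(pid, 0)
        # Parent: the offset was moved by the child, because both processes
        # refer to the same open file description.
        return os.lseek(fd, 0, os.SEEK_CUR)
    finally:
        os.close(fd)
        os.unlink(path)


def pread_leaves_offset_alone():
    """Read 5 bytes at offset 10 with pread and report the offset afterwards."""
    fd, path = tempfile.mkstemp()
    try:
        os.write(fd, DATA)
        os.lseek(fd, 0, os.SEEK_SET)
        chunk = os.pread(fd, 5, 10)  # positional read, no seek involved
        return chunk, os.lseek(fd, 0, os.SEEK_CUR)
    finally:
        os.close(fd)
        os.unlink(path)
```

With a seek()+read() pair, another process can move the shared offset between the two calls, which would be consistent with the sporadic read errors reported here. That link is an inference from the strace findings above, not something this sketch proves.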
One thing I just realized is that @rouault is correct about this difference ref. Everything works just fine with ThreadPoolExecutor and fails with the ProcessPoolExecutor.
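A comparison along these lines could be sketched as follows. This is not the script from the linked repository: transform_point, run_with, and compare_executors are hypothetical names, and the CRS codes and the sample point are arbitrary choices for illustration. The pyproj import is kept inside the worker so the generic helper can be used without pyproj installed.

```python
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor


def transform_point(xy):
    """Hypothetical worker: build a transformer and transform one (x, y) pair."""
    from pyproj import Transformer  # imported lazily; requires pyproj

    transformer = Transformer.from_crs("EPSG:4326", "EPSG:3857", always_xy=True)
    return transformer.transform(*xy)


def run_with(executor_cls, worker, items, max_workers=4):
    """Map worker over items using the given executor class and collect results."""
    with executor_cls(max_workers=max_workers) as pool:
        return list(pool.map(worker, items))


def compare_executors(n_points=100):
    """Run the same workload with threads and then with processes."""
    points = [(25.0, 60.0)] * n_points  # arbitrary lon/lat sample
    threaded = run_with(ThreadPoolExecutor, transform_point, points)
    # On an affected install, this second call is the one expected to fail.
    forked = run_with(ProcessPoolExecutor, transform_point, points)
    return threaded, forked
```

When running this as a script, call compare_executors() under an `if __name__ == "__main__":` guard so that worker processes can import the module safely.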
> I suspect the sqlite3 in the binary wheel to be built against an old kernel / glibc that doesn't support pread64() and sqlite3 falls back to seek()+read(). Probably using a more modern infrastructure for building the binary wheels would solve that
pyproj wheels currently support manylinux2010 and that comes with these limitations: https://www.python.org/dev/peps/pep-0571/#the-manylinux2010-policy.
I also tried installing with conda using the conda-forge channel and had the same issues. So, the fix would likely need to be applied there as well.
Do you happen to know what minimum version of kernel / glibc is needed?
According to: https://linux.die.net/man/2/pread64 https://launchpad.net/linux/+milestone/2.1.60
Looks like it came ~2016
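For anyone checking their own machine against these requirements, a small sketch using only the standard library is shown below. describe_runtime is a hypothetical helper name. It reports the kernel and C library of the running system only, and says nothing about what the SQLite bundled in a wheel was compiled against.

```python
import os
import platform


def describe_runtime():
    """Collect kernel release, libc name/version, and whether os.pread exists."""
    libc_name, libc_version = platform.libc_ver()
    return {
        "kernel": platform.release(),
        "libc": (libc_name, libc_version),
        # os.pread is only exposed by Python on platforms that provide pread().
        "has_os_pread": hasattr(os, "pread"),
    }
```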
Another finding is that whether SQLite3 uses pread64() depends on which source distribution you use. If you use the sqlite-autoconf-XXXX builds, their configure script doesn't include pread64() detection, so you have to explicitly pass CFLAGS="-DHAVE_PREAD64 -DHAVE_PWRITE64". The sqlite-src-XXXXX.zip distribution, by contrast, detects it automatically...
> another finding is that whether SQLite3 uses pread64() depends on which source distribution you use. If you use the sqlite-autoconf-XXXX builds, their configure doesn't include pread64() detection. You have to explicitly pass CFLAGS="-DHAVE_PREAD64 -DHAVE_PWRITE64". Whereas the sqlite-src-XXXXX.zip distribution automatically detects it...
Sounds like this may also impact the OSX wheels. Not sure about the Windows wheels ...
manylinux_2_24_x86_64 wheels work without issue and are available on PyPI. The conda-forge issue should be resolved as well: https://github.com/conda-forge/proj.4-feedstock/issues/112
Thanks @nialov for the report and @rouault for helping to debug & resolve the issue :+1:
This bug seems similar to (or exactly the same as?) #426
Code Sample, a copy-pastable example if possible
I've created a poetry environment and Python scripts which reproduce the bug on my system. Due to the parallel nature it might not get reproduced on every system (?).
Problem description
pyproj 3.2.0 errors when reading its sqlite file in parallel using Python concurrent.futures.ProcessPoolExecutor. I assume any method to create parallel processes in Python will recreate this.
This bug occurred with pyproj 3.2.0 and is not present with pyproj 3.1.0.
Error message:
Expected Output
Should work in parallel. Added a sequential example script, script_sequential.py, as a sanity check.
Environment Information
Installation method
Installed from PyPI onto Ubuntu 20.10.