rasterio / rasterio-wheels

Use some form of shared libs #59

Closed dazza-codes closed 3 years ago

dazza-codes commented 4 years ago

See https://github.com/OSGeo/gdal/issues/3060

Exactly what a shared-libs solution could be is not entirely clear, but it would need to provide the opportunity for rasterio/fiona to use them in CI/CD systems. Ideally, it might provide something like a pip optional extra pattern for installation of rasterio version X along with binary manylinux wheels for gdal version A, B or C (i.e. a complete matrix that would be consistent with CI/CD matrix builds).

For example, binary libs are duplicated when rasterio is installed along with fiona, pyproj, shapely and other sci-libs.

dazza-codes commented 4 years ago

To preface this comment: the purpose of this packaging hack is to build an AWS Lambda layer and minimize its size, so that as many sci-libs as possible fit into it. The hack below breaks the Python venv site-packages for any normal use and should not be used for common venv purposes; it is shown here only for illustration.

It's a nasty hack, but after the package installations are complete, a project test suite passes when using the results of this bash function:

# Consolidate bundled data dirs and .so files under ${site}/shared_libs.
# Assumes site-packages paths contain no whitespace (the find loops word-split).
hack_shared_libs () {
  site="$1"
  mkdir -p "${site}/shared_libs"
  if [ -d "${site}/rasterio" ]; then

    export GDAL_DATA="${site}/shared_libs/gdal_data"
    export PROJ_DATA="${site}/shared_libs/proj_data"

    # Seed the shared data dirs with rasterio's copies.
    mv "${site}/rasterio/gdal_data" "${site}/shared_libs/"
    mv "${site}/rasterio/proj_data" "${site}/shared_libs/"

    # Merge any other gdal_data dirs (e.g. fiona's) into the shared one.
    for d in $(find "${site}" -type d -name 'gdal_data'); do
      if [ "$d" != "$GDAL_DATA" ]; then
        rsync -auq "$d"/ "$GDAL_DATA"/
        rm -rf "$d"
        #ln -s "$GDAL_DATA" "$d"
      fi
    done

    # Same for proj_data.
    for d in $(find "${site}" -type d -name 'proj_data'); do
      if [ "$d" != "$PROJ_DATA" ]; then
        rsync -auq "$d"/ "$PROJ_DATA"/
        rm -rf "$d"
        #ln -s "$PROJ_DATA" "$d"
      fi
    done

    export SHARED_LIBS="${site}/shared_libs/libs"
    mkdir -p "${SHARED_LIBS}"

    # Move each package's bundled libs into the shared dir and leave a
    # symlink behind; skip packages that are not installed.
    for libs in rasterio.libs Fiona.libs; do
      [ -d "${site}/${libs}" ] || continue
      rsync -auq "${site}/${libs}/" "${SHARED_LIBS}/"
      rm -rf "${site}/${libs}"
      ln -s "${SHARED_LIBS}" "${site}/${libs}"
    done

  fi
}

It still contains duplicate libs because the libs are almost the same but they have different file names:

$ ls -l /tmp/tmp_venv_3nAFAw/lib/python3.6/site-packages/shared_libs/
total 12
drwxr-xr-x 2 joe joe 4096 Oct 30 18:53 gdal_data
drwxr-xr-x 2 joe joe 4096 Oct 30 18:53 libs
drwxr-xr-x 2 joe joe 4096 Oct 30 18:53 proj_data

$ ls -l /tmp/tmp_venv_3nAFAw/lib/python3.6/site-packages/shared_libs/libs/
total 69660
-rwxr-xr-x 1 joe joe    35656 Oct 30 18:37 libaec-f0d4887b.so.0.0.10
-rwxr-xr-x 1 joe joe  3532904 Oct 30 18:37 libcurl-ea538880.so.4.4.0
-rwxr-xr-x 1 joe joe  3532912 Oct 30 18:37 libcurl-fiona-ea538880.so.4.4.0
-rwxr-xr-x 1 joe joe   222320 Oct 30 18:37 libexpat-09c47d4c.so.1.6.8
-rwxr-xr-x 1 joe joe   172944 Oct 30 18:37 libexpat-fiona-c4a93fc7.so.1.6.8
-rwxr-xr-x 1 joe joe 23787528 Oct 30 18:37 libgdal-044c25e5.so.20.5.4
-rwxr-xr-x 1 joe joe 21884960 Oct 30 18:37 libgdal-fiona-9fe15c06.so.20.5.4
-rwxr-xr-x 1 joe joe   323632 Oct 30 18:37 libgeos_c-a68605fd.so.1.13.1
-rwxr-xr-x 1 joe joe   323640 Oct 30 18:37 libgeos_c-fiona-a68605fd.so.1.13.1
-rwxr-xr-x 1 joe joe  2240704 Oct 30 18:37 libgeos--no-undefined-b94097bf.so
-rwxr-xr-x 1 joe joe  2240712 Oct 30 18:37 libgeos--no-undefined-fiona-b94097bf.so
-rwxr-xr-x 1 joe joe  4236544 Oct 30 18:37 libhdf5-4377e0cf.so.103.1.0
-rwxr-xr-x 1 joe joe   186152 Oct 30 18:37 libhdf5_hl-92c1cdd8.so.100.1.2
-rwxr-xr-x 1 joe joe   342720 Oct 30 18:37 libjpeg-3fe7dfc0.so.9.3.0
-rwxr-xr-x 1 joe joe   342720 Oct 30 18:37 libjpeg-fiona-3fe7dfc0.so.9.3.0
-rwxr-xr-x 1 joe joe    58800 Oct 30 18:37 libjson-c-5f02f62c.so.2.0.2
-rwxr-xr-x 1 joe joe    58808 Oct 30 18:37 libjson-c-fiona-5f02f62c.so.2.0.2
-rwxr-xr-x 1 joe joe  1822440 Oct 30 18:37 libnetcdf-07221d8a.so.13.1.1
-rwxr-xr-x 1 joe joe   205616 Oct 30 18:37 libnghttp2-11cb20b8.so.14.17.1
-rwxr-xr-x 1 joe joe   205624 Oct 30 18:37 libnghttp2-fiona-11cb20b8.so.14.17.1
-rwxr-xr-x 1 joe joe   378776 Oct 30 18:37 libopenjp2-8f6da918.so.2.3.0
-rwxr-xr-x 1 joe joe   281944 Oct 30 18:37 libpng16-898afbbd.so.16.35.0
-rwxr-xr-x 1 joe joe   281952 Oct 30 18:37 libpng16-fiona-898afbbd.so.16.35.0
-rwxr-xr-x 1 joe joe   453488 Oct 30 18:37 libproj-cd06b982.so.12.0.0
-rwxr-xr-x 1 joe joe   453488 Oct 30 18:37 libproj-fiona-cd06b982.so.12.0.0
-rwxr-xr-x 1 joe joe  1421520 Oct 30 18:37 libsqlite3-bc0a2dd7.so.0.8.6
-rwxr-xr-x 1 joe joe  1259400 Oct 30 18:37 libsqlite3-fiona-25a4bc97.so.0.8.6
-rwxr-xr-x 1 joe joe    18760 Oct 30 18:37 libsz-53d02de5.so.2.0.1
-rwxr-xr-x 1 joe joe   783120 Oct 30 18:37 libwebp-fbd93615.so.7.0.5
-rwxr-xr-x 1 joe joe    85656 Oct 30 18:37 libz-a147dcb0.so.1.2.3
-rwxr-xr-x 1 joe joe    85664 Oct 30 18:37 libz-fiona-a147dcb0.so.1.2.3

It might help if the -fiona- infix were dropped from the lib names (although there may be good reasons for it, such as avoiding library version conflicts). When fiona uses essentially the same version of a library that rasterio also uses (e.g. libz), this hacked consolidation into a shared-libs directory might work: if libz-a147dcb0.so.1.2.3 is binary equivalent to libz-fiona-a147dcb0.so.1.2.3 and both used the same lib name, the hack should leave just one of these files in site-packages.

This is a nasty hack because it runs after installation. It would be better if CI systems and packaging for any project that needs these libs could use some kind of shared-libs dependency, so that CI and packaging for rasterio and fiona could be tested against it and rely on it for package distributions.
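
Whether two such libs really are binary equivalent can be checked before merging them. The following is a minimal sketch, assuming GNU coreutils (`sha256sum`, and `uniq` with `-w` and `-D`); the function name `list_identical_files` is hypothetical and not part of any tool mentioned in this thread. It prints groups of files in a directory whose contents are byte-for-byte identical. In the listing above, libz-a147dcb0.so.1.2.3 and libz-fiona-a147dcb0.so.1.2.3 differ in size by 8 bytes, so a check like this would not report that pair as identical.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not from rasterio, fiona or auditwheel):
# print "<sha256>  <path>" for every file in a directory whose content
# is byte-identical to at least one other file in that directory.
list_identical_files () {
  local dir="$1"
  # sha256sum prints a 64-character hex digest followed by the path.
  # Sorting groups equal digests together; `uniq -w 64 -D` (GNU) then keeps
  # only the lines whose first 64 characters occur more than once.
  find "$dir" -maxdepth 1 -type f -exec sha256sum {} + | sort | uniq -w 64 -D
}

# Example call (path is an assumption, adjust to the venv in question):
# list_identical_files "$SHARED_LIBS"
```

Even for files reported as identical, other binaries may still refer to each copy by its own file name, so deleting one copy would probably also require re-pointing those references; that step is not covered by this sketch.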

dazza-codes commented 4 years ago

Some details depend on how the .so libs are linked (absolute vs. relative paths). The shapely libgeos gets broken by the same hack, e.g.

# this one is from shapely
$ ldd /tmp/tmp_venv_Z9YFAp/lib/python3.6/site-packages/shared_libs/libs/libgeos_c-a68605fd.so.1.13.1 
    linux-vdso.so.1 (0x00007ffc90b6b000)
    libgeos--no-undefined-b94097bf.so => not found
    libstdc++.so.6 => /usr/lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007f86577e4000)
    libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f8657446000)
    libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f8657055000)
    libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007f8656e3d000)
    /lib64/ld-linux-x86-64.so.2 (0x00007f8657dc3000)

# it is OK while still in the shapely package installation
$ ldd /tmp/tmp_venv_Z9YFAp/lib/python3.6/site-packages/shapely/.libs/libgeos_c-a68605fd.so.1.13.1 
    linux-vdso.so.1 (0x00007ffdeb936000)
    libgeos--no-undefined-b94097bf.so => /tmp/tmp_venv_Z9YFAp/lib/python3.6/site-packages/shapely/.libs/./libgeos--no-undefined-b94097bf.so (0x00007fb2a0abe000)
    libstdc++.so.6 => /usr/lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007fb2a0735000)
    libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007fb2a0397000)
    libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007fb29ffa6000)
    libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007fb29fd8e000)
    /lib64/ld-linux-x86-64.so.2 (0x00007fb2a1138000)

The fiona libgeos seems to survive the move, with its relative links preserved:

$ ldd /tmp/tmp_venv_Z9YFAp/lib/python3.6/site-packages/shared_libs/libs/libgeos_c-fiona-a68605fd.so.1.13.1 
    linux-vdso.so.1 (0x00007ffc633f1000)
    libgeos--no-undefined-fiona-b94097bf.so => /tmp/tmp_venv_Z9YFAp/lib/python3.6/site-packages/shared_libs/libs/./libgeos--no-undefined-fiona-b94097bf.so (0x00007fd23c5f1000)
    libstdc++.so.6 => /usr/lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007fd23c268000)
    libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007fd23beca000)
    libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007fd23bad9000)
    libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007fd23b8c1000)
    /lib64/ld-linux-x86-64.so.2 (0x00007fd23cc6b000)
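
Entries such as the `not found` line in the shapely output above can be picked out automatically. This is a minimal sketch, assuming the `ldd` output format shown in this comment; the function name `unresolved_libs` is hypothetical. It reads `ldd` output on stdin and prints the name of each dependency that the loader could not resolve.

```bash
#!/usr/bin/env bash
# Hypothetical helper: filter ldd output (read from stdin) down to the
# names of dependencies reported as "not found", one per line, de-duplicated.
unresolved_libs () {
  # ldd prints unresolved entries as "<name> => not found";
  # awk's default field splitting ignores the leading indentation.
  awk '/=> not found/ { print $1 }' | sort -u
}

# Example call (the directory is an assumption, adjust as needed):
# for lib in "$SHARED_LIBS"/*.so*; do ldd "$lib" 2>/dev/null; done | unresolved_libs
```

An empty result only means that `ldd` resolved every dependency in the current environment; it says nothing about whether the resolved copy is the one the package was built against.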

The failure to resolve some libraries can be worked around by setting LD_LIBRARY_PATH; e.g. the following works OK:

$ export LD_LIBRARY_PATH=/tmp/tmp_venv_Z9YFAp/lib/python3.6/site-packages/shared_libs/libs
$ find /tmp/tmp_venv_Z9YFAp/lib/python3.6/site-packages/shared_libs/libs/ -iname "*.so.*" | while IFS= read -r lib_name; do ldd -r "$lib_name" 2>&1; done

sgillies commented 4 years ago

We're not going to do this. I think combining rasterio and fiona is a more practical solution. And I'm not ready to take that on either.