mojodna / marblecutter-virtual

Virtual catalogs for marblecutter

Unzipped file too large for Lambda #8

Closed ebrelsford closed 5 years ago

ebrelsford commented 5 years ago

Thanks for all of your work on marblecutter, @mojodna, it's exciting stuff!

I'm trying to deploy it using the most recent directions (sam build, package, deploy) but this fails when it gets to creating the Lambda function--the unzipped code is too large (it's around 277MB). Looking at the dependencies in .aws-sam/build/MarblecutterVirtualFunction, they account for around 273MB of that.

Do you have a handy way of reducing the file size to make it fit under the Lambda limit? Or am I doing something incorrect? I'm building this on Ubuntu if that makes a difference.

mojodna commented 5 years ago

👋

When I initially switched to SAM, .aws-sam/build/MarblecutterVirtualFunction was 216MB. Rebuilding just now on my setup, it's 243MB (which I haven't tried to deploy, so I don't know if it's beyond the limit).

Because the build uses --use-container, the OS you're building on shouldn't matter, but as you're on Linux, you may be able to strip the binaries directly:

find .aws-sam/build/MarblecutterVirtualFunction -name *.so* -exec strip {} \;

Give that a whirl. If it helps, we'll find a way to bake it into the build process.

To look for other opportunities, grab the staged ZIP from S3 and run this against it to see what the largest contributors to size are:

unzip -v /tmp/packages.zip | awk {'print $3 " " $8'} | sort -rn
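
For anyone who'd rather do the same inspection from Python, here is a minimal sketch using only the standard-library zipfile module. The function name largest_members and the top parameter are made up for illustration; the path is whatever you saved the staged ZIP as.

```python
import zipfile


def largest_members(zip_path, top=30):
    """Return (compressed_size, name) pairs for the largest files in the archive."""
    with zipfile.ZipFile(zip_path) as archive:
        sizes = [
            (info.compress_size, info.filename)
            for info in archive.infolist()
            if not info.is_dir()  # directory entries carry no data
        ]
    # Largest compressed size first, like `sort -rn` on the unzip output.
    return sorted(sizes, reverse=True)[:top]


# Hypothetical usage against the staged package:
# for size, name in largest_members("/tmp/packages.zip"):
#     print(size, name)
```

This reports the same compressed-size column that the unzip/awk pipeline prints; swap in info.file_size to rank by unpacked size instead.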
ebrelsford commented 5 years ago

Stripping the binaries helped; I just had to quote *.so*:

find .aws-sam/build/MarblecutterVirtualFunction -name "*.so*" -exec strip {} \;

mojodna commented 5 years ago

What's the size before/after?

ebrelsford commented 5 years ago

.aws-sam went from 273MB to 184MB after running the above command.

I should add that the Lambda is now failing (potentially because of strip):

Unable to import module 'virtual.lambda': libgdal-a5d23585.so.20.5.0: ELF load command address/offset not properly aligned 

Not looking for you to debug my setup, but wanted to let you know!

mojodna commented 5 years ago

Ugh, yeah, that's from strip...

mojodna commented 5 years ago

Misc. manylinux wheels exhibit the same problem as what you ran into with libgdal: https://github.com/pypa/manylinux/issues/119

mojodna commented 5 years ago

It looks like they're already stripped: https://github.com/matthew-brett/multibuild/blob/951b6c64f01853cf2569000bb30ecd01a16bba0b/configure_build.sh#L30

For my build, these are the top 30 files in terms of compressed size:

11397434 rasterio/.libs/libgdal-c9384152.so.20.5.0
8227317 numpy/.libs/libopenblasp-r0-382c8f3a.3.5.dev.so
5112635 numpy/core/_multiarray_umath.cpython-36m-x86_64-linux-gnu.so
1468116 rasterio/.libs/libhdf5-2d94a66d.so.101.1.0
1348027 rasterio/.libs/libcurl-e8503455.so.4.5.0
1134124 numpy/random/mtrand.cpython-36m-x86_64-linux-gnu.so
997182 rasterio/.libs/libgeos-3-cd838e67.6.2.so
510600 docs/rgb.png
483179 rasterio/.libs/libsqlite3-fdd57a2d.so.0.8.6
437384 PIL/.libs/libfreetype-3e240bcb.so.6.16.1
361853 rasterio/.libs/libnetcdf-24686c8b.so.11.0.4
314326 rasterio/_io.cpython-36m-x86_64-linux-gnu.so
294345 docs/greyscale_stretched.png
291446 numpy/.libs/libgfortran-ed201abd.so.3.0.0
267332 docs/greyscale.png
251507 PIL/.libs/libwebp-db943ca4.so.7.0.3
250315 rasterio/.libs/libwebp-8ccd29fd.so.7.0.2
248043 numpy/linalg/_umath_linalg.cpython-36m-x86_64-linux-gnu.so
227728 PIL/_imaging.cpython-36m-x86_64-linux-gnu.so
203193 rasterio/_base.cpython-36m-x86_64-linux-gnu.so
189553 rasterio/_warp.cpython-36m-x86_64-linux-gnu.so
181565 botocore/data/ec2/2016-11-15/service-2.json
179909 numpy/core/_multiarray_tests.cpython-36m-x86_64-linux-gnu.so
174906 rasterio/.libs/libproj-bd876d1a.so.12.0.0
169470 rasterio/.libs/libopenjp2-8f6da918.so.2.3.0
161234 PIL/.libs/libtiff-8267adfe.so.5.4.0
152221 PIL/.libs/liblcms2-a6801db4.so.2.0.8
150687 dateutil/zoneinfo/dateutil-zoneinfo.tar.gz
148849 certifi/cacert.pem
145931 botocore/vendored/requests/cacert.pem

Removing docs/ will help some (~1MB). Checking whether the bundled botocore matches the version already installed in the Lambda runtime (and relying on that copy instead) will help too, but rasterio and numpy are clearly the main problems.

Ensuring that the archive is compressed with -9 (for maximum compression) is one option (but requires fetching and re-uploading the staged package).
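
A rough sketch of what that re-pack step could look like, assuming Python 3.7+ (for the compresslevel argument) and a staged package already downloaded locally. The repack function and its skip_prefixes parameter are hypothetical names, not part of marblecutter or SAM. Note that this only shrinks the uploaded archive; the unpacked size changes only by whatever gets skipped.

```python
import zipfile


def repack(src_path, dst_path, skip_prefixes=("docs/",)):
    """Copy a ZIP with maximum deflate compression, dropping unwanted members.

    Returns the names of the members that were kept.
    """
    kept = []
    prefixes = tuple(skip_prefixes)
    with zipfile.ZipFile(src_path) as src, zipfile.ZipFile(
        dst_path, "w", compression=zipfile.ZIP_DEFLATED
    ) as dst:
        for info in src.infolist():
            if info.filename.startswith(prefixes):
                continue  # e.g. docs/ images that the function never reads
            data = src.read(info.filename)
            # Passing the original ZipInfo keeps timestamps and permissions;
            # the explicit arguments force deflate at level 9 (like `zip -9`).
            dst.writestr(
                info, data, compress_type=zipfile.ZIP_DEFLATED, compresslevel=9
            )
            kept.append(info.filename)
    return kept


# Hypothetical usage:
# repack("/tmp/packages.zip", "/tmp/packages-small.zip", skip_prefixes=("docs/",))
```

Adding "botocore/" to skip_prefixes would only be safe if the Lambda runtime's own botocore turns out to be compatible, which is the open question above.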

@vincentsarago any ideas on reducing rasterio package size (bundling libcurl seems to have boosted package size substantially)? I like the defaults built into the wheels now (http/2, webp, zstd?), but this use case doesn't need sqlite, FreeType, and maybe GEOS. NetCDF, HDF5, and OpenJPEG are nice, but unnecessary since the goal is to tile only COGs.

At that point, does it make sense to build our own wheels?

vincentsarago commented 5 years ago

👋 @mojodna I'm not sure why it jumped so high, because curl was previously shipped with the wheels too. One option I see right now is to build numpy from source instead of using the wheel (https://github.com/mojodna/marblecutter-virtual/blob/master/Dockerfile#L37-L39) by passing --no-binary numpy.

There are also other ways to reduce the size of the package here: https://github.com/RemotePixel/remotepixel-tiler/blob/master/Dockerfile#L17-L29

I'm also noticing a problem with the current rasterio==1.0.15 wheel, which doesn't seem to fetch the curl libs shipped within it. Ref: https://github.com/sgillies/rasterio-wheels/issues/18#issuecomment-458219011

ebrelsford commented 5 years ago

I've since been able to deploy this to Lambda a few times with no issues, so I'm not sure this needs to stay open.

mojodna commented 5 years ago

Sweet!