EMR Serverless has its own peculiar set of images, which people are free to customize in various ways (from EMR 6.9.x onward). I've cribbed the build steps found here to keep things as small as possible and to avoid depending on anaconda/mamba/etc. for production builds. Perhaps it would be handy to expand the scope of the provided images to support serverless workflows? Perhaps there are other Amazon Linux image types to consider?
A bit of background on the interest here: GeoTrellis RasterSources backed by GDAL bindings now support GDAL 3.7.3, and it increasingly looks like large Spark-backed geospatial workflows make sense to run on managed infrastructure. Cluster management is expensive, time-consuming, and hard to get right.