locationtech / geotrellis

GeoTrellis is a geographic data processing engine for high performance applications.
http://geotrellis.io

Regrid: force crop to avoid going out of memory #3517

Closed: jdries closed this 10 months ago

jdries commented 11 months ago

Force cropping when regrid produces much smaller tiles, to avoid memory multiplication in the serialized RDD.

Overview

Issue: https://github.com/Open-EO/openeo-geotrellis-extensions/issues/191

Regrid uses cropping to create new tiles, but by default cropping retains a reference to the original ArrayTile. So when regrid is used to go from large tiles (e.g. 1024 pixels) to much smaller ones (e.g. 64 pixels), we get many cropped tiles that all reference the same large original tiles. When Spark serializes these cropped tiles, each one serializes the full original array separately, which multiplies the size of the RDD. Deserializing it can then make executors run out of memory.
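The multiplication effect can be reproduced outside Spark with plain Java serialization. The sketch below is illustrative only: `CroppedView`, `ForcedTile` and `CropDemo` are hypothetical stand-ins, not GeoTrellis classes. A non-forced crop is modelled as a view that keeps the whole parent array, and "forcing" copies only the covered pixels (we believe the corresponding GeoTrellis switch is the `force` flag of `Crop.Options`, but check the API of your version). Each view is serialized on its own, as Spark does per record, so the parent array is written once per sub-tile. With the sizes from the issue that is (1024 / 64)^2 = 256 copies; the demo uses 256 to 64 (16 copies) to stay small.

```scala
import java.io.{ByteArrayOutputStream, ObjectOutputStream}

// Hypothetical model of a lazy (non-forced) crop: it only stores offsets,
// but holds a reference to the entire parent array.
final case class CroppedView(parent: Array[Int], parentCols: Int,
                             colMin: Int, rowMin: Int, cols: Int, rows: Int) {
  def get(col: Int, row: Int): Int =
    parent((rowMin + row) * parentCols + (colMin + col))

  // "Force" the crop: copy just the covered pixels into a new, small array,
  // so the result no longer references the large parent.
  def force: ForcedTile = {
    val out = Array.ofDim[Int](cols * rows)
    for (r <- 0 until rows; c <- 0 until cols) out(r * cols + c) = get(c, r)
    ForcedTile(out, cols, rows)
  }
}

// Hypothetical model of a materialized (forced) crop.
final case class ForcedTile(data: Array[Int], cols: Int, rows: Int)

object CropDemo {
  // Java-serialized size of one object, serialized independently
  // (no sharing of the parent array between separate records).
  def serializedSize(obj: AnyRef): Int = {
    val bytes = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(bytes)
    out.writeObject(obj)
    out.close()
    bytes.size()
  }

  // Split one size x size parent tile into (size / sub)^2 lazy sub-tiles.
  def split(size: Int, sub: Int): Seq[CroppedView] = {
    val parent = Array.range(0, size * size)
    for (r <- 0 until size by sub; c <- 0 until size by sub)
      yield CroppedView(parent, size, c, r, sub, sub)
  }

  def main(args: Array[String]): Unit = {
    val views = split(size = 256, sub = 64)
    val lazyBytes = views.map(v => serializedSize(v)).sum
    val forcedBytes = views.map(v => serializedSize(v.force)).sum
    println(s"lazy crops:   $lazyBytes bytes")
    println(s"forced crops: $forcedBytes bytes")
  }
}
```

The lazy views serialize to roughly 16 times the forced size, because every view carries the full 256 x 256 parent. Forcing trades one copy per sub-tile at crop time for a serialized size proportional to the actual output pixels.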


pomadchin commented 11 months ago

Thanks for a nice PR! I'll take a look a bit later!

Could I ask you to sign the ECA, please, to make the Eclipse license checker happy?

(Eclipse takes your name and email from the commit, so double-check that you committed under the correct email, i.e. the one used to sign the ECA. Right now the ECA is required for the (jns@****.be) email.)