thinkingmachines / geowrangler

🌏 A python package for wrangling geospatial datasets
https://geowrangler.thinkingmachin.es/
MIT License

Kernel crashes on grids generation for large countries at `zoom_level = 18` #235

Closed tm-abby-moreno closed 1 month ago

tm-abby-moreno commented 2 months ago

📓 Context

Below is the series of assumptions I wrote down and the tests I ran to see whether each would fix the problem. They led me to conclude that morecantile seems to be the culprit in the grid generation issues we've been facing.

To reiterate, the initial goal was to generate Bing tile zoom level 18 grids for Indonesia to match the resolution of the other data marts.

🧪 Assumptions and tests

1. Indonesia is very large for the usual grid generation workflow.

2. Since the kernel dies even at the lowest level, perhaps the geometries of the regions are too complicated, adding to the complexity of generating grids.

3. The jump in size from a small adm2 area might be causing out-of-memory issues, since some areas could be too large to process.

4. Overhead or some memory retention is happening within the geowrangler code, because simplification and chunking at varied sizes do not solve the problem.

5. Kyle’s fix for the PH unfiltered grids in the generate_grid function works for the whole PH at adm0.

7. Tile retrieval is causing additional processing overhead due to list comprehensions.

8. Inspect Morecantile
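
For a sense of scale on assumption 1, here is a rough back-of-the-envelope sketch that counts how many Web Mercator (Bing-style) tiles cover a bounding box at a given zoom level. It uses only the standard tile-index formula, not geowrangler or morecantile. The function names are hypothetical and the Indonesia bounding box is an approximate assumption for illustration, not a value taken from the actual boundary data.

```python
import math


def lonlat_to_tile(lon, lat, zoom):
    """Return the (x, y) tile index containing a lon/lat point (Web Mercator)."""
    n = 2 ** zoom  # number of tiles along each axis at this zoom level
    x = int((lon + 180.0) / 360.0 * n)
    lat_rad = math.radians(lat)
    y = int((1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * n)
    # Clamp so points on the east/south edge stay inside the valid index range.
    return min(max(x, 0), n - 1), min(max(y, 0), n - 1)


def count_bbox_tiles(west, south, east, north, zoom):
    """Count the tiles that cover a bounding box at the given zoom level."""
    x_min, y_min = lonlat_to_tile(west, north, zoom)  # top-left tile
    x_max, y_max = lonlat_to_tile(east, south, zoom)  # bottom-right tile
    return (x_max - x_min + 1) * (y_max - y_min + 1)


# ASSUMPTION: approximate bounding box for Indonesia (west, south, east, north).
INDONESIA_BBOX = (95.0, -11.0, 141.0, 6.0)

for zoom in (12, 14, 16, 18):
    n_tiles = count_bbox_tiles(*INDONESIA_BBOX, zoom)
    print(f"zoom {zoom}: about {n_tiles:,} candidate tiles")
```

Each extra zoom level multiplies the tile count by roughly four, so the bounding box alone yields hundreds of millions of candidate tiles at zoom 18. The bounding box overcounts, since much of it is ocean, but it suggests why materializing every candidate tile as a Python object before filtering could exhaust memory.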

joshuacortez commented 1 month ago

@tm-abby-moreno FastBingTileGenerator is now merged to master! 🔥 See sample usage here.

Thanks so much for your preliminary investigation, which uncovered the bottlenecks. It cascaded into the development of the faster gridding approach 😄