wri / gfw_forest_loss_geotrellis

Global Tree Cover Loss Analysis using Geotrellis and SPARK
MIT License

ForestChangeDiagnostic uses spatial index #114

Closed echeipesh closed 3 years ago

echeipesh commented 3 years ago

Pull request type

Please check the type of change your PR introduces:

What is the current behavior?

The spatial index used when joining the fire events point dataset to the input feature list in ForestChangeDiagnostic was disabled because it caused a non-obvious failure while assembling the R-Tree. Unfortunately, the exact dataset that triggered the failure has been lost to time.

What is the new behavior?

Not using the spatial index here results in unacceptable performance on the benchmark dataset, palm_oil_mills.tsv. This PR flips the switch back on, with the expectation that the earlier failure will be dealt with directly if it recurs. For now, being able to run the benchmark dataset with the expected performance is the primary concern 🐬.
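
To illustrate why the index matters for this kind of point-to-feature join, here is a minimal, self-contained sketch. It is not the code in this PR: a uniform grid stands in for the R-Tree, and `Point`, `Box`, `SpatialJoinSketch`, and `cellSize` are hypothetical names chosen for the example. The brute-force join tests every feature against every point, while the indexed join only tests the points that fall in grid cells overlapped by the feature's envelope.

```scala
// Hypothetical stand-ins; the real job uses its own feature and fire-event types.
final case class Point(x: Double, y: Double)

final case class Box(id: String, minX: Double, minY: Double, maxX: Double, maxY: Double) {
  def contains(p: Point): Boolean =
    p.x >= minX && p.x <= maxX && p.y >= minY && p.y <= maxY
}

object SpatialJoinSketch {
  // Brute force: every feature is tested against every point.
  def joinBruteForce(features: Seq[Box], points: Seq[Point]): Map[String, Int] =
    features.map(f => f.id -> points.count(f.contains)).toMap

  // Indexed: bucket the points by grid cell once, then test only the points
  // in cells that a feature's envelope overlaps. A uniform grid is used here
  // as a simple substitute for an R-Tree.
  def joinIndexed(features: Seq[Box], points: Seq[Point], cellSize: Double): Map[String, Int] = {
    def cell(v: Double): Int = math.floor(v / cellSize).toInt

    // Each point lands in exactly one cell, so candidates are never duplicated.
    val index: Map[(Int, Int), Seq[Point]] =
      points.groupBy(p => (cell(p.x), cell(p.y)))

    features.map { f =>
      val candidates = for {
        cx <- cell(f.minX) to cell(f.maxX)
        cy <- cell(f.minY) to cell(f.maxY)
        p  <- index.getOrElse((cx, cy), Seq.empty)
      } yield p
      // The exact containment test is still required: cells only narrow the search.
      f.id -> candidates.count(f.contains)
    }.toMap
  }
}
```

Both functions return the same per-feature point counts. The indexed version gets there with far fewer containment tests when features are small relative to the extent of the point dataset, which is the situation a mill-sized feature list joined against a large fire events dataset would plausibly be in.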

This PR also incidentally adds on-disk caching for the FCD datasets that are used more than once. The cached data is cleaned up when the job finishes. This should improve overall job performance.
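
As a rough illustration of on-disk caching in Spark, the sketch below persists an RDD with `StorageLevel.DISK_ONLY` so that two actions reuse the same materialized data instead of recomputing it. This is an assumption about the general approach, not a copy of the PR's code: the `DiskCacheSketch` object, the `events` RDD, and the local master setting are all hypothetical.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.storage.StorageLevel

object DiskCacheSketch {
  def run(): (Long, Double) = {
    // Local session for illustration only; a real job is configured by its launcher.
    val spark = SparkSession.builder()
      .master("local[2]")
      .appName("disk-cache-sketch")
      .getOrCreate()
    try {
      // Hypothetical dataset standing in for one that is used more than once.
      val events = spark.sparkContext
        .parallelize(1 to 1000)
        .map(i => (i % 10, i.toDouble))

      // Keep computed partitions on local disk rather than in executor memory.
      events.persist(StorageLevel.DISK_ONLY)
      try {
        val n = events.count()          // first action computes and stores the partitions
        val total = events.values.sum() // second action reads the stored partitions
        (n, total)
      } finally {
        events.unpersist()              // release the cached blocks explicitly
      }
    } finally {
      spark.stop()                      // stopping the application also discards its cached blocks
    }
  }
}
```

Disk-only storage trades some read speed for lower memory pressure, which can be a reasonable choice when the cached datasets are large and executors are already memory-bound.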

Does this introduce a breaking change?