Please check the type of change your PR introduces:
[ ] Bugfix
[ ] Feature
[ ] Code style update (formatting, renaming)
[ ] Refactoring (no functional changes, no api changes)
[ ] Build related changes
[ ] Documentation content changes
[x] Other (please describe):
What is the current behavior?
The use of a spatial index when joining the fire events point dataset to the input feature list in ForestChangeDiagnostics was disabled because it caused a non-obvious failure while assembling the R-Tree. Unfortunately, the exact dataset that triggered the failure is lost to time.
What is the new behavior?
However, not using the spatial index here results in unacceptable performance on the benchmark dataset, palm_oil_mills.tsv. The decision is to flip the switch back on and be prepared to deal with the earlier failure directly if it recurs.
At the moment, being able to run the benchmark dataset with the expected performance is the primary concern 🐬.
This PR also incidentally uses on-disk caching for the datasets in FCD that are used more than once. The cached data is cleaned up when the job finishes. This should improve overall job performance.
Does this introduce a breaking change?