Make tile opens fully lazy.
Though I had previously made tile reads lazy for forest_change_diagnostic, tiles were still being opened even when they were not needed for the analysis, because of bucket accesses in checkSources().
I moved the assertions (checks) from checkSources() to the beginning of the fetchWindow() implementations, so getSources() no longer needs to open the associated tiles (and checkSources() goes away completely). A tile is now opened only when fetchWindow() is called, so there are no file accesses for a particular dataset tile unless it is actually used in the summary computation.
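A minimal sketch of the idea, not the actual implementation: the existence check and the open are deferred until the first fetchWindow() call. Window, LazyTileSource, OpenCounter, LazyTileDemo, and the `exists` callback are all hypothetical stand-ins invented for illustration; only the names fetchWindow and getSources come from this change.

```scala
// Hypothetical, simplified stand-ins; these are not the project's real classes.
final case class Window(col: Int, row: Int)

// Counts opens so that the laziness is observable in a test.
object OpenCounter {
  var opens: Int = 0
}

final class LazyTileSource(val uri: String, exists: String => Boolean) {
  // Nothing is opened at construction time. The body below runs once,
  // the first time `handle` is forced by fetchWindow().
  private lazy val handle: String = {
    // The check that used to run eagerly (as in checkSources()) now runs here.
    require(exists(uri), s"Tile not found: $uri")
    OpenCounter.opens += 1
    s"open:$uri"
  }

  // First use of `handle` triggers the check and the open.
  def fetchWindow(window: Window): String =
    s"${handle}@${window.col},${window.row}"
}

object LazyTileDemo {
  // Building the sources only records where tiles live; no storage access.
  // The always-true `exists` callback is a placeholder for a real bucket check.
  def getSources(uris: Seq[String]): Seq[LazyTileSource] =
    uris.map(uri => new LazyTileSource(uri, _ => true))
}
```

With this shape, calling LazyTileDemo.getSources(...) performs no opens, and a source that is never asked for a window is never opened or checked.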
As an optimization, I moved code around in ForestChangeDiagnosticSummary so that argOTBN and braBiomes tiles are only accessed for pixels in ARG or BRA, respectively. prodesLossYear is still read for every pixel, as one extra way of deciding if a pixel is in Brazil. The test output changes are only a re-ordering of categories in the output.
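The country gating could look roughly like the sketch below, which reads a layer only when the pixel's country makes it relevant. Layer, PixelInfo, summarize, and the `country` argument are hypothetical names chosen for illustration; the real summary code is structured differently.

```scala
object CountryGatedReads {
  // Hypothetical per-pixel layer that counts how often it is read.
  final class Layer(val name: String, value: String) {
    var reads: Int = 0
    def get(col: Int, row: Int): String = { reads += 1; value }
  }

  final case class PixelInfo(
    country: String,
    argOTBN: Option[String],
    braBiomes: Option[String]
  )

  // Country-specific layers are consulted only for pixels in that country,
  // so their tiles are never touched elsewhere.
  def summarize(country: String, col: Int, row: Int,
                argOTBN: Layer, braBiomes: Layer): PixelInfo = {
    val otbn  = if (country == "ARG") Some(argOTBN.get(col, row)) else None
    val biome = if (country == "BRA") Some(braBiomes.get(col, row)) else None
    PixelInfo(country, otbn, biome)
  }
}
```

Combined with lazy opens, a layer that is never read for any pixel in a window is also never opened.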
I will use this full laziness for an upcoming optimization for gfw_dashboard (where we will sometimes use the raster gadm datasets).