Closed ivanzvonkov closed 5 months ago
Thanks for looking into this @ivanzvonkov !
[ ] Figure out how the value at overlapping images is calculated when using the mosaic function
The mosaic() function uses "last on top" but if they are all the same time period, I'm not sure what is considered last. You could alternatively use mode() or something more transparent.
Rerunning intercomparison:
Rwanda intercomparison update: digital-earth-africa now moves up to first.
Tigray 2020: fewer points, but the same order for the first three maps.
Tigray 2021: same order for the top three maps.
The mosaic() function uses "last on top" but if they are all the same time period, I'm not sure what is considered last. You could alternatively use mode() or something more transparent.
I asked the GFSAD team about why they have duplicate tiles and they responded with:
Our global cropland extent product was mapped by seven different researchers that used slightly different methods to account for differences in agriculture and satellite imagery available. Since the products are split into 10x10 tiles, there are some tiles that contain multiple products/continents.
For the example tile given, N10E30: when you are interested in Middle East use GFSAD30EUCEARUMECE; when you are interested in Africa use GFSAD30AFCE.
So I think the right way to evaluate these would be to be deliberate about the selection of the layer. Mode is not exactly that and introduces the complexity of dealing with 0.5 values. So for now I am going to keep mosaic(). @hannah-rae
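To make the trade-off concrete, here is a minimal pure-Python sketch (not Earth Engine code) of the two options on a toy pair of overlapping layers. The `mosaic_last_on_top` helper, the dict-based "images", and the pixel values are all assumptions for illustration; it only mimics the "last on top" idea and says nothing about how Earth Engine actually orders images or resolves ties in mode().

```python
from statistics import multimode


def mosaic_last_on_top(layers):
    """Composite toy layers in order; later layers overwrite earlier ones.

    Each layer is a dict of pixel -> value, with None meaning "masked".
    This is a hypothetical stand-in for the idea behind mosaic().
    """
    out = {}
    for layer in layers:
        for pixel, value in layer.items():
            if value is not None:
                out[pixel] = value
    return out


# Two toy cropland layers that disagree on the shared pixel (0, 1).
africa = {(0, 0): 1, (0, 1): 0}
middle_east = {(0, 1): 1, (0, 2): 1}

# The result at the overlap depends entirely on the layer order.
africa_then_me = mosaic_last_on_top([africa, middle_east])
me_then_africa = mosaic_last_on_top([middle_east, africa])

# With two layers that disagree there is no single most common value.
overlap_modes = multimode([africa[(0, 1)], middle_east[(0, 1)]])

print(africa_then_me[(0, 1)], me_then_africa[(0, 1)], overlap_modes)
```

In this toy setup neither approach picks the layer deliberately: the mosaic result flips with the ordering, and the mode is ambiguous when exactly two layers disagree.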
@ivanzvonkov Good to know about the different layers. I think eventually we can update that to choose the layer based on an argument for each dataset (maybe we add a continent attribute to the eval datasets).
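A rough sketch of what that could look like, assuming a hypothetical `EvalDataset` class with a `continent` attribute and a hypothetical lookup table; only the two layer names come from the GFSAD response above, and "Middle East" is used as a key simply because that is how the GFSAD team phrased it.

```python
from dataclasses import dataclass

# Layer names taken from the GFSAD team's reply; the mapping itself is hypothetical.
GFSAD_LAYER_BY_CONTINENT = {
    "Africa": "GFSAD30AFCE",
    "Middle East": "GFSAD30EUCEARUMECE",
}


@dataclass
class EvalDataset:
    """Hypothetical stand-in for an evaluation dataset with a continent attribute."""
    name: str
    continent: str


def gfsad_layer_for(dataset: EvalDataset) -> str:
    """Return the GFSAD layer to use for a dataset, or fail loudly if unknown."""
    try:
        return GFSAD_LAYER_BY_CONTINENT[dataset.continent]
    except KeyError:
        raise ValueError(
            f"No GFSAD layer configured for continent {dataset.continent!r}"
        ) from None


rwanda = EvalDataset(name="Rwanda", continent="Africa")
print(gfsad_layer_for(rwanda))
```

Failing on an unknown continent, rather than falling back to a mosaic, would keep the layer choice explicit.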
The duplicates issue comes up in the following datasets: both GFSAD datasets, Digital Earth Africa, Harvest Maps, ESRI LULC. I have only investigated GFSAD but I suspect it's the same issue for all the other datasets.
During intercomparison, points were sampled from each image in the relevant ImageCollection within the specified boundary. However, several ImageCollections contain overlapping images, which causes the overlapping areas to be double sampled. For example, GFSAD has predictions for the same tile within both the "Africa 30 m" collection and the "Europe, Central Asia, Russia, Middle East 30 m" collection. The current solution is to sample from a mosaic of the ImageCollection, which avoids the double sampling.
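The double-sampling effect can be illustrated with a small pure-Python sketch. This is not the notebook's Earth Engine code: the dict-based "images", the `sample_per_image` and `sample_mosaic` helpers, and the pixel values are assumptions made up for the example.

```python
def sample_per_image(images, region):
    """Sample each toy image separately; overlapping pixels come back once per image."""
    return [(pixel, img[pixel]) for img in images for pixel in region if pixel in img]


def sample_mosaic(images, region):
    """Flatten the images into one layer first, so each pixel is sampled at most once."""
    mosaic = {}
    for img in images:
        mosaic.update(img)  # later images overwrite earlier ones
    return [(pixel, mosaic[pixel]) for pixel in region if pixel in mosaic]


# Two toy images that both cover pixel (0, 1).
africa = {(0, 0): 1, (0, 1): 0}
eurasia = {(0, 1): 0, (0, 2): 1}
region = [(0, 0), (0, 1), (0, 2)]

per_image = sample_per_image([africa, eurasia], region)
from_mosaic = sample_mosaic([africa, eurasia], region)

print(len(per_image), len(from_mosaic))
```

Here the per-image approach returns four samples for a three-pixel region because the shared pixel is counted twice, while sampling the mosaic returns three.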
Additional to do:
[ ] Figure out how the value at overlapping images is calculated when using the mosaic function
[ ] Rerun all intercomparison notebooks