Closed — hannah-rae closed this 8 months ago
When were the old results computed (which PR)? We can either pull the code in that PR and run it on the current data, or pull the (old?) data from that PR and run the new code to try to isolate the issue.
One thing I noticed recently is this line: https://github.com/nasaharvest/crop-mask/blob/9136824ede97cf31ca72fbb0497324e9dfee5602/src/compare_covermaps.py#L364
It excludes all labels without unanimous agreement (e.g. a "class_probability" of 0.66 or 0.33), which drops a significant number of points for datasets like Togo.
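The effect of that filter can be sketched like this (a minimal toy illustration, not the actual `compare_covermaps.py` code; the DataFrame and the majority-vote alternative are assumptions, only the "class_probability" column name comes from the linked line):

```python
import pandas as pd

# Toy labels: class_probability is the fraction of labelers who marked "crop".
labels = pd.DataFrame({
    "lat": [1.0, 1.1, 1.2, 1.3],
    "class_probability": [1.0, 0.66, 0.33, 0.0],
})

# Unanimous-only filter: keeps only points where every labeler agreed
# (probability exactly 0 or 1), dropping the 0.66 and 0.33 rows.
unanimous = labels[labels["class_probability"].isin([0.0, 1.0])]

# A majority-vote alternative would keep any point with a clear majority
# (probability != 0.5), retaining the 0.66 and 0.33 rows as well.
majority = labels[labels["class_probability"] != 0.5]

print(len(unanimous), len(majority))  # 2 4
```

For a dataset with many split-vote labels (like Togo), the gap between the two counts is exactly the "significant amount of points" being excluded.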
The most recent results were from #329. It might be best to pull the old data and run it through the new code, because I don't see any changes in the code that would change the results (and there are more points).
Re: the line you called out, I required unanimous agreement intentionally to make sure we were confident about the results.
I made the following changes to the intercomparison script/results:
- Updated the `harvest-dev` maps to include our maps that are not yet public but are marked Done.

I am requesting review/discussion because, when I compared the old vs. new results for the countries that were unchanged (Togo, Kenya, Malawi, Tanzania, Mali, Rwanda, Uganda, Zambia), I noticed that the sample counts (`crop_support` and `noncrop_support`) were slightly higher than in the previous version, which changed the results slightly. I am not sure why, since nothing changed that should have affected the number of samples found for the old datasets — maybe the datasets on the main branch changed after the original results were created? Need to look into this more.
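One quick way to narrow this down would be to diff the support counts per country between the two result tables. A minimal sketch with toy stand-in data (the real column names `crop_support`/`noncrop_support` are from the results above; the table contents here are invented):

```python
import pandas as pd

# Toy stand-ins for the old (#329) and new intercomparison result tables.
old = pd.DataFrame({
    "country": ["Togo", "Kenya"],
    "crop_support": [100, 200],
    "noncrop_support": [150, 250],
})
new = pd.DataFrame({
    "country": ["Togo", "Kenya"],
    "crop_support": [104, 200],
    "noncrop_support": [150, 253],
})

merged = old.merge(new, on="country", suffixes=("_old", "_new"))
for col in ["crop_support", "noncrop_support"]:
    merged[f"{col}_delta"] = merged[f"{col}_new"] - merged[f"{col}_old"]

# Rows with any nonzero delta point at the datasets to investigate.
changed = merged[
    (merged["crop_support_delta"] != 0) | (merged["noncrop_support_delta"] != 0)
]
print(changed[["country", "crop_support_delta", "noncrop_support_delta"]])
```

Joining the deltas back to the per-dataset label files would show whether the extra points come from one specific dataset (suggesting a data update on main) or are spread evenly (suggesting a processing change).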