rubisco-sfa / ILAMB-Data

A collection of scripts used to format ILAMB data, and a community portal for making contributions

Scoring Changes to NSIDC Permafrost Map #47

Closed · nocollier closed this 1 year ago

nocollier commented 1 year ago

Our current methodology scores the intersecting area, normalized by the reference and model extent separately. To score the permafrost extent in the reference, but not in the model (so-called 'missed' area), we write:

score_missed = intersection / reference
             = intersection / ( intersection + missed )

As missed -> 0, score_missed -> 1, and as intersection -> 0, score_missed -> 0 (missed itself would be large in that case, avoiding division by zero). We score the model extent that is not in the reference (so-called 'excess') in the same manner, but normalize by the model area instead.
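As a minimal numpy sketch of this scoring (the boolean extent arrays ref and mod and the cell-area array area are illustrative stand-ins, not ILAMB's actual variable names):

import numpy as np

def extent_scores(ref: np.ndarray, mod: np.ndarray, area: np.ndarray):
    """Score the intersecting permafrost area, normalized separately by
    the reference extent (missed) and the model extent (excess)."""
    intersection = area[ref & mod].sum()
    score_missed = intersection / area[ref].sum()  # reference = intersection + missed
    score_excess = intersection / area[mod].sum()  # model = intersection + excess
    return score_missed, score_excess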

One problem @dlawrenncar noticed is that a lot of a model's missed area could be discontinuous permafrost (see the CMIP5v6 comparison). While in one sense it is reasonable to expect a model to predict permafrost when 50-90% of the grid cell is covered (a tricky concept in our models), in another sense a model missing continuous permafrost is a much more serious problem. It would be helpful if we could adapt our scores to reflect this.

[Figure: Permafrost Extent Venn Diagram]

Note: The sliver marked 'not land in the reference' is removed from the extent calculations. This is area, usually along coastlines, that ends up over ocean in the reference data because of the coarser resolution of the models.

nocollier commented 1 year ago

One idea could be to score the missed areas of both types (continuous, c, and discontinuous, dc) separately, following the same concept as before:

score_missed_c = intersection_c / reference_c
score_missed_dc = intersection_dc / reference_dc

Then we would have 3 scores for the permafrost (score_excess, score_missed_c, and score_missed_dc) that we could blend together with a non-uniform weighting, say:

score = ( 4 * score_excess + 3 * score_missed_c + 1 * score_missed_dc ) / 8

These weights are just a sample I picked for illustration, but they would make the continuous score 3x as influential as the discontinuous, while keeping the total missed score as influential as the excess (see the sketch below). For that matter, we could encode the other permafrost types and score them in this way.
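For illustration only, a blend under the hypothetical 4/3/1 weighting above might look like:

def blended_score(score_excess, score_missed_c, score_missed_dc,
                  weights=(4.0, 3.0, 1.0)):
    """Blend the per-type extent scores with non-uniform weights; the
    default 4/3/1 weighting is just the sample from the comment above."""
    w_e, w_c, w_dc = weights
    total = w_e * score_excess + w_c * score_missed_c + w_dc * score_missed_dc
    return total / (w_e + w_c + w_dc)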

nocollier commented 1 year ago

Revisiting this would also give us an opportunity to swap our method of estimating permafrost extent for one from the literature.

dlawrenncar commented 1 year ago

I like the idea of calculating the scores separately. Presumably you could report the score for each category in a table, but then also provide the synthesized score with the weighting. It seems like you could extend this to even include isolated permafrost (which is 0-10% area, I think). If a model calls an isolated permafrost area permafrost, that should not necessarily be considered as big of a 'problem' as a model calling an area with no permafrost at all permafrost.

I also like the idea of different shades of blue and red to indicate which error is a 'bigger' error vs a 'smaller' error.

And, in all of this, we need to remember that the observations are not very direct: they largely infer where permafrost would be based on various metrics. The dataset also represents an estimate at one point in time (or an average over some set of years), but permafrost is obviously changing. Just things to be cognizant of; there is not much we can do about them.

nocollier commented 1 year ago

We could also encode "Ice caps and glaciers" from the map into our version of the dataset and then use it to mask out values from the models. Essentially we would be saying, "The reference data says not to consider these areas." I think it would take care of a long-standing problem we have had of some models reporting soil temperatures over Greenland.
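A sketch of that masking, assuming the encoded dataset stores an integer class array in which some code marks "Ice caps and glaciers" (the value 21 and both array names here are hypothetical):

import numpy as np

GLACIER = 21  # hypothetical class code for "Ice caps and glaciers"

def mask_glaciated(model_field, ref_class):
    """Mask model values (e.g., soil temperatures over Greenland) wherever
    the reference map classifies the cell as ice cap / glacier."""
    return np.ma.masked_where(ref_class == GLACIER, model_field)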

nocollier commented 1 year ago

@ckoven Here is the permafrost thread if you get inspired to try writing some of your ideas. I will post progress here too.

We should also check out the dataset Forrest suggested ages ago in #4.

nocollier commented 1 year ago

Here is a representation of what Dave and I were thinking. The darker blue represents "worse" permafrost errors. Note that this also uses the Slater2013 extent definition. I also noticed that in the paper you all mention only scoring model areas that are >30% land. I could bring this inside the ILAMB code as well to reduce some of the red halo, as sketched below.

[Figure: CESM2]
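The >30% land criterion could be a one-line mask on the model land fraction (landfrac here stands in for an sftlf-style field on [0, 1]; this is a sketch, not the ILAMB implementation):

import numpy as np

def mask_low_land(field, landfrac, threshold=0.3):
    """Only score cells that are more than 30% land (as in Slater2013),
    which trims the 'red halo' of spurious coastal mismatches."""
    return np.ma.masked_where(landfrac < threshold, field)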

nocollier commented 1 year ago

@dlawrenncar @ckoven @climate-dude The more I work on this, the more uncertain I am about what we are presenting. See the attached figure for progress. Note that the colors have changed meaning slightly.

  • Blue areas reflect where both the model and the reference indicate permafrost. I have added different shades of blue so you can see what the model got that was continuous vs. the other types.
  • White areas reflect where the model has permafrost that is not in the reference. I have tried to minimize this by masking out what the reference reports as glaciated and where the model is < 30% land (as in Slater2013). This is a scored quantity, and (for now) you will see the number in the colorbar.
  • Red areas reflect missed permafrost extent, where the shading reflects severity. As with excess, I have scores in the colorbar.
  • While I think that reporting the errors over the different permafrost types is good, it is also a bit messy. For example, the missed isolated score is 0.17. This very poor score reflects that the model failed to capture most of the area identified as isolated. But that is likely a good thing, and yet we have penalized the model, even if only lightly weighted in an overall score. It seems unlikely to me that these isolated/sporadic areas should be captured by relatively coarse resolution ESMs.
  • Maybe we keep this approach but throw out sporadic and isolated entirely. This effectively says that we only compare models in areas where permafrost is more prevalent than not (>50%).
  • I also took a look at this product: https://doi.pangaea.de/10.1594/PANGAEA.888600?format=html#download. It provides mean annual temperatures at the permafrost top as well as a probability of permafrost. I have not found the full details of what they are doing, but it appears to be a model derived from air temperature, snow depth, and land cover type. They build this model and then run it while varying some methodological parameters to get a permafrost probability. I am not sure how yet, but this makes me wonder if a benchmarking approach based on these probabilities would be better than the Venn diagram method I have shown so far (one possible realization is sketched below).

None of this addresses Charlie's ideas of a benchmarking technique that smoothly transitions from low to high resolution models. Still thinking about that. It would be helpful to have a higher resolution model run to play around with.

[Figure: CESM2] https://user-images.githubusercontent.com/1331463/257316722-24825e5b-bbd1-4d2a-b019-b4724c676a27.png
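One possible realization of that probability-based idea, purely as a sketch (the Brier-style squared-error framing is an assumption, not something settled in this thread):

import numpy as np

def probability_score(model_extent, ref_probability):
    """Hypothetical benchmark: compare a binary model permafrost extent
    against the reference permafrost probability using a mean squared
    difference, mapped so that 1 is a perfect match."""
    squared_error = np.nanmean((model_extent.astype(float) - ref_probability) ** 2)
    return 1.0 - squared_error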

dlawrenncar commented 1 year ago

Thanks Nate,

I guess I would ignore isolated and sporadic for now. Coarse models that predict permafrost where it should be isolated or sporadic should be penalized (more so for isolated than for sporadic). Coarse models that do not predict permafrost where it should be isolated or sporadic are 'correct' in a grid-cell-mean sense. I am not sure it makes any sense to use that model-derived product, because it is really mostly, if not entirely, model results.

I think for now this is an advance over what we had, and we need to return to this for high resolution, which is really a different thing.

However, here is one idea about how to score that could work for high (including subgrid) and low resolution models. It's a start, but probably needs some more thought.

  1. Continuous: this one is easy; any model (low or high resolution) that gives 90-100% permafrost gets S = 1. A coarse model can only be 0 or 100%, so when it predicts permafrost it will always score 1. A fine-resolution model could get anything between 90 and 100%, and that would all be good.

  2. Discontinuous: any model that gets within 50-90% area would get S = 1. Being outside that range is scored (PP is the model-simulated percent permafrost):

S = 1 for PP between 50 and 90%
S = 1 - (PP - 90)/100 when PP is higher than 90%
S = 1 - (50 - PP)/100 when PP is lower than 50%

Coarse models can only have 0 or 100%, so these models will be penalized.

  3. Sporadic:

S = 1 for PP between 10 and 50%
S = 1 - (PP - 50)/100 when PP is higher than 50%
S = 1 - (10 - PP)/100 when PP is lower than 10%

  4. Isolated:

S = 1 for PP between 0 and 10%
S = 1 - (PP - 10)/100 when PP is higher than 10%
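A minimal Python sketch of this piecewise scheme (the BOUNDS table encodes the class ranges above; the penalty below 90% for the continuous class extrapolates the same linear pattern, which the comment does not state explicitly):

# Permafrost-fraction bounds (percent) for each class, per the comment above.
BOUNDS = {
    "continuous": (90.0, 100.0),
    "discontinuous": (50.0, 90.0),
    "sporadic": (10.0, 50.0),
    "isolated": (0.0, 10.0),
}

def score_percent_permafrost(pp: float, category: str) -> float:
    """Score a cell's model percent permafrost (pp, 0-100) against the
    reference class: S = 1 inside the class bounds, decreasing linearly
    by 1/100 per percentage point outside them."""
    lo, hi = BOUNDS[category]
    if pp > hi:
        return 1.0 - (pp - hi) / 100.0
    if pp < lo:
        return 1.0 - (lo - pp) / 100.0
    return 1.0

For example, a coarse cell reporting 100% permafrost where the reference class is sporadic scores 1 - (100 - 50)/100 = 0.5, matching the intent that such cells be penalized but not zeroed out.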


nocollier commented 1 year ago

Thanks for these thoughts; I will continue to think about them. In the meantime, I implemented what we discussed.

nocollier commented 1 year ago

https://github.com/rubisco-sfa/ILAMB/pull/77 https://github.com/rubisco-sfa/ILAMB-Data/tree/master/permafrost