Discussion Points for 2022/07/27 Group Meeting

[x] Any response from Gab about CLASS low net radiation? Should we drop DOLCE and LORA? @dlawrenncar (see email response below)
[x] For the higher dimensional soil moisture product, @ypwong22 is working on adding models. She has some more CMIP6 models; we are working on getting the results uploaded somewhere we can discuss them.
[x] Had an email about what we call the Global.Carbon biomass product. The reference we provide discusses tropical biomass, but this is a global dataset. If you see here though, Global.Carbon and Tropical are different. I thought we had dropped this dataset? We need to restore GEOCARBON, and Saatchi's global dataset now has a reference.
[ ] I have some initial work on adapting Umakant's soil carbon data. He has several layers of data, but I think two are particularly useful to us: cSoilAbove1m and cSoil, which is the soil carbon down to 3 m depth. Do we compare separately to both? Umakant's estimates are high relative to the other products we have and even correlate poorly with NCSCDV22. How could we check that my coarsening strategy is reasonable?
[ ] See Issue #30 about the GBAF datasets. A while ago we dropped them for the FLUXCOM product. I assumed that this was a name change but am I wrong? I grabbed the neural net product, but there are others generated with other techniques. I am not inclined to encode each of their products.
[ ] I have pushed on our new scoring methodology, score = | 1 - error / bad_biome_error |, clipped to be on [0,1]. I have a comparison of the new vs old scores if the quantile used to define the bad_biome_error is changed. I also produced a CMIP5v6 comparison. The 98th quantile is a defensible choice, but leads to scores which use very little of the [0,1] range. For that reason it seems better to use a lower quantile, but which one and what principle do we use to justify the choice?
[ ] Any updates to our project board? Progress made on any dataset? New suggestions? Anyone want to take one on?
[ ] Last meeting, we had some unresolved discussion about the surface soil moisture from WangMao. In particular, CESM2 and NorESM are quite wet in high latitudes relative to the data product and we wondered if this effect is real, especially given the caution provided in Dirmeyer2016 to combine soil moisture data carefully. Here is the dataset publication Wang2021. @jiafumao
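
As a rough illustration of the scoring item above, the sketch below applies the formula exactly as written in the list (including the absolute value), with bad_biome_error taken as a chosen quantile of the errors within each biome. The function name, the flat array layout and the integer biome labels are assumptions made for this example; it is not ILAMB's actual implementation.

```python
import numpy as np


def biome_relative_score(error, biome, quantile=0.98):
    """Score errors against a per-biome 'bad' error level (hypothetical helper).

    error:    1D array of non-negative errors, one value per grid cell.
    biome:    1D array of biome labels, same length as error.
    quantile: quantile of each biome's errors used as bad_biome_error.
    """
    error = np.asarray(error, dtype=float)
    biome = np.asarray(biome)

    # bad_biome_error: the chosen quantile of the errors inside each biome.
    bad = np.empty_like(error)
    for label in np.unique(biome):
        mask = biome == label
        bad[mask] = np.quantile(error[mask], quantile)

    # Cells whose biome has bad_biome_error == 0 are left as NaN.
    ratio = np.divide(error, bad, out=np.full_like(error, np.nan), where=bad > 0)

    # score = | 1 - error / bad_biome_error |, clipped to [0, 1].
    return np.clip(np.abs(1.0 - ratio), 0.0, 1.0)
```

Rerunning this with different `quantile` values is one way to see how much of the [0,1] range the resulting scores occupy.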

Response from Gab: "In your context, I think the biggest benefit of using something like CLASS (or indeed the pre-closure equivalents) would be the observationally constrained uncertainty estimates that might somehow get included in the analyses in the end. Unlike a spread across competing gridded products, these uncertainty bounds represent an observationally constrained estimate. They reflect the discrepancy between in-situ measurements and the product (where observations do exist) and manage to use those observations to create a spatiotemporally complete uncertainty estimate. In reality it’s a very generous uncertainty estimate, because in addition to accounting for observed vs gridded product mismatches, it also implicitly includes the mismatch that comes from the different spatial scale of in-situ (~1 km²) versus gridded data (and so site region heterogeneity plays a role). Best thought of as the expected agreement of an unseen in-situ (think flux tower) measurement within the grid cell of the final product and the grid cell value. Hopefully that makes sense."

And a response from Sanaa Hobeichi, who actually developed the products:

Gab is right. I’ve looked before at the biases of the datasets that were involved in deriving Pre-DAT Rn, i.e. the pre-closure equivalent of CLASS-Rn: CERES-EBAF, ERAI, MERRA-2 and GLDAS-Noah. They all have large positive biases against in-situ observations, and both Pre-DAT and CLASS have inherited some of these biases. The mean bias plot (from the CLASS paper https://journals.ametsoc.org/view/journals/clim/33/5/jcli-d-19-0036.1.xml) shows the distribution of bias across 164 flux tower sites: positive and negative biases in CLASS are evenly split across sites (median ~ 0), but the magnitude of the positive biases is larger than that of the negative biases.

[Figure: box-and-whisker plot of the mean bias of CLASS across flux tower sites, from the CLASS paper]

I think that the benefit of comparing with DOLCE V2 https://researchdata.edu.au/derived-optimal-linear-dolce-v21/1463675/, DOLCE V3 https://researchdata.edu.au/derived-optimal-linear-dolce-v30/1697055 and LORA is that you get to compare the datasets over a longer time period, i.e. 29 years in DOLCE V2/V3 and 23 years in LORA. My understanding is that CMIP6 models are out of phase, which means that years 2000-2009 in CMIP6 and CLASS are not necessarily equivalent; over a longer time period, however, the comparison can be more meaningful.

From @mmu2019 Re: biomass:

GEOCARBON is different from Global.Carbon. Both build on Saatchi's tropical forest biomass, but both are global products. GEOCARBON is from Martin Herold in Europe. I attached the readme for the original data, but the website link for the data I downloaded has changed. Here is the new link: https://www.wur.nl/en/Research-Results/Chair-groups/Environmental-Sciences/Laboratory-of-Geo-information-Science-and-Remote-Sensing/Research/Integrated-land-monitoring/Forest_Biomass.htm

Global.Carbon is Saatchi's new product. I think this dataset has since been released to the public. I got it from Saatchi through personal exchange a couple of years ago because it was not published yet. Please contact me if you have any further questions. Thanks.

@mmu2019 found a reference for Saatchi's global dataset; we will work it back in:

https://www.science.org/doi/full/10.1126/sciadv.abe9829

Hi all,

I have forest biomass data for the contiguous United States from Oregon State. It is annual data covering 34 years, from 1984 to 2017, at 30-meter spatial resolution. I think someone here knows more about this data than I do, and I am not sure whether it is worth adding to our ILAMB system. We can discuss this in the next meeting. Thanks, Nate, for putting these issues together.

Mingquan

On Wed, Jul 27, 2022 at 10:32 AM nocollier @.***> wrote:

Collecting discussion points for next meeting:

Any response from Gab about CLASS low net radiation? Should we drop DOLCE and LORA? @dlawrenncar https://github.com/dlawrenncar

Last meeting, we had some unresolved discussion about the surface soil moisture from WangMao. In particular, CESM2 and NorESM are quite wet in high latitudes relative to the data product and we wondered if this effect is real, especially given the caution provided in Dirmeyer2016 https://journals.ametsoc.org/view/journals/hydr/17/4/jhm-d-15-0196_1.xml to combine soil moisture data carefully. Here is the dataset publication Wang2021 https://essd.copernicus.org/articles/13/4385/2021/. @jiafumao https://github.com/jiafumao

For the higher dimensional soil moisture product, @ypwong22 https://github.com/ypwong22 is working on adding models. She has some more CMIP6 models, we are working on getting the results uploaded where we can discuss them.

I have an initial work on an adaptation of Umakant's soil carbon data. He has several layers of data, but two I think are particularly useful to us: cSoilAbove1m https://www.climatemodeling.org/~nate/ILAMB-Test/comparisons/cSoilAbove1m.png and cSoil https://www.climatemodeling.org/~nate/ILAMB-Test/comparisons/cSoil.png which is the soil carbon above 3m. Do we compare separately to both? Umakant's estimates are high relative to the other products we have and even correlate badly with respect to the NCSCDV22. How could we check that my coarsening strategy is reasonable?

See Issue #30 https://github.com/rubisco-sfa/ILAMB-Data/issues/30 about the GBAF datasets. A while ago we dropped them for the FLUXCOM product. I assumed that this was a name change but am I wrong? I grabbed the neural net product, but there are others generated with other techniques. I am not inclined to encode each of their products.

Had an email about what we call the Global.Carbon biomass product. The reference https://doi.org/10.1073/pnas.1019576108 we provide discusses tropical biomass, but this is a global dataset. If you see here https://www.climatemodeling.org/~nate/ILAMB-Test/comparisons/cVeg.png though, Global.Carbon and Tropical are different. I thought we had dropped this dataset? Or perhaps I got mixed up and dropped GeoCarbon instead? @mmu2019 https://github.com/mmu2019 can you help?

I have pushed on our new scoring methodology, score = | 1 - error / bad_biome_error |, clipped to be on [0,1]. I have a comparison https://www.climatemodeling.org/~nate/score_comparison.html of the new vs old scores if the quantile used to define the bad_biome_error is changed. I also produced a CMIP5v6 comparison https://www.climatemodeling.org/~nate/score_comparison_CMIP.html. The 98th quantile is a defensible choice, but leads to scores which use very little of the [0,1] range. For that reason it seems better to use a lower quantile, but which one and what principle do we use to justify the choice?

To the soil carbon question, the data description here https://bolin.su.se/data/ncscd/ for the netcdf files says it is to 3m. But they also have a figure shown that is to 1m. So perhaps we have to reach out to get both versions?

NCSCD soil carbon actually has 4 layers: 30 cm, 100 cm, 200 cm and 300 cm. The data in our ILAMB system is the total from the surface to 300 cm. We can add all 4 layers in a single file or create 4 individual files for ILAMB.

Mingquan

On Wed, Jul 27, 2022 at 12:18 PM Charlie Koven @.***> wrote:

To the soil carbon question, the data description here https://bolin.su.se/data/ncscd/ for the netcdf files says it is to 3m. But they also have a figure shown that is to 1m. So perhaps we have to reach out to get both versions?

Updates on soil carbon improvements:

I re-encoded the NCSCD soil carbon data, including both cSoil and cSoilAbove1m. In the plots linked below, this is labeled simply NCSCD, while NCSCDV22 is what is in ILAMB now. We do not need all 4 layers, as they do not correspond to anything in the models. I am taking cSoil to be the carbon in 0-3 [m].
There is some small (not so small?) difference between NCSCD and NCSCDV22 in cSoilAbove1m. I assume this is because of interpolation differences. In updated versions of ILAMB, regridding is not needed, so we keep the data in its original form.
In both cSoil and cSoilAbove1m, the Mishra product has approximately 0 correlation to NCSCD. Umakant had no initial reaction to this but there is a comparison paper under review. This paper seems to suggest to me that there are other data products that we could also include.
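
One possible sanity check on a coarsening or interpolation step, sketched here under assumptions, is to confirm that the total carbon stock is the same before and after. The example coarsens a carbon density field by an area-weighted mean over blocks of cells on a regular latitude/longitude grid. The function names, the block-mean approach and the spherical Earth radius are assumptions for illustration; this is not the procedure actually used for the Mishra or NCSCD data, and missing values are not handled.

```python
import numpy as np


def cell_areas(lat_edges, lon_edges, radius=6.371e6):
    """Areas [m2] of lat/lon cells on a sphere, from cell edges in degrees."""
    dsin = np.diff(np.sin(np.deg2rad(lat_edges)))
    dlon = np.diff(np.deg2rad(lon_edges))
    return radius**2 * np.outer(dsin, dlon)


def coarsen_density(density, area, factor):
    """Area-weighted mean of a density field [kg m-2] over factor x factor blocks."""
    nlat, nlon = density.shape
    if nlat % factor or nlon % factor:
        raise ValueError("grid dimensions must be divisible by factor")
    blocks = (nlat // factor, factor, nlon // factor, factor)
    # Sum mass and area within each block, then convert back to a density.
    mass = (density * area).reshape(blocks).sum(axis=(1, 3))
    coarse_area = area.reshape(blocks).sum(axis=(1, 3))
    return mass / coarse_area, coarse_area


def total_stock(density, area):
    """Total stock [kg] implied by a density field and its cell areas."""
    return float((density * area).sum())
```

If `total_stock` differs noticeably between the fine and coarse fields, the coarsening is not conserving mass, which would be one candidate explanation for a product looking high relative to the others.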

cSoil cSoilAbove1m

rubisco-sfa / ILAMB-Data

Discussion Points for 2022/07/27 Group Meeting #28