m2lines / data-gallery

https://m2lines.github.io/data-gallery/
Apache License 2.0

Verification of hpc and leaphub data #25

Open suryadheeshjith opened 5 months ago

suryadheeshjith commented 5 months ago

For the OM4_SST_Bias notebook, we need to ensure that the data coming from these sources:

import xarray as xr

ds_ZB = xr.open_mfdataset('/vast/pp2681/ZB2020/ocean_monthly/*', decode_times=False).isel(time=slice(276,636)) # time slice corresponds to 1981-2010
ds_om4 = xr.open_mfdataset('/vast/pp2681/unparameterized/annual/5yr/*').isel(time=slice(23,48))
# Surface (z_l=0) time mean of the unparameterized run
om4T = ds_om4.thetao.isel(z_l=0).mean('time').compute()
# Surface time mean of ZB2020, weighted by the length of each averaging period (average_DT)
ZBT = ((ds_ZB.average_DT*ds_ZB.thetao.isel(z_l=0)).sum('time') / ds_ZB.average_DT.sum('time')).compute()

matches the data loaded here:

path = "gs://leap-persistent/jbusecke/OM4_m2lines/daily_combined.zarr"
ds = xr.open_dataset(path, engine="zarr", chunks={})
ds_ZB = ds.sel(experiment="ZB2020")
ds_om4 = ds.sel(experiment="unparameterized")
ZBT = ds_ZB["tos"]
om4T = ds_om4["tos"]
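
One way to make this check concrete is to compare the two versions of a field numerically once both are loaded. This is only a sketch: the helper name `compare_fields` and the tolerance are hypothetical, the dimension names `yh`/`xh` are assumed, and the synthetic arrays stand in for the HPC and LEAP versions of `om4T` or `ZBT`. It also assumes both fields are already on the same grid and averaged over the same period.

```python
import numpy as np
import xarray as xr


def compare_fields(a: xr.DataArray, b: xr.DataArray, atol: float = 1e-6) -> dict:
    """Summarize how closely two fields agree (hypothetical helper)."""
    # Keep only the coordinates both fields share; a mismatched grid
    # shows up as a smaller-than-expected number of points.
    a, b = xr.align(a, b, join="inner")
    diff = abs(a - b)
    return {
        "n_points": int(diff.size),
        "max_abs_diff": float(diff.max()),
        # NaNs (e.g. land points) are treated as agreeing
        "matches": bool((diff.fillna(0.0) <= atol).all()),
    }


# Synthetic stand-ins for the HPC and LEAP versions of a time-mean SST field
coords = {"yh": np.arange(3), "xh": np.arange(4)}
hpc = xr.DataArray(np.full((3, 4), 18.0), coords=coords, dims=("yh", "xh"))
leap = hpc + 0.5  # deliberately offset so the second check fails

print(compare_fields(hpc, hpc))
print(compare_fields(hpc, leap))
```

If the tolerance check fails, the `max_abs_diff` value gives a first hint of whether the difference is rounding noise or a genuinely different dataset.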

suryadheeshjith commented 5 months ago

@jbusecke Looking at the datasets more closely, the time ranges of the obs datasets do not match. 'https://ncsa.osn.xsede.org/Pangeo/pangeo-forge/WOA_1degree_monthly-feedstock/woa18-1deg-monthly.zarr' has 1958 monthly data, while '/vast/pp2681/WOA/woa_1981_2010.nc' presumably has data between 1981 and 2010. These two lines of code probably confirm this:

ds_ZB = xr.open_mfdataset('/vast/pp2681/ZB2020/ocean_monthly/*', decode_times=False).isel(time=slice(276,636)) # time slice corresponds to 1981-2010
ds_om4 = xr.open_mfdataset('/vast/pp2681/unparameterized/annual/5yr/*').isel(time=slice(23,48))

Furthermore, the saved data at 'gs://leap-persistent/jbusecke/OM4_m2lines/daily_combined.zarr', which contains the ZB2020 and unparameterized results, only covers 2008 to 2012.
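
A quick way to check the coverage of each dataset is to print the first and last timestamps of its time axis. The sketch below uses a hypothetical helper, `time_coverage`, and synthetic time axes that only mimic the ranges discussed above (monthly 1981-2010 and daily 2008-2012). It assumes the time axis has been decoded to datetimes; a dataset opened with `decode_times=False` holds raw numbers and would need decoding first.

```python
import pandas as pd
import xarray as xr


def time_coverage(ds: xr.Dataset, time_dim: str = "time") -> tuple:
    """Return (first date, last date, number of steps) (hypothetical helper)."""
    t = ds[time_dim].values
    # str() of a numpy datetime64 starts with YYYY-MM-DD
    return str(t[0])[:10], str(t[-1])[:10], len(t)


# Synthetic time axes standing in for the real datasets
monthly = xr.Dataset(
    coords={"time": pd.date_range("1981-01-01", periods=360, freq="MS")}
)
daily = xr.Dataset(
    coords={"time": pd.date_range("2008-01-01", "2012-12-31", freq="D")}
)

print("monthly:", time_coverage(monthly))
print("daily:  ", time_coverage(daily))
```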

jbusecke commented 3 months ago

OK, the real issue here is that we do not know (or maybe somebody does?) where /vast/pp2681/ZB2020/ocean_monthly/* is coming from (this is one of the major motivations to build datasets in a reproducible way!). Does that data have some metadata so we can confirm the version? I suspect this is an earlier version of the World Ocean Atlas, but we need to confirm this before moving on.
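
If the files carry any metadata, the global attributes are the first place to look. The sketch below is an assumption about what might be present: it checks the global attribute names recommended by the CF conventions (`title`, `institution`, `source`, `history`, `references`, `comment`) and reports which are missing. The helper name `provenance_summary` is hypothetical, and the demo dataset is synthetic; on the cluster one would pass the opened file instead.

```python
import xarray as xr

# Global attribute names recommended by the CF conventions
CF_GLOBAL_ATTRS = ("title", "institution", "source", "history", "references", "comment")


def provenance_summary(ds: xr.Dataset) -> dict:
    """Collect provenance-related global attributes (hypothetical helper)."""
    return {key: ds.attrs.get(key, "<missing>") for key in CF_GLOBAL_ATTRS}


# Synthetic dataset standing in for the file whose origin is unknown
demo = xr.Dataset(
    attrs={"title": "demo climatology", "history": "created for illustration"}
)

for key, value in provenance_summary(demo).items():
    print(f"{key:12s} {value}")
```

Per-variable attributes (`ds[name].attrs`) are worth printing the same way, since a version string sometimes lives there rather than at the file level.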

Furthermore, the saved data at 'gs://leap-persistent/jbusecke/OM4_m2lines/daily_combined.zarr', which contains the ZB2020 and unparameterized results, only covers 2008 to 2012.

This was the data I had available at the time. I think we should be OK as long as we make sure that we compute the mean over the same time frame for both? Referring to @LaureZanna for final comment on this.
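
Computing the mean over the same time frame could look roughly like the sketch below, which restricts two series to the calendar years they share before averaging. The helper name `common_year_means` is hypothetical and the input series are synthetic. Note that this is a plain unweighted mean; it does not apply the `average_DT` weighting used in the notebook.

```python
import numpy as np
import pandas as pd
import xarray as xr


def common_year_means(a: xr.DataArray, b: xr.DataArray):
    """Time-mean of a and b over the years both cover (hypothetical helper)."""
    years_a = a.time.dt.year.values
    years_b = b.time.dt.year.values
    common = np.intersect1d(years_a, years_b)
    if common.size == 0:
        raise ValueError("the two time axes share no calendar years")
    # Select only the time steps that fall in the shared years
    mean_a = a.isel(time=np.flatnonzero(np.isin(years_a, common))).mean("time")
    mean_b = b.isel(time=np.flatnonzero(np.isin(years_b, common))).mean("time")
    return mean_a, mean_b, [int(y) for y in common]


# Synthetic series: monthly 1981-2010 and daily 2008-2012, each valued by its year
t_a = pd.date_range("1981-01-01", periods=360, freq="MS")
t_b = pd.date_range("2008-01-01", "2012-12-31", freq="D")
a = xr.DataArray(t_a.year.to_numpy().astype(float), coords={"time": t_a}, dims="time")
b = xr.DataArray(t_b.year.to_numpy().astype(float), coords={"time": t_b}, dims="time")

mean_a, mean_b, years = common_year_means(a, b)
print("shared years:", years)
print("means:", float(mean_a), float(mean_b))
```

With these inputs only 2008 to 2010 are shared, so a 1981-2010 climatology and a 2008-2012 mean would otherwise be averaging over mostly different years.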

suryadheeshjith commented 3 months ago

I believe the data come from different time frames.