hytest-org / hytest

https://hytest-org.github.io/hytest/

CONUS404 pre-rounded vars #301

Open rsignell-usgs opened 1 year ago

rsignell-usgs commented 1 year ago

@milankl will be working with two students this summer on xbitinfo. I offered the rechunked CONUS404 dataset on OSN as a test case (with this notebook as an example of how to access it), and Milan asked me whether any of the CONUS404 variables had already been compressed. Here is the list of variables I got from @pnorton-usgs showing how NCAR compressed them with NCO before we received them:

      ALBEDO:number_of_significant_digits = 5 ;
      CANICE:number_of_significant_digits = 5 ;
      EMISS:number_of_significant_digits = 5 ;
      GLW:number_of_significant_digits = 5 ;
      HFX:number_of_significant_digits = 5 ;
      LAI:number_of_significant_digits = 3 ;
      LH:number_of_significant_digits = 5 ;
      LWDNB:number_of_significant_digits = 5 ;
      LWDNBC:number_of_significant_digits = 5 ;
      LWDNT:number_of_significant_digits = 5 ;
      LWDNTC:number_of_significant_digits = 5 ;
      LWUPB:number_of_significant_digits = 5 ;
      LWUPBC:number_of_significant_digits = 5 ;
      LWUPT:number_of_significant_digits = 5 ;
      LWUPTC:number_of_significant_digits = 5 ;
      OLR:number_of_significant_digits = 5 ;
      QFX:number_of_significant_digits = 5 ;
      SEAICE:number_of_significant_digits = 3 ;
      SH2O:number_of_significant_digits = 5 ;
      SMOIS:number_of_significant_digits = 5 ;
      SNOWC:number_of_significant_digits = 5 ;
      SNOWH:number_of_significant_digits = 5 ;
      SR:number_of_significant_digits = 5 ;
      SSTSK:number_of_significant_digits = 5 ;
      SWDNB:number_of_significant_digits = 5 ;
      SWDNBC:number_of_significant_digits = 5 ;
      SWDNT:number_of_significant_digits = 5 ;
      SWDNTC:number_of_significant_digits = 5 ;
      SWDOWN:number_of_significant_digits = 5 ;
      SWNORM:number_of_significant_digits = 5 ;
      SWUPB:number_of_significant_digits = 5 ;
      SWUPBC:number_of_significant_digits = 5 ;
      SWUPT:number_of_significant_digits = 5 ;
      SWUPTC:number_of_significant_digits = 5 ;
      TG:number_of_significant_digits = 5 ;
      TSLB:number_of_significant_digits = 5 ;
      TSNO:number_of_significant_digits = 5 ;
      TV:number_of_significant_digits = 5 ;
      UST:number_of_significant_digits = 5 ;
      ZSNSO:number_of_significant_digits = 5 ;
      ZWT:number_of_significant_digits = 5 ;
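To use this list programmatically (for example, to decide which variables to treat as already rounded in an xbitinfo test), a minimal Python sketch can parse the ncdump-style attribute lines and group variables by their number of significant digits. The function names are hypothetical, and only three lines of the list above are reproduced as sample input.

```python
import re
from collections import defaultdict

# Matches ncdump-style lines such as "LAI:number_of_significant_digits = 3 ;"
NSD_LINE = re.compile(r"^\s*(\w+):number_of_significant_digits\s*=\s*(\d+)\s*;")


def parse_nsd(text: str) -> dict[str, int]:
    """Map each variable name to its number_of_significant_digits value."""
    nsd = {}
    for line in text.splitlines():
        match = NSD_LINE.match(line)
        if match:
            nsd[match.group(1)] = int(match.group(2))
    return nsd


def group_by_nsd(nsd: dict[str, int]) -> dict[int, list[str]]:
    """Invert the mapping: digits -> sorted variables rounded to that many digits."""
    groups = defaultdict(list)
    for name, digits in nsd.items():
        groups[digits].append(name)
    return {digits: sorted(names) for digits, names in groups.items()}


sample = """
      ALBEDO:number_of_significant_digits = 5 ;
      LAI:number_of_significant_digits = 3 ;
      SEAICE:number_of_significant_digits = 3 ;
"""
print(group_by_nsd(parse_nsd(sample)))  # {5: ['ALBEDO'], 3: ['LAI', 'SEAICE']}
```

Run on the full list, this would separate the two 3-digit variables (LAI, SEAICE) from the 5-digit ones.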

I just raised this issue because I wasn't sure where we wanted to document this...

milankl commented 1 year ago

5 significant digits is roughly 17 mantissa bits:

julia> log2(10)*5
16.609640474436812

So you only cut off a few trailing bits, which is probably also good for bitwise information analysis!
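The arithmetic above can be sketched in Python. This assumes the variables are stored as float32 (23 explicit mantissa bits), and `bitround_float32` is a hypothetical helper that applies plain round-to-nearest-even bit rounding; it is not necessarily the quantization algorithm NCO used when it wrote `number_of_significant_digits`.

```python
import math
import struct

FLOAT32_MANTISSA_BITS = 23  # explicit mantissa bits in IEEE 754 single precision


def digits_to_bits(nsd: int) -> float:
    """Mantissa bits equivalent to nsd significant decimal digits."""
    return nsd * math.log2(10)


def bitround_float32(x: float, keepbits: int) -> float:
    """Round x (as float32) to keepbits mantissa bits, round-to-nearest, ties to even."""
    (bits,) = struct.unpack("<I", struct.pack("<f", x))
    drop = FLOAT32_MANTISSA_BITS - keepbits
    if drop <= 0:
        return struct.unpack("<f", struct.pack("<I", bits))[0]
    half = 1 << (drop - 1)
    lsb = (bits >> drop) & 1  # last kept bit, used to break ties toward even
    mask = ~((1 << drop) - 1) & 0xFFFFFFFF  # clears the dropped trailing bits
    # A carry out of the mantissa propagates into the exponent, as rounding up should
    bits = (bits + half - 1 + lsb) & mask
    return struct.unpack("<f", struct.pack("<I", bits))[0]


keepbits = math.ceil(digits_to_bits(5))
print(keepbits, FLOAT32_MANTISSA_BITS - keepbits)  # 17 6
print(bitround_float32(math.pi, keepbits))  # pi with 6 trailing mantissa bits rounded away
```

Rounding 16.6 up to 17 kept bits is a choice made for the sketch; the number of bits NCO actually retained may differ per variable.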

rsignell-usgs commented 1 year ago

So there is a nice mix of already-truncated vars and non-truncated vars. Ideal for testing robust bitinfo workflows! :)
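One way to tell already-truncated variables from untruncated ones is to count how many trailing mantissa bits are zero across a sample of values. The minimal sketch below assumes float32 data that was rounded by zeroing trailing bits; some quantization schemes (e.g. BitGroom) alternately set trailing bits to one, so a result of 23 is not conclusive. `mantissa_bits_used` is a hypothetical helper, and in practice you would feed it values from a sample chunk of a variable.

```python
import struct

FLOAT32_MANTISSA_BITS = 23  # explicit mantissa bits in IEEE 754 single precision


def trailing_zero_bits(x: float) -> int:
    """Number of trailing zero bits in the float32 mantissa of x."""
    (bits,) = struct.unpack("<I", struct.pack("<f", x))
    mantissa = bits & 0x7FFFFF
    if mantissa == 0:
        return FLOAT32_MANTISSA_BITS
    # mantissa & -mantissa isolates the lowest set bit
    return (mantissa & -mantissa).bit_length() - 1


def mantissa_bits_used(values) -> int:
    """Largest number of mantissa bits any value actually needs (0..23)."""
    return FLOAT32_MANTISSA_BITS - min(trailing_zero_bits(v) for v in values)


print(mantissa_bits_used([1.0, 1.5, 1.75]))  # 2
print(mantissa_bits_used([0.1]))  # 23: 0.1 is not exactly representable in binary
```

A variable rounded to 5 significant digits by zeroing bits should report roughly 17 or fewer, while an untruncated one will typically report 23.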

milankl commented 1 year ago

@observingClouds ☝🏼

milankl commented 1 year ago

Rich, we'll likely go ahead with this dataset in one of our projects. Could you post some links here to more general documentation, like variable names, etc.? Also, what do we need to do to get someone access to the data?

observingClouds commented 1 year ago

Just tagging @Ishaanj18 here as he will likely work with this dataset.

rsignell-usgs commented 1 year ago

@milankl , @observingClouds and @Ishaanj18, one of the nice things about this dataset is that it's on the Open Storage Network, which means it's available without credentials (and without egress fees!). Use this notebook as an example of how to access: https://github.com/hytest-org/hytest/blob/main/dataset_access/conus404_explore.ipynb

observingClouds commented 7 months ago

@rsignell-usgs is it possible that the variable RH2 was dropped from the conus404-hourly-osn dataset in the last few months?

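import intake

hytest_cat = intake.open_catalog(
    "https://raw.githubusercontent.com/hytest-org/hytest/main/dataset_catalog/hytest_intake_catalog.yml"
)
cat = hytest_cat["conus404-catalog"]
dataset = "conus404-hourly-osn"
ds = cat[dataset].to_dask()  # open lazily as an xarray.Dataset
print("RH2" in ds.data_vars)  # False if RH2 is no longer in the store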

ref: https://github.com/observingClouds/xbitinfo/pull/234

amsnyder commented 7 months ago

@observingClouds - yes, some of the last months of data were missing. Now that the processing is finished, I am transferring the rest of the data to the OSN pod. Once the transfer is complete, the data will run through 10-01-2022. Connectivity to the OSN CONUS404 data might be spotty while the transfer is happening, but you can read the S3 copy of the data, which is not being updated yet (I will start updating that copy once the OSN data is up to date).

observingClouds commented 7 months ago

Thanks for your prompt response @amsnyder. Just to clarify, I am looking for any RH2 data, not only the latest. A year or so ago RH2 was still a variable in the dataset, but it no longer appears to be. Is that possible?

amsnyder commented 7 months ago

Ah, yes, we removed that variable because it was not actually part of the CONUS404 model output (as noted in our new-ish changelog); it was a variable derived from the model output. Since it was not part of the data release, we could not make it available in the Zarr store.

I can point you to the code that was used to calculate it if you would like to re-compute it yourself. I see that it was calculated here, so it looks like it used the rh_teten formula here. There are other ways to calculate RH, though (you will see another one in the rh function above that), and I can't speak to which method is best.
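For anyone who wants to re-derive an RH2-like field, here is a minimal sketch of 2-m relative humidity using a common form of the Tetens saturation vapor pressure formula. It is not the HyTEST code referenced above: the constants (610.78 Pa, 17.27, 237.3 °C), the use of the water vapor mixing ratio, and the WRF-style inputs `T2` (K), `Q2` (kg/kg) and `PSFC` (Pa) are assumptions, so results may differ from the original RH2 variable.

```python
import math

EPSILON = 0.622  # approximate ratio of gas constants, R_dry / R_vapor


def saturation_vapor_pressure_tetens(t_kelvin: float) -> float:
    """Saturation vapor pressure over water in Pa (Tetens form, assumed constants)."""
    t_celsius = t_kelvin - 273.15
    return 610.78 * math.exp(17.27 * t_celsius / (t_celsius + 237.3))


def relative_humidity(t2: float, q2: float, psfc: float) -> float:
    """Relative humidity in percent from temperature (K), mixing ratio (kg/kg), pressure (Pa)."""
    vapor_pressure = q2 * psfc / (EPSILON + q2)  # actual vapor pressure from mixing ratio
    return 100.0 * vapor_pressure / saturation_vapor_pressure_tetens(t2)


print(round(relative_humidity(t2=293.15, q2=0.010, psfc=101325.0), 1))
```

The same functions work element-wise if you swap `math.exp` for `numpy.exp` and pass arrays; values slightly above 100 % are possible and are sometimes clipped.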

observingClouds commented 7 months ago

Okay, cool! Thanks for the pointers. We will just use a different variable for our test case.