Open rsignell-usgs opened 1 year ago
5 significant digits is like 17 mantissa bits
julia> log2(10)*5
16.609640474436812
So you only cut off a few trailing bits, which is probably also good for bitwise information analysis!
So there is a nice mix of already-truncated vars and non-truncated vars. Ideal for testing robust bitinfo workflows! :)
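To make the digits-to-bits arithmetic above concrete, here is a minimal Python sketch. It assumes float32 data (23 explicit mantissa bits) and uses plain bit shaving, i.e. zeroing the trailing bits, which is not necessarily the algorithm NCO applied to these variables. The helper names `mantissa_bits_for_digits` and `shave_mantissa` are made up for illustration.

```python
import math

import numpy as np


def mantissa_bits_for_digits(digits: int) -> int:
    """Mantissa bits needed to resolve `digits` significant decimal digits."""
    return math.ceil(digits * math.log2(10))


def shave_mantissa(values, keepbits: int) -> np.ndarray:
    """Zero the trailing (23 - keepbits) mantissa bits of float32 values."""
    x = np.atleast_1d(np.asarray(values, dtype=np.float32))
    drop = 23 - keepbits
    # Bit mask with ones everywhere except the `drop` lowest bits.
    mask = np.uint32((0xFFFFFFFF >> drop) << drop)
    return (x.view(np.uint32) & mask).view(np.float32)


keepbits = mantissa_bits_for_digits(5)  # 5 digits -> 17 bits
sample = np.array([3.14159265, 273.15, 1013.25], dtype=np.float32)
shaved = shave_mantissa(sample, keepbits)
```

With 17 of 23 mantissa bits kept, only the last 6 bits of each float32 are cleared, so the relative error stays below 2^-17.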
@observingClouds ☝🏼
Rich, we'll likely go ahead with this dataset in one of our projects. Could you post some links here on more general documentation, like variable names etc? Also what do we have to do to get someone access to the data?
Just tagging @Ishaanj18 here as he will likely work with this dataset.
@milankl , @observingClouds and @Ishaanj18, one of the nice things about this dataset is that it's on the Open Storage Network, which means it's available without credentials (and without egress fees!). Use this notebook as an example of how to access: https://github.com/hytest-org/hytest/blob/main/dataset_access/conus404_explore.ipynb
@rsignell-usgs is it possible that in the last months the variable `RH2` got dropped from the `conus404-hourly-osn` dataset?
import intake

# Open the HyTEST intake catalog and select the CONUS404 sub-catalog
hytest_cat = intake.open_catalog(
    "https://raw.githubusercontent.com/hytest-org/hytest/main/dataset_catalog/hytest_intake_catalog.yml"
)
cat = hytest_cat["conus404-catalog"]
dataset = "conus404-hourly-osn"
@observingClouds - yes, some of the last months of data were missing. I am currently transferring the rest of the data over to the OSN pod now, as the processing is finished. The data will go through 10-01-2022 when it is finished. Connectivity to the OSN CONUS404 data might be spotty while the transfer is happening, but you can read the s3 copy of the data, which is not being updated just yet (I will start that copy once the OSN data is up to date).
Thanks for your prompt response @amsnyder. Just to clarify, I am looking for any data of `RH2`, not only the latest. A year ago or so `RH2` was still a variable of the dataset but does not appear to be any longer. Is that possible?
Ah, yes, we removed that variable because it was not actually part of the CONUS404 model output (noted in our new-ish changelog); it was a variable derived from the model output. Since it was not part of the data release, we could not make it available in the zarr store.
I can point you to the code that was used to calculate it if you would like to try to re-compute it yourself. I see that it was calculated here, so it looks like it used the `rh_teten` formula here. There are other ways to calculate RH though (you will see another one in the `rh` function above that), but I can't speak to which method is best to use.
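For anyone who wants to re-derive a relative humidity field, here is a rough sketch of a Tetens-based calculation. It is not the `rh_teten` function referenced above, whose exact constants and inputs are not shown in this thread. It assumes 2 m temperature in K, water vapor mixing ratio in kg/kg, and surface pressure in Pa (in WRF output these would typically be `T2`, `Q2` and `PSFC`, but please check the dataset). The function names are hypothetical.

```python
import numpy as np

EPSILON = 0.622  # ratio of molar masses, water vapor / dry air


def saturation_vapor_pressure_hpa(temp_k):
    """Saturation vapor pressure over water in hPa (Tetens approximation)."""
    temp_c = np.asarray(temp_k, dtype=float) - 273.15
    return 6.1078 * np.exp(17.27 * temp_c / (temp_c + 237.3))


def relative_humidity(temp_k, mixing_ratio, pressure_pa):
    """Relative humidity in percent from temperature, mixing ratio, pressure."""
    pressure_hpa = np.asarray(pressure_pa, dtype=float) / 100.0
    w = np.asarray(mixing_ratio, dtype=float)
    # Vapor pressure implied by the mixing ratio at this pressure.
    vapor_pressure_hpa = w * pressure_hpa / (EPSILON + w)
    return 100.0 * vapor_pressure_hpa / saturation_vapor_pressure_hpa(temp_k)


# Example: 20 degC, 7 g/kg of water vapor, standard sea-level pressure.
rh_example = relative_humidity(293.15, 0.007, 101325.0)
```

Because the functions only use NumPy arithmetic, they should also work on xarray DataArrays, though that is untested against the CONUS404 store.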
Okay, cool! Thanks for the pointers. We will just use a different variable for our test case.
@milankl will be working with two students this summer on xbitinfo. I was offering up the CONUS404 rechunked dataset on OSN as a test case (providing this notebook as an example of how to access) and Milan asked me if some of the CONUS404 variables had already been compressed. This is the list of variables I got from @pnorton-usgs showing how NCAR compressed them with NCO before we received them:
I just raised this issue because I wasn't sure where we wanted to document this...