hytest-org / hytest

https://hytest-org.github.io/hytest/

Missing/nodata in conus404-hourly-osn U10 variable #395

Closed cgmorton closed 8 months ago

cgmorton commented 8 months ago

I'm noticing numerous chunks of missing/nodata values in the U10 variable read from the conus404-hourly-osn zarr store. It seems like it might be an issue with just the zarr store, since the missing data seem to match up with the zarr chunks and the same datetimes have complete coverage in the NCAR RDA store (https://thredds.rda.ucar.edu/thredds/catalog/files/g/ds559.0/catalog.html).

Here is a simple gist notebook that hopefully shows the issue: https://gist.github.com/cgmorton/e4a9f6a1121a1d295dcfc73cb280a580

It seems like most of the missing data are between 2000 and 2017, but there are a few dates in the early 1980s that have missing chunks. I have not done a thorough review, but I have not seen any other missing chunks in my quick check of the other variables we are reading (T2, TD2, V10, PSFC, ACSWDNB, PREC_ACC_NN).

Sorry in advance if this is a known issue or if I made some obvious/simple mistake!

Edit: Here is one of the images from the notebook linked above showing the missing chunks.

pnorton-usgs commented 8 months ago

I took a look at the conus404-hourly-osn dataset and can confirm that chunks appear to be missing - I'm not sure how many variables this has occurred with. For comparison, I also looked at the conus404-hourly-onprem dataset, and it does not have those missing chunks, so I tend to think the zarr dataset transfer to the OSN pod is incomplete. @amsnyder do I have access to write to the pod?
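
One way to narrow down an incomplete transfer like this is to compare the chunk keys a Zarr array should have against the keys actually present in the store. The following is a minimal sketch, not the method used in this thread: it assumes the Zarr v2 layout, where each chunk is stored under a key made of its grid indices joined by a separator (`.` by default), and the function names, shapes, and listing are hypothetical. An absent key is not always an error, since Zarr can skip writing chunks that contain only the fill value.

```python
import itertools
import math


def expected_chunk_keys(shape, chunks, separator="."):
    """Yield every chunk key a Zarr v2 array of this shape would use."""
    # Number of chunks along each dimension (the last chunk may be partial).
    counts = [math.ceil(n / c) for n, c in zip(shape, chunks)]
    for index in itertools.product(*(range(k) for k in counts)):
        yield separator.join(str(i) for i in index)


def missing_chunk_keys(shape, chunks, present_keys, separator="."):
    """Return the expected chunk keys that are absent from a store listing."""
    present = set(present_keys)
    return [
        key
        for key in expected_chunk_keys(shape, chunks, separator)
        if key not in present
    ]


# Toy example: a 4 x 6 array stored in 2 x 3 chunks has a 2 x 2 chunk grid.
# The listing is hypothetical and simulates one object lost in transfer.
listing = ["0.0", "0.1", "1.1"]
print(missing_chunk_keys((4, 6), (2, 3), listing))  # ['1.0']
```

For a real store, the listing would come from the object store or filesystem under the variable's directory, and the shape and chunk shape from that array's `.zarray` metadata.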

amsnyder commented 8 months ago

Thanks for checking on this @pnorton-usgs. I just started the copy of the CONUS404 data over to OSN.

@cgmorton - thank you for raising this. I will let you know when the data has been updated.

rsignell-usgs commented 8 months ago

Yikes! This time, let's make sure we use a transfer method that does checksums (or do checksums after!). Or, if that's too expensive since we have so many files in these zarr datasets, perhaps ensure the directory sizes are the same or something?
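
A post-transfer check along these lines could look like the sketch below. It is only an illustration under simplifying assumptions: both copies are treated as local directory trees (the OSN pod is object storage, so a real check would list objects through a storage client instead), and the function names and toy file layout are hypothetical. It records the size and SHA-256 digest of every file, then reports files that are missing from the copy or differ from the source.

```python
import hashlib
import tempfile
from pathlib import Path


def file_digest(path, block_size=1 << 20):
    """Return the SHA-256 hex digest of one file, read in blocks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(block_size), b""):
            h.update(block)
    return h.hexdigest()


def tree_manifest(root):
    """Map each file's path relative to root to its (size, digest)."""
    root = Path(root)
    return {
        p.relative_to(root).as_posix(): (p.stat().st_size, file_digest(p))
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def compare_trees(source, copy):
    """Return (missing, mismatched) relative paths in copy versus source."""
    src, dst = tree_manifest(source), tree_manifest(copy)
    missing = sorted(set(src) - set(dst))
    mismatched = sorted(k for k in src.keys() & dst.keys() if src[k] != dst[k])
    return missing, mismatched


# Toy demonstration with two hypothetical chunk files, one of which never
# arrived in the copy.
with tempfile.TemporaryDirectory() as tmp:
    source, copy = Path(tmp, "source"), Path(tmp, "copy")
    for root in (source, copy):
        (root / "U10").mkdir(parents=True)
    (source / "U10" / "0.0.0").write_bytes(b"abc")
    (source / "U10" / "0.0.1").write_bytes(b"def")
    (copy / "U10" / "0.0.0").write_bytes(b"abc")
    result = compare_trees(source, copy)
    print(result)  # (['U10/0.0.1'], [])
```

If hashing every chunk is too expensive, dropping the digest and comparing only file names and sizes gives a cheaper check that still catches chunks that were never copied.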

amsnyder commented 8 months ago

Hi @cgmorton - the hourly dataset has been updated in our intake catalog to point to a new copy of the data. It seems to include those missing chunks - can you take a look and let me know how it looks to you?

cgmorton commented 8 months ago

Sorry for the delay but thank you for updating this! It seems like all of the data is there and I'm not seeing any missing chunks.

amsnyder commented 8 months ago

Awesome! Thank you for letting us know about the missing data.