hypertidy / ncmeta

Tidy NetCDF metadata
https://hypertidy.github.io/ncmeta/

Compatibility with a zarr back end. #52

Open dblodgett-usgs opened 3 months ago

dblodgett-usgs commented 3 months ago

I've been playing with the new pizzarr package (https://github.com/keller-mark/pizzarr) in a package called rnz (https://github.com/DOI-USGS/rnz) -- So far, rnz implements the read side of the RNetCDF functions.

I am going to play with this in a fork but wanted to run the idea by you @mdsumner. If we were to create ncmeta functions that wrap open.nc, close.nc, file.inq.nc, and att.get.nc, we could call a zarr back end basically seamlessly with what I've worked up in rnz. When this is all up on CRAN, would you be interested in such a setup?
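To make the wrapping idea concrete, here is a minimal sketch of what a dispatch layer could look like. Everything named here (`nc_backend`, `meta_file_info`, `mock_backend`) is hypothetical and not part of ncmeta, RNetCDF, or rnz. The mock backend only stands in for a zarr implementation so the sketch runs without any packages installed.

```r
# Hypothetical sketch: a backend is a named list holding the four functions
# that ncmeta would wrap (open, close, file inquiry, attribute get).
nc_backend <- function(open, close, file_inq, att_get) {
  stopifnot(is.function(open), is.function(close),
            is.function(file_inq), is.function(att_get))
  list(open = open, close = close, file_inq = file_inq, att_get = att_get)
}

# Assumed mapping for the existing NetCDF path (needs RNetCDF installed):
# netcdf_backend <- nc_backend(RNetCDF::open.nc, RNetCDF::close.nc,
#                              RNetCDF::file.inq.nc, RNetCDF::att.get.nc)

# Wrapper: open a source, collect file-level info, and always close it.
meta_file_info <- function(source, backend) {
  con <- backend$open(source)
  on.exit(backend$close(con), add = TRUE)
  backend$file_inq(con)
}

# Mock backend standing in for a zarr implementation (values are made up).
mock_backend <- nc_backend(
  open     = function(source) list(source = source),
  close    = function(con) invisible(NULL),
  file_inq = function(con) list(ndims = 3L, nvars = 5L, ngatts = 2L),
  att_get  = function(con, variable, attribute) "mock value"
)

info <- meta_file_info("bcsd.zarr", mock_backend)
```

With this shape, the calling code never needs to know which back end it is talking to; swapping NetCDF for zarr would mean passing a different list.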

mdsumner commented 3 months ago

For sure! I just learnt enough about Zarr to really get interested in it, I'll have questions... 😀

mdsumner commented 3 months ago

so one question, why not use/require NetCDF itself? I've been wondering about trying this and I finally had a look at my normal system:

nc-config --all | grep z
  --static        -> -lhdf5_hl -lhdf5 -lcrypto -lcurl -lpthread -lsz -lz -ldl -lm
  --has-szlib     -> yes
  --has-nczarr    -> yes

I expect none of the remote store stuff will be handled in the NetCDF library like this (?) but wouldn't that be a better pathway for an RNetCDF package? (I expect you've explored this so will just dump my naive questions on you).

I'm actually very keen to learn Zarr at its core. I only just really clicked to how simple it is, and the fact that we have an entirely-R implementation in pizzarr (where it's using standard packages to support remote access and compression details afaict) and a C++ one in GDAL makes it very accessible to me. But I'm still massively confused about how to find Zarr sources (the pangeo forge example links I find are out of date or not accessible to me, etc).

mdsumner commented 3 months ago

Just for example:

f <- system.file("extdata/bcsd.zarr/", package = "rnz", mustWork = TRUE)

nc <- RNetCDF::open.nc(sprintf("file://%s#mode=nczarr,file", f))
RNetCDF::print.nc(nc)

(I only just figured this out, reading the docs: https://docs.unidata.ucar.edu/nug/current/nczarr_head.html)
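The URL fragment is the fiddly part, so here is a small sketch of a helper that assembles it. `nczarr_url` is a hypothetical name, not a function from RNetCDF or rnz, and it only covers the two store modes that appear in this thread (`file` and `s3`). It builds the string only; whether the NetCDF library can actually open the result depends on how the library was built and on the store.

```r
# Hypothetical helper: build the URL string that NetCDF-C expects for a
# Zarr store, i.e. "<location>#mode=nczarr,<store>".
nczarr_url <- function(path, store = c("file", "s3")) {
  store <- match.arg(store)
  # Local paths get a file:// prefix; anything that already has a scheme
  # (https://, s3://, file://) is passed through unchanged.
  has_scheme <- grepl("^[a-z][a-z0-9+.-]*://", path)
  base <- if (has_scheme) path else paste0("file://", path)
  sprintf("%s#mode=nczarr,%s", base, store)
}

url <- nczarr_url("/data/bcsd.zarr")
```

The result can then be passed to `RNetCDF::open.nc()` in place of the hand-written `sprintf()` call.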

dblodgett-usgs commented 3 months ago

I experimented with the NetCDF-C and GDAL pathways before and came to the conclusion that it's worth having a base-R implementation.

Stuff like "file://%s#mode=nczarr,file" 🤢 and the extra layer of obfuscation for http library basics stand out, but just on principle, I want to make sure we don't fully rely on non-R logic for this kind of stuff.

I have also wanted to more fully grasp the fundamentals of zarr... rnz and some contributions to pizzarr are my way to get there. A side benefit has been the opportunity to dabble in R6! (https://github.com/keller-mark/pizzarr/pull/82/files is a recent PR I made to pizzarr)

For http zarrs, it's so frustrating. There are a few here: https://github.com/keller-mark/pizzarr/blob/main/tests/testthat/test-http-store.R and we have an open storage network pod going now (https://water.usgs.gov/catalog/usecases/8df9f64f-0f38-4849-9c6f-3d931fd2b2ba/) which will grow in its holdings in the next while.

Note that test zarrs can be hosted via plain http (rawgit etc.) for basic testing.

Anyways -- happy to work up a PR for a branch with some of these ideas in if you are interested. I think we are actually pretty close to zarr data "just working" with what I've got via pizzarr already. There are a lot of "#TODO" lines in the pizzarr repo yet though, so probably not production ready for a while.

dblodgett-usgs commented 3 months ago

f <- "https://raw.githubusercontent.com/DOI-USGS/rnz/main/inst/extdata/bcsd.zarr/"

nc <- RNetCDF::open.nc(sprintf("%s#mode=nczarr,s3", f)) # doesn't work!

rnz::zdump(f) # works!

mdsumner commented 3 months ago

Excellent, totally appreciate your reply and all these details 👍

Fwiw I don't have strong ideas about "should", but it feels weird to extend ncmeta to an R implementation of a store that's "like netcdf", but I must admit my ideas about where things "belong" have changed radically over the years and I'm still coming to terms with how python has changed the landscape 🙏

mdsumner commented 3 months ago

Also, I didn't really understand the prospect for an R Zarr until yesterday when I finally really saw how it's structured, and I absolutely love it. (I wish we could apply smart geotransforms haha but let's see how it goes)