keller-mark / pizzarr

Slice into Zarr arrays in R 🍕
https://keller-mark.github.io/pizzarr/
MIT License

Basic introspection demo #75

Closed (dblodgett-usgs closed 1 month ago)

dblodgett-usgs commented 1 month ago

I'd like to contribute this but want to run the idea by you before going all in on documentation. This will also be me exploring what the right / wrong way to use pizzarr is.

User story

As a new pizzarr user, I'd like a demo showing how to introspect a zarr store to discover groups and arrays in the store.

Preferred solution

Methods to get a list of valid groups and arrays should be documented in basic and relatable examples.

Possible alternatives

A print method should also display this information in tree form.

Is this the expected pattern to use for introspection? Should the root group have some sort of summary print method like https://zarr.readthedocs.io/en/stable/api/hierarchy.html#zarr.hierarchy.Group.tree ?

library(pizzarr)

root <- system.file("extdata", "fixtures", "v2", "data.zarr", package="pizzarr")

(g <- zarr_open_group(root))
#> <ZarrGroup>
#>   Public:
#>     clone: function (deep = FALSE) 
#>     contains_item: function (item) 
#>     create_dataset: function (name, data = NA, ...) 
#>     create_group: function (name, overwrite = FALSE) 
#>     get_attrs: function () 
#>     get_chunk_store: function () 
#>     get_item: function (item) 
#>     get_meta: function () 
#>     get_name: function () 
#>     get_path: function () 
#>     get_read_only: function () 
#>     get_store: function () 
#>     get_synchronizer: function () 
#>     initialize: function (store, path = NA, read_only = FALSE, chunk_store = NA, 
#>   Private:
#>     attrs: Attributes, R6
#>     cache_attrs: NULL
#>     chunk_store: NA
#>     create_dataset_nosync: function (name, data = NA, ...) 
#>     create_group_nosync: function (name, overwrite = FALSE) 
#>     item_path: function (item) 
#>     key_prefix: 
#>     meta: list
#>     path: 
#>     read_only: FALSE
#>     store: DirectoryStore, Store, R6
#>     synchronizer: NULL

# find root group arrays
(items <- g$get_chunk_store()$listdir())
#>  [1] ".zgroup"                       "1d.chunked.i2"                
#>  [3] "1d.chunked.ragged.i2"          "1d.contiguous.b1"             
#>  [5] "1d.contiguous.blosc.i2"        "1d.contiguous.blosc.vlen-utf8"
#>  [7] "1d.contiguous.f4.be"           "1d.contiguous.f4.le"          
#>  [9] "1d.contiguous.f8"              "1d.contiguous.i4"             
#> [11] "1d.contiguous.lz4.i2"          "1d.contiguous.raw.i2"         
#> [13] "1d.contiguous.raw.vlen-utf8"   "1d.contiguous.S7"             
#> [15] "1d.contiguous.u1"              "1d.contiguous.U13.be"         
#> [17] "1d.contiguous.U13.le"          "1d.contiguous.U7"             
#> [19] "1d.contiguous.zlib.i2"         "1d.contiguous.zstd.i2"        
#> [21] "2d.chunked.blosc.vlen-utf8"    "2d.chunked.i2"                
#> [23] "2d.chunked.ragged.i2"          "2d.chunked.raw.vlen-utf8"     
#> [25] "2d.chunked.U7"                 "2d.contiguous.i2"             
#> [27] "3d.chunked.i2"                 "3d.chunked.mixed.i2.C"        
#> [29] "3d.chunked.mixed.i2.F"         "3d.contiguous.i2"

zarray <- g$get_item(items[29])

# get attributes of given array
zarray$get_attrs()$to_list()
#> $test_attribute
#> [1] "this is a test"

zarray$get_attrs()$set_item("test_attribute", "this is a test")

zarray$get_attrs()$to_list()
#> $test_attribute
#> [1] "this is a test"

Created on 2024-05-08 with reprex v2.1.0

keller-mark commented 1 month ago

Yes, this approach works. However, there are multiple ways to perform some of these steps.

You can call listdir on the store itself:

store <- DirectoryStore$new(root)
items <- store$listdir()

You can also get the store from a group like this (although the chunk_store could potentially be a different store):

store <- g$get_store()

The array can alternatively be accessed like this:

zarray <- zarr_open_array(store = store, path = items[29])

or

zarray <- zarr_open(store = store, path = items[29]) # if unknown whether group or array

For user-friendliness, I think we should work towards #48 and add methods that wrap some of these and conform to R user expectations. For example:

names(group) or names(store) could effectively perform listdir on a group or store root.
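
A minimal sketch of what such a wrapper might look like, assuming S3 dispatch on the "ZarrGroup" class. `names.ZarrGroup` is hypothetical and is not confirmed to exist in pizzarr. It relies only on `get_store()` and `listdir()` as used earlier in this thread, so it lists the store root and is only meaningful for a root group. The mock objects stand in for a real group and store so the snippet runs without pizzarr installed; defining `names()` for a real R6 object may have side effects that would need checking.

```r
# Hypothetical S3 method (an assumption, not pizzarr API): list the children
# of a root group by delegating to the store's listdir() and dropping the
# Zarr v2 metadata keys.
names.ZarrGroup <- function(x) {
  items <- x$get_store()$listdir()
  items[!items %in% c(".zgroup", ".zattrs", ".zmetadata")]
}

# Mock store and group, used here only to keep the sketch self-contained.
mock_store <- list(
  listdir = function() c(".zgroup", "1d.chunked.i2", "2d.chunked.i2")
)
mock_group <- structure(
  list(get_store = function() mock_store),
  class = "ZarrGroup"
)

child_names <- names(mock_group)
print(child_names)
```

A non-root group would need the group's path passed down to the store, which this sketch does not attempt.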

Building upon listdir, I agree it would also make sense to expose functionality to recursively walk the group hierarchy. Using this, we could also print the tree.
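
As a rough illustration of the recursive walk, here is a sketch in base R that does not use pizzarr at all. It assumes a Zarr v2 store laid out as a local directory, where a group directory contains a `.zgroup` file and an array directory contains a `.zarray` file. The function names `zarr_node_type` and `zarr_tree_lines` are made up for this example, and the demo store is a placeholder layout written to a temporary directory (its `.zarray` file is not valid array metadata).

```r
# Classify a directory in a Zarr v2 directory store by its metadata file.
zarr_node_type <- function(dir) {
  if (file.exists(file.path(dir, ".zarray"))) return("array")
  if (file.exists(file.path(dir, ".zgroup"))) return("group")
  NA_character_
}

# Recursively build one indented "name (type)" line per group or array.
# The walk stops at arrays, so chunk files and chunk directories are skipped.
zarr_tree_lines <- function(dir, name = "/", depth = 0) {
  type <- zarr_node_type(dir)
  if (is.na(type)) return(character(0))
  line <- paste0(strrep("  ", depth), name, " (", type, ")")
  if (type == "array") return(line)
  children <- sort(list.dirs(dir, full.names = FALSE, recursive = FALSE))
  child_lines <- unlist(lapply(children, function(child) {
    zarr_tree_lines(file.path(dir, child), child, depth + 1)
  }))
  c(line, child_lines)
}

# Demo on a throwaway store: a root group, one subgroup, one array.
store_dir <- file.path(tempfile("demo"), "demo.zarr")
dir.create(file.path(store_dir, "grp", "arr"), recursive = TRUE)
writeLines('{"zarr_format": 2}', file.path(store_dir, ".zgroup"))
writeLines('{"zarr_format": 2}', file.path(store_dir, "grp", ".zgroup"))
writeLines("{}", file.path(store_dir, "grp", "arr", ".zarray"))  # placeholder only

tree <- zarr_tree_lines(store_dir)
cat(tree, sep = "\n")
```

A version inside pizzarr would presumably go through the store interface (listdir and the group/array open functions) rather than the file system, so that it also works for non-directory stores.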

dblodgett-usgs commented 1 month ago

Right. Cool. I'll start with some additions to the basics vignette and perhaps rough in names methods for group and store.

Meanwhile, I'm starting work on a set of functions that mimic RNetCDF's pattern of "open", "inq[uire]", "get", "put", "delete". Doing it in a separate package for now. Not sure where that will go, but it'll be fair game to bring in here if you like it.