daauerbach opened 1 day ago
Update - after much hacking around, I noticed that my version of `read_array_metadata` (reinstalled this morning from BiocManager, but ???) was missing the `.parse_datatype` line. With a `remotes::install_github(repo = "grimbough/Rarr")` (and various dependencies built from source), I now see that line, but it didn't resolve things.

I did finally manage to get a correctly read vector of values for this example array, and (I think) for the encompassing problem of the streamflow data that I'm actually trying to reach (https://noaa-nwm-retrospective-3-0-pds.s3.amazonaws.com/CONUS/zarr/chrtout.zarr/streamflow/), despite that array having "Data Type: int32". But I'd be happy to avoid my kludgy fix.
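(For context, the end-goal read is roughly the call below; the `index` slice is a hypothetical small subset just to keep the request cheap, and the dimension count/extents are my guess.)

```r
library(Rarr)
## Target read: a small slice of the NWM retrospective streamflow array.
flow <- read_zarr_array(
  "https://noaa-nwm-retrospective-3-0-pds.s3.amazonaws.com/CONUS/zarr/chrtout.zarr/streamflow/",
  index = list(1:10, 1:10)
)
```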
Moving quickly, and not elegantly or generically, I ended up needing to write "my_" versions of `read_data`, `.extract_elements`, `read_chunk`, and ultimately even `.format_chunk`. I'm not sure how broadly relevant my fixes are, so no PR, but hopefully this helps if you decide any of this warrants package revisions.
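If it helps anyone reproducing this: rather than stepping through by hand as I did, one way to splice patched copies of the internals in is `utils::assignInNamespace` (a debugging-only sketch; `read_chunk` is the only name here taken from the package):

```r
## Grab the shipped internal, edit a copy, and swap it back into the namespace.
my_read_chunk <- Rarr:::read_chunk
## ... edit the body of my_read_chunk here (e.g. the chunk-path fix in the steps below) ...
utils::assignInNamespace("read_chunk", my_read_chunk, ns = "Rarr")
```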
Stepping through `read_zarr_array`, to get to numbers coming back, I:

1. set `metadata$datatype$base_type <- "float"` after the `metadata` declaration; this fixes the "buffer size too small" error from `"<i8"` (sketched after this list);
2. used a `my_read_data` at the `res <- read_data` call to overwrite the `FUN = .extract_elements` in the `chunk_selections <- lapply` declaration with a `my_extract_elements`, which, other than some namespace additions, existed to allow a `my_read_chunk` in the `chunk <- Rarr:::read_chunk` declaration;
3. in `my_read_chunk`, needed `chunk_file <- paste0(zarr_array_path, "/", chunk_id)` to fix what was otherwise a wrongly constructed s3_path ("feature_id0"), also sketched after this list;
4. used a `my_format_chunk` at the `converted_chunk` declaration to deal with values like "4.990063e-322" that I assume are somehow being generated in the `.Call`, possibly with some `sprintf` thrown in? Anyway, my not-at-all-good solution was just to overwrite the `output_type <- switch` declaration to make `float = 1L`.
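For concreteness, the two most mechanical pieces look roughly like this; the array path, the `base_type` assignment, and the `paste0` fix are exactly what I used, while the `chunk_id` value and everything about the surrounding internals are guesses:

```r
array_path <- "https://noaa-nwm-retrospective-3-0-pds.s3.amazonaws.com/CONUS/zarr/chrtout.zarr/feature_id"

## 1. the metadata kludge, applied right after the metadata declaration:
metadata <- Rarr:::read_array_metadata(array_path)
metadata$datatype$base_type <- "float"  # silences the "buffer size too small" error from "<i8"

## 2. the chunk-path fix inside my_read_chunk (the chunk_id value here is illustrative):
chunk_id <- "0"
paste0(array_path, chunk_id)                     # the wrongly constructed key: ".../feature_id0"
chunk_file <- paste0(array_path, "/", chunk_id)  # fixed key: ".../feature_id/0"
```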
HTH?
First, thank you for this package @grimbough!

[EDIT: see the update above. Some of my guessing here is off a little, but the issue is still valid.]
I'm getting an error from `read_zarr_array` on the array below. But I think this is due to `NULL` getting assigned to the `datatype`, which then breaks the `switch` call in `get_chunk_size` at the following `buffer_size` declaration. That in turn breaks the decompression `.Call("decompress_chunk_ZSTD")`.
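For example (the type-to-size mapping here is just an illustrative stand-in for whatever `get_chunk_size` actually does), `switch` can't dispatch on the `NULL` that results:

```r
## A NULL datatype propagates a NULL EXPR into switch(), which errors:
datatype <- NULL
datatype$base_type  # NULL: `$` on NULL quietly returns NULL
buffer_size <- switch(datatype$base_type, int = 4L, float = 8L)
#> Error in switch(datatype$base_type, int = 4L, float = 8L) :
#>   EXPR must be a length 1 vector
```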
Assuming that's all correct, it looks like `Rarr:::read_array_metadata` and the underlying `.parse_datatype` are ultimately where things start. Hopefully this reproduces for you (failing as-is):
array_path <- "https://noaa-nwm-retrospective-3-0-pds.s3.amazonaws.com/CONUS/zarr/chrtout.zarr/feature_id"
metadata <- Rarr:::read_array_metadata(array_path)
#decompressor <- metadata$compressor$id # decompressor == "zstd", previously/above 'lz4'
fails: NULL, key is '$dtype', value is '<i8' which breaks
get_chunk_size()
datatype <- metadata$datatype
still wrong, returns string "<i8"
datatype <- metadata$dtype
should be list from
datatype <- Rarr:::.parse_datatype(metadata$dtype)
which allows declarationbuffer_size <- Rarr:::get_chunk_size(datatype, dimensions = metadata$chunks)
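For what it's worth, `"<i8"` is NumPy/zarr dtype notation: little-endian (`<`), signed integer (`i`), 8 bytes, i.e. int64. The exact shape of what `.parse_datatype` returns is its business; I'm only certain it carries a `base_type` field, since setting `metadata$datatype$base_type` is what unblocked `get_chunk_size` for me:

```r
## Inspect what the internal parser makes of the raw dtype string:
str(Rarr:::.parse_datatype("<i8"))
```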
I'm working on a hacky short-term workaround and can add anything useful to this issue, but I can see a few possibilities for package changes, and I'm not sure of your preference(s) for how to handle this sort of thing going forward (especially as someone who doesn't use zarr much outside of this application and has no sense of how widespread this is likely to be).