meggart / DiskArrays.jl

Other
72 stars 13 forks source link

vec(view(c, :, :, 1)) fails: reshape is restricted to adding singleton dimensions #149

Open lazarusA opened 6 months ago

lazarusA commented 6 months ago

The following fails:

# add DiskArrays#main
# add DimensionalData#main
using Zarr, DiskArrays
using YAXArrays
using DimensionalData

path = "gs://cmip6/CMIP6/ScenarioMIP/DKRZ/MPI-ESM1-2-HR/ssp585/r1i1p1f1/3hr/tas/gn/v20190710"
g = open_dataset(zopen(path; consolidated=true))

data_cube = DimensionalData.modify(g.tas) do arr
    return DiskArrays.CachedDiskArray(arr)
end
v = view(data_cube, :, :, 1)

vec(v) 
ERROR: For DiskArrays, reshape is restricted to adding singleton dimensions
Stacktrace:
 [1] error(s::String)
   @ Base ./error.jl:35
 [2] reshape_disk(parent::DiskArrays.SubDiskArray{…}, dims::Tuple{…})
   @ DiskArrays ~/.julia/packages/DiskArrays/bZBJE/src/reshape.jl:46
 [3] _reshape(A::DiskArrays.SubDiskArray{Float32, 2, DiskArrays.CachedDiskArray{…}, Tuple{…}, false}, dims::Tuple{Int64})
   @ DiskArrays ~/.julia/packages/DiskArrays/bZBJE/src/reshape.jl:79
 [4] reshape(parent::DiskArrays.SubDiskArray{Float32, 2, DiskArrays.CachedDiskArray{…}, Tuple{…}, false}, dims::Tuple{Int64})
   @ Base ./reshapedarray.jl:112
 [5] reshape(parent::DiskArrays.SubDiskArray{Float32, 2, DiskArrays.CachedDiskArray{…}, Tuple{…}, false}, dims::Int64)
   @ Base ./reshapedarray.jl:117
 [6] vec(a::DiskArrays.SubDiskArray{Float32, 2, DiskArrays.CachedDiskArray{…}, Tuple{…}, false})
   @ Base ./abstractarraymath.jl:41
 [7] vec(A::YAXArray{Float32, 2, DiskArrays.SubDiskArray{…}, Tuple{…}})
   @ DimensionalData ~/.julia/packages/DimensionalData/kYR9Z/src/array/array.jl:76
 [8] top-level scope
   @ ~/Documents/NDimensionalViewer/cached_c.jl:15
Some type information was truncated. Use `show(err)` to see complete types.

any work around @rafaqz @meggart ?

rafaqz commented 6 months ago

view(v, :) maybe?

meggart commented 6 months ago

The problem here is that this kind of view is inherently incompatible with eachchunk because the linear indexing simply destroys the natural chunking so there is no way to define eachchunk on this view. @lazarusA do you actually to read the data or do you need a lazy view into the data on disk?

meggart commented 6 months ago

What I meant was if you want to read the data anyway you could do v[:] instead of vec(v)

SimonDanisch commented 6 months ago

Maybe the correct thing would be to overload vec or reshape for SubDiskArray, if there are proper workarounds. I run into this deep inside Makie, where I'm using generic array operations to handle the input arrays, so it would be nice if I could keep doing so :) Just for reference, this is how I run into this:

# This happens also inside a function, that expects to work on any AbstractArray
# view is chosen, so that we can materialize/read the data as late as possible inside Makie
data =  view(data_cube, :, :, 1) 
# The below happens in WGLMakie serialization, since JS doesn't support Matrices.
# `vec` seems to be the ideal choice, since it best preserves the array type and can be allocation free without returning a view
vec(data) 
meggart commented 6 months ago

if there are proper workarounds.

This is the problem, currently there is no sensible workaround. I think the only options would be to read the data (bad, because it might be large data) or to return the Unchunked() trait for these types of views so that at least most indexing should work. I can look into this.

meggart commented 6 months ago

Ok, this seems to work on #146 after defining vec(a::AbstractDiskArray) = view(a,:) I just pushed this to the branch

rafaqz commented 6 months ago

Instead of the full Unchunked(), this is where doing the minimum rechunking that can get sequential indices could help.