meggart / DiskArrays.jl

Other
72 stars 13 forks source link

eachchunk fall-back for transposed or offset arrays? #130

Open Alexander-Barth opened 10 months ago

Alexander-Barth commented 10 months ago

This is low priority issue for me, but when DiskArrays are used for memory arrays, the it might be useful to adapt the default chunksize for transposed arrays:

julia> DiskArrays.default_chunk_size[] = 1
1

julia> eachchunk(ones(4000,4000))
1×130 DiskArrays.GridChunks{2, Tuple{DiskArrays.RegularChunks, DiskArrays.RegularChunks}}:
 (1:4000, 1:31)  (1:4000, 32:62)  (1:4000, 63:93)  …  (1:4000, 4000:4000)

julia> eachchunk(ones(4000,4000)')
1×130 DiskArrays.GridChunks{2, Tuple{DiskArrays.RegularChunks, DiskArrays.RegularChunks}}:
 (1:4000, 1:31)  (1:4000, 32:62)  (1:4000, 63:93)  …  (1:4000, 4000:4000)

For a transposed array I would expect that the first chunk size of (1:31, 1:4000). I guess one can use stride to check this.

I am note sure if OffsetArray is supposed to be sported but currently all indices from eachchunk are ones-based

using  DiskArrays, OffsetArray
DiskArrays.default_chunk_size[] = 1
OA = OffsetArray(ones(4000,4000),(-1,-1));
c = eachchunk(OA);
OA[last(c)...]
# ERROR: BoundsError: attempt to access 4000×4000 OffsetArray(::Matrix{Float64}, 0:3999, 0:3999) with eltype Float64 with indices 0:3999×0:3999 at index [1:4000, 4000:4000]

None of these issue affect me currently in any production code, but I thought it might be useful to notify you about it.

I am using DiskArrays v0.3.22

rafaqz commented 10 months ago

Thanks, yes we should test on OffsetArray. Correctness fixes are probably more important at this stage, the cost of wrong stride in Array will be lost in the cost of reading from disk (assuming this mostly happens in a broadcast/zip with a real disk-based array). But yes one day we could correct for that.