meggart / DiskArrays.jl

Changes to iteration over an `Unchunked` SubDiskArray? #51

Closed rafaqz closed 2 years ago

rafaqz commented 2 years ago

Comparing a view of a DiskArray-based Raster with an array using `==` has broken with DiskArrays 0.3 when the array is `Unchunked`:

using Rasters
url = "https://download.osgeo.org/geotiff/samples/gdal_eg/cea.tif"
path = download(url)
A = Raster(path)
view(A, 1, 2:3, 1) == [0x00, 0x6b]

Gives the error:

ERROR: MethodError: Cannot `convert` an object of type Tuple{UnitRange{Int64}} to an object of type Int64
Closest candidates are:
  convert(::Type{T}, ::Base.TwicePrecision) where T<:Number at /opt/julia/share/julia/base/twiceprecision.jl:262
  convert(::Type{T}, ::AbstractChar) where T<:Number at /opt/julia/share/julia/base/char.jl:185
  convert(::Type{T}, ::CartesianIndex{1}) where T<:Number at /opt/julia/share/julia/base/multidimensional.jl:136
  ...

Stacktrace:
  [1] DiskArrays.RegularChunks(cs::Tuple{UnitRange{Int64}}, offset::Int64, s::Int64)
    @ DiskArrays ~/.julia/packages/DiskArrays/7xJDq/src/chunks.jl:17
  [2] (::DiskArrays.var"#34#37")(s::Int64, cs::Tuple{UnitRange{Int64}}, of::Int64)
    @ DiskArrays ~/.julia/packages/DiskArrays/7xJDq/src/chunks.jl:101
  [3] (::Base.var"#4#5"{DiskArrays.var"#34#37"})(a::Tuple{Int64, Tuple{UnitRange{Int64}}, Int64})
    @ Base ./generator.jl:36
  [4] iterate
    @ ./generator.jl:47 [inlined]
  [5] collect(itr::Base.Generator{Base.Iterators.Zip{Tuple{Tuple{Int64}, DiskArrays.GridChunks{1}, Tuple{Int64}}}, Base.var"#4#5"{DiskArrays.var"#34#37"}})
    @ Base ./array.jl:724
  [6] map
    @ ./abstractarray.jl:2948 [inlined]
  [7] DiskArrays.GridChunks(a::Tuple{Int64}, chunksize::DiskArrays.GridChunks{1}; offset::Tuple{Int64})
    @ DiskArrays ~/.julia/packages/DiskArrays/7xJDq/src/chunks.jl:100
  [8] #GridChunks#28
    @ ~/.julia/packages/DiskArrays/7xJDq/src/chunks.jl:98 [inlined]
  [9] GridChunks
    @ ~/.julia/packages/DiskArrays/7xJDq/src/chunks.jl:98 [inlined]
 [10] eachchunk_view(#unused#::DiskArrays.Unchunked, a::SubArray{UInt8, 1, FileArray{GDALfile, UInt8, 3, Nothing, DiskArrays.GridChunks{3}, DiskArrays.Unchunked}, Tuple{Int64, UnitRange{Int64}, Int64}, false})
    @ DiskArrays ~/.julia/packages/DiskArrays/7xJDq/src/subarrays.jl:35
 [11] eachchunk(a::DiskArrays.SubDiskArray{UInt8, 1})
    @ DiskArrays ~/.julia/packages/DiskArrays/7xJDq/src/subarrays.jl:26
 [12] iterate(a::DiskArrays.SubDiskArray{UInt8, 1})
    @ DiskArrays ~/.julia/packages/DiskArrays/7xJDq/src/iterator.jl:4
 [13] iterate
    @ ~/.julia/dev/DimensionalData/src/array/array.jl:66 [inlined]
 [14] _zip_iterate_some
    @ ./iterators.jl:358 [inlined]
 [15] _zip_iterate_all
    @ ./iterators.jl:350 [inlined]
 [16] iterate
    @ ./iterators.jl:340 [inlined]
 [17] ==(A::Raster{UInt8, 1, Tuple{Y{Projected{Float64, LinRange{Float64, Int64}, ReverseOrdered, Regular{Float64}, Intervals{Start}, Metadata{GDALfile, Dict{Any, Any}}, WellKnownText{GeoFormatTypes.CRS, String}, Nothing, Y{Colon}}}}, Tuple{X{Projected{Float64, LinRange{Float64, Int64}, ForwardOrdered, Regular{Float64}, Intervals{Start}, Metadata{GDALfile, Dict{Any, Any}}, WellKnownText{GeoFormatTypes.CRS, String}, Nothing, X{Colon}}}, Band{Categorical{Int64, UnitRange{Int64}, ForwardOrdered, NoMetadata}}}, DiskArrays.SubDiskArray{UInt8, 1}, Symbol, Metadata{GDALfile, Dict{Symbol, Any}}, Nothing}, B::Vector{UInt8})
    @ Base ./abstractarray.jl:2539
 [18] top-level scope
    @ REPL[125]:1

@meggart if you know what is going on with this stacktrace that would be helpful

rafaqz commented 2 years ago

It seems to be ignoring the `Unchunked()` trait? Or is there a bug in `eachchunk_view(::Unchunked, ...)`?

rafaqz commented 2 years ago

Rasters.jl usually uses Unchunked for small files like this, because it seems to be faster below a certain threshold.

meggart commented 2 years ago

I just created #52 to fix this. By default, when a DiskArray returns `Unchunked`, DiskArrays.jl will still treat the array as chunked and estimate a chunk size that will comfortably fit into memory, so for small arrays the single chunk will be the entire array.
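
To illustrate the idea of estimating a chunk size that fits into memory, here is a minimal sketch. It is not the DiskArrays.jl implementation: the function name `toy_estimate_chunksize`, the `budget_bytes` keyword and its 100 MiB default are all assumptions made for this example.

```julia
# Hypothetical helper, NOT the DiskArrays.jl implementation.
# Picks a chunk shape for an array of size `sz` and element type `T`
# so that a single chunk occupies at most `budget_bytes` bytes.
# Assumes every dimension has length >= 1.
function toy_estimate_chunksize(sz::NTuple{N,Int}, ::Type{T};
                                budget_bytes::Int = 100 * 1024^2) where {N,T}
    # Number of elements that fit into the budget (at least one).
    remaining = max(budget_bytes ÷ sizeof(T), 1)
    chunk = Int[]
    for len in sz
        # Take as much of this dimension as the remaining budget allows.
        c = clamp(remaining, 1, len)
        push!(chunk, c)
        # Budget left for the dimensions that follow.
        remaining = max(remaining ÷ max(c, 1), 1)
    end
    return Tuple(chunk)
end

# A small array (made-up size) fits entirely, so the chunk is the whole array.
small = toy_estimate_chunksize((500, 400, 1), UInt8)

# With a tight budget, only part of the trailing dimension fits.
large = toy_estimate_chunksize((1000, 1000), Float64; budget_bytes = 80_000)
```

With the default budget the first call returns the full array size, which is the "entire array" case described above; the second call shows the estimate shrinking once the array no longer fits.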

During the transition from 0.2 to 0.3 I changed the signature of `estimate_chunksize` but did not update the corresponding call in subarrays.jl, which caused the bug here.
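
The stacktrace shows a `GridChunks{1}` arriving where a chunk size was expected, so `RegularChunks` received a `Tuple{UnitRange{Int64}}` instead of an `Int64`. The toy reproduction below shows how that kind of mismatch produces the same class of `MethodError`. Every name in it (`ToyRegularChunks`, `old_estimate`, `new_estimate`, `build_chunks`) is hypothetical, and the two estimate functions only stand in for "returns plain lengths" versus "returns chunk ranges"; they are not the real 0.2 and 0.3 signatures.

```julia
# Toy stand-in for a per-dimension chunk description; all fields are Int,
# so the default constructor tries to `convert` each argument to Int.
struct ToyRegularChunks
    cs::Int      # chunk length along one dimension
    offset::Int  # offset of the first chunk
    s::Int       # array length along that dimension
end

# Hypothetical "old" behaviour: one plain chunk length per dimension.
old_estimate(sz) = sz

# Hypothetical "new" behaviour: an iterable of chunks,
# where each chunk is a tuple of index ranges.
new_estimate(sz) = [map(n -> 1:n, sz)]

# Caller written against the old behaviour: expects Int chunk lengths.
build_chunks(sz, chunksize; offset = map(_ -> 0, sz)) =
    map((s, cs, of) -> ToyRegularChunks(cs, of, s), sz, chunksize, offset)

# Works: every `cs` is an Int.
ok = build_chunks((3,), old_estimate((3,)))

# Fails: `cs` is now `(1:3,)`, which cannot be converted to Int.
err = try
    build_chunks((3,), new_estimate((3,)))
    nothing
catch e
    e
end
```

Here `err` is a `MethodError` raised by `convert`, the same kind of failure as in frame [1] of the stacktrace: the caller kept passing the result straight through after the producer's return type had changed.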