rafaqz / Rasters.jl

Raster manipulation for the Julia language
MIT License
213 stars 36 forks source link

Base.getindex is Type Unstable for RasterStacks with Many Layers #464

Open JoshuaBillson opened 1 year ago

JoshuaBillson commented 1 year ago

I noticed some performance issues while trying to iterate over a hyperspectral image consisting of 235 bands.

Upon further analysis, it seems that Base.getindex is type unstable for RasterStacks with a large number of layers.

This is an example RasterStack with only 20 bands, which produces no warnings:

julia> @code_warntype desis1[1003];
MethodInstance for getindex(::RasterStack{NamedTuple{(:Band_1, :Band_2, :Band_3, :Band_4, :Band_5, :Band_6, :Band_7, :Band_8, :Band_9, :Band_10, :Band_11, :Band_12, :Band_13, :Band_14, :Band_15, :Band_16, :Band_17, :Band_18, :Band_19, :Band_20), NTuple{20, Matrix{Int16}}}, Tuple{X{Projected{Float64, LinRange{Float64, Int64}, DimensionalData.Dimensions.LookupArrays.ForwardOrdered, DimensionalData.Dimensions.LookupArrays.Regular{Float64}, DimensionalData.Dimensions.LookupArrays.Intervals{DimensionalData.Dimensions.LookupArrays.Start}, DimensionalData.Dimensions.LookupArrays.Metadata{Rasters.GDALsource, Dict{String, Any}}, WellKnownText{GeoFormatTypes.CRS}, Nothing, X{Colon}}}, Y{Projected{Float64, LinRange{Float64, Int64}, DimensionalData.Dimensions.LookupArrays.ReverseOrdered, DimensionalData.Dimensions.LookupArrays.Regular{Float64}, DimensionalData.Dimensions.LookupArrays.Intervals{DimensionalData.Dimensions.LookupArrays.Start}, DimensionalData.Dimensions.LookupArrays.Metadata{Rasters.GDALsource, Dict{String, Any}}, WellKnownText{GeoFormatTypes.CRS}, Nothing, Y{Colon}}}}, Tuple{}, NamedTuple{(:Band_1, :Band_2, :Band_3, :Band_4, :Band_5, :Band_6, :Band_7, :Band_8, :Band_9, :Band_10, :Band_11, :Band_12, :Band_13, :Band_14, :Band_15, :Band_16, :Band_17, :Band_18, :Band_19, :Band_20), NTuple{20, Tuple{X{Colon}, Y{Colon}}}}, DimensionalData.Dimensions.LookupArrays.NoMetadata, NamedTuple{(:Band_1, :Band_2, :Band_3, :Band_4, :Band_5, :Band_6, :Band_7, :Band_8, :Band_9, :Band_10, :Band_11, :Band_12, :Band_13, :Band_14, :Band_15, :Band_16, :Band_17, :Band_18, :Band_19, :Band_20), NTuple{20, DimensionalData.Dimensions.LookupArrays.Metadata{Rasters.GDALsource, Dict{String, Any}}}}, NamedTuple{(:Band_1, :Band_2, :Band_3, :Band_4, :Band_5, :Band_6, :Band_7, :Band_8, :Band_9, :Band_10, :Band_11, :Band_12, :Band_13, :Band_14, :Band_15, :Band_16, :Band_17, :Band_18, :Band_19, :Band_20), NTuple{20, Int16}}}, ::Int64)
  from getindex(s::AbstractDimStack, i::Int64, I::Int64...) @ DimensionalData ~/.julia/packages/DimensionalData/pS9IE/src/stack/indexing.jl:15
Arguments
  #self#::Core.Const(getindex)
  s::RasterStack{NamedTuple{(:Band_1, :Band_2, :Band_3, :Band_4, :Band_5, :Band_6, :Band_7, :Band_8, :Band_9, :Band_10, :Band_11, :Band_12, :Band_13, :Band_14, :Band_15, :Band_16, :Band_17, :Band_18, :Band_19, :Band_20), NTuple{20, Matrix{Int16}}}, Tuple{X{Projected{Float64, LinRange{Float64, Int64}, DimensionalData.Dimensions.LookupArrays.ForwardOrdered, DimensionalData.Dimensions.LookupArrays.Regular{Float64}, DimensionalData.Dimensions.LookupArrays.Intervals{DimensionalData.Dimensions.LookupArrays.Start}, DimensionalData.Dimensions.LookupArrays.Metadata{Rasters.GDALsource, Dict{String, Any}}, WellKnownText{GeoFormatTypes.CRS}, Nothing, X{Colon}}}, Y{Projected{Float64, LinRange{Float64, Int64}, DimensionalData.Dimensions.LookupArrays.ReverseOrdered, DimensionalData.Dimensions.LookupArrays.Regular{Float64}, DimensionalData.Dimensions.LookupArrays.Intervals{DimensionalData.Dimensions.LookupArrays.Start}, DimensionalData.Dimensions.LookupArrays.Metadata{Rasters.GDALsource, Dict{String, Any}}, WellKnownText{GeoFormatTypes.CRS}, Nothing, Y{Colon}}}}, Tuple{}, NamedTuple{(:Band_1, :Band_2, :Band_3, :Band_4, :Band_5, :Band_6, :Band_7, :Band_8, :Band_9, :Band_10, :Band_11, :Band_12, :Band_13, :Band_14, :Band_15, :Band_16, :Band_17, :Band_18, :Band_19, :Band_20), NTuple{20, Tuple{X{Colon}, Y{Colon}}}}, DimensionalData.Dimensions.LookupArrays.NoMetadata, NamedTuple{(:Band_1, :Band_2, :Band_3, :Band_4, :Band_5, :Band_6, :Band_7, :Band_8, :Band_9, :Band_10, :Band_11, :Band_12, :Band_13, :Band_14, :Band_15, :Band_16, :Band_17, :Band_18, :Band_19, :Band_20), NTuple{20, DimensionalData.Dimensions.LookupArrays.Metadata{Rasters.GDALsource, Dict{String, Any}}}}, NamedTuple{(:Band_1, :Band_2, :Band_3, :Band_4, :Band_5, :Band_6, :Band_7, :Band_8, :Band_9, :Band_10, :Band_11, :Band_12, :Band_13, :Band_14, :Band_15, :Band_16, :Band_17, :Band_18, :Band_19, :Band_20), NTuple{20, Int16}}}
  i::Int64
  I::Tuple{}
Locals
  #169::DimensionalData.var"#169#170"{Int64, Tuple{}}
Body::NamedTuple{(:Band_1, :Band_2, :Band_3, :Band_4, :Band_5, :Band_6, :Band_7, :Band_8, :Band_9, :Band_10, :Band_11, :Band_12, :Band_13, :Band_14, :Band_15, :Band_16, :Band_17, :Band_18, :Band_19, :Band_20), NTuple{20, Int16}}
1 ─      nothing
│   %2 = DimensionalData.:(var"#169#170")::Core.Const(DimensionalData.var"#169#170")
│   %3 = Core.typeof(i)::Core.Const(Int64)
│   %4 = Core.typeof(I)::Core.Const(Tuple{})
│   %5 = Core.apply_type(%2, %3, %4)::Core.Const(DimensionalData.var"#169#170"{Int64, Tuple{}})
│        (#169 = %new(%5, i, I))
│   %7 = #169::DimensionalData.var"#169#170"{Int64, Tuple{}}
│   %8 = DimensionalData.data(s)::NamedTuple{(:Band_1, :Band_2, :Band_3, :Band_4, :Band_5, :Band_6, :Band_7, :Band_8, :Band_9, :Band_10, :Band_11, :Band_12, :Band_13, :Band_14, :Band_15, :Band_16, :Band_17, :Band_18, :Band_19, :Band_20), NTuple{20, Matrix{Int16}}}
│   %9 = DimensionalData.map(%7, %8)::NamedTuple{(:Band_1, :Band_2, :Band_3, :Band_4, :Band_5, :Band_6, :Band_7, :Band_8, :Band_9, :Band_10, :Band_11, :Band_12, :Band_13, :Band_14, :Band_15, :Band_16, :Band_17, :Band_18, :Band_19, :Band_20), NTuple{20, Int16}}
└──      return %9

And here is an example RasterStack with 100 bands, which produces a type warning:

julia> @code_warntype desis2[1003];
MethodInstance for getindex(::RasterStack{NamedTuple{(:Band_1, :Band_2, :Band_3, :Band_4, :Band_5, :Band_6, :Band_7, :Band_8, :Band_9, :Band_10, :Band_11, :Band_12, :Band_13, :Band_14, :Band_15, :Band_16, :Band_17, :Band_18, :Band_19, :Band_20, :Band_21, :Band_22, :Band_23, :Band_24, :Band_25, :Band_26, :Band_27, :Band_28, :Band_29, :Band_30, :Band_31, :Band_32, :Band_33, :Band_34, :Band_35, :Band_36, :Band_37, :Band_38, :Band_39, :Band_40, :Band_41, :Band_42, :Band_43, :Band_44, :Band_45, :Band_46, :Band_47, :Band_48, :Band_49, :Band_50, :Band_51, :Band_52, :Band_53, :Band_54, :Band_55, :Band_56, :Band_57, :Band_58, :Band_59, :Band_60, :Band_61, :Band_62, :Band_63, :Band_64, :Band_65, :Band_66, :Band_67, :Band_68, :Band_69, :Band_70, :Band_71, :Band_72, :Band_73, :Band_74, :Band_75, :Band_76, :Band_77, :Band_78, :Band_79, :Band_80, :Band_81, :Band_82, :Band_83, :Band_84, :Band_85, :Band_86, :Band_87, :Band_88, :Band_89, :Band_90, :Band_91, :Band_92, :Band_93, :Band_94, :Band_95, :Band_96, :Band_97, :Band_98, :Band_99, :Band_100), NTuple{100, Matrix{Int16}}}, Tuple{X{Projected{Float64, LinRange{Float64, Int64}, DimensionalData.Dimensions.LookupArrays.ForwardOrdered, DimensionalData.Dimensions.LookupArrays.Regular{Float64}, DimensionalData.Dimensions.LookupArrays.Intervals{DimensionalData.Dimensions.LookupArrays.Start}, DimensionalData.Dimensions.LookupArrays.Metadata{Rasters.GDALsource, Dict{String, Any}}, WellKnownText{GeoFormatTypes.CRS}, Nothing, X{Colon}}}, Y{Projected{Float64, LinRange{Float64, Int64}, DimensionalData.Dimensions.LookupArrays.ReverseOrdered, DimensionalData.Dimensions.LookupArrays.Regular{Float64}, DimensionalData.Dimensions.LookupArrays.Intervals{DimensionalData.Dimensions.LookupArrays.Start}, DimensionalData.Dimensions.LookupArrays.Metadata{Rasters.GDALsource, Dict{String, Any}}, WellKnownText{GeoFormatTypes.CRS}, Nothing, Y{Colon}}}}, Tuple{}, NamedTuple{(:Band_1, :Band_2, :Band_3, :Band_4, :Band_5, :Band_6, :Band_7, :Band_8, :Band_9, :Band_10, :Band_11, :Band_12, :Band_13, :Band_14, :Band_15, :Band_16, :Band_17, :Band_18, :Band_19, :Band_20, :Band_21, :Band_22, :Band_23, :Band_24, :Band_25, :Band_26, :Band_27, :Band_28, :Band_29, :Band_30, :Band_31, :Band_32, :Band_33, :Band_34, :Band_35, :Band_36, :Band_37, :Band_38, :Band_39, :Band_40, :Band_41, :Band_42, :Band_43, :Band_44, :Band_45, :Band_46, :Band_47, :Band_48, :Band_49, :Band_50, :Band_51, :Band_52, :Band_53, :Band_54, :Band_55, :Band_56, :Band_57, :Band_58, :Band_59, :Band_60, :Band_61, :Band_62, :Band_63, :Band_64, :Band_65, :Band_66, :Band_67, :Band_68, :Band_69, :Band_70, :Band_71, :Band_72, :Band_73, :Band_74, :Band_75, :Band_76, :Band_77, :Band_78, :Band_79, :Band_80, :Band_81, :Band_82, :Band_83, :Band_84, :Band_85, :Band_86, :Band_87, :Band_88, :Band_89, :Band_90, :Band_91, :Band_92, :Band_93, :Band_94, :Band_95, :Band_96, :Band_97, :Band_98, :Band_99, :Band_100), NTuple{100, Tuple{X{Colon}, Y{Colon}}}}, DimensionalData.Dimensions.LookupArrays.NoMetadata, NamedTuple{(:Band_1, :Band_2, :Band_3, :Band_4, :Band_5, :Band_6, :Band_7, :Band_8, :Band_9, :Band_10, :Band_11, :Band_12, :Band_13, :Band_14, :Band_15, :Band_16, :Band_17, :Band_18, :Band_19, :Band_20, :Band_21, :Band_22, :Band_23, :Band_24, :Band_25, :Band_26, :Band_27, :Band_28, :Band_29, :Band_30, :Band_31, :Band_32, :Band_33, :Band_34, :Band_35, :Band_36, :Band_37, :Band_38, :Band_39, :Band_40, :Band_41, :Band_42, :Band_43, :Band_44, :Band_45, :Band_46, :Band_47, :Band_48, :Band_49, :Band_50, :Band_51, :Band_52, :Band_53, :Band_54, :Band_55, :Band_56, :Band_57, :Band_58, :Band_59, :Band_60, :Band_61, :Band_62, :Band_63, :Band_64, :Band_65, :Band_66, :Band_67, :Band_68, :Band_69, :Band_70, :Band_71, :Band_72, :Band_73, :Band_74, :Band_75, :Band_76, :Band_77, :Band_78, :Band_79, :Band_80, :Band_81, :Band_82, :Band_83, :Band_84, :Band_85, :Band_86, :Band_87, :Band_88, :Band_89, :Band_90, :Band_91, :Band_92, :Band_93, :Band_94, :Band_95, :Band_96, :Band_97, :Band_98, :Band_99, :Band_100), NTuple{100, DimensionalData.Dimensions.LookupArrays.Metadata{Rasters.GDALsource, Dict{String, Any}}}}, NamedTuple{(:Band_1, :Band_2, :Band_3, :Band_4, :Band_5, :Band_6, :Band_7, :Band_8, :Band_9, :Band_10, :Band_11, :Band_12, :Band_13, :Band_14, :Band_15, :Band_16, :Band_17, :Band_18, :Band_19, :Band_20, :Band_21, :Band_22, :Band_23, :Band_24, :Band_25, :Band_26, :Band_27, :Band_28, :Band_29, :Band_30, :Band_31, :Band_32, :Band_33, :Band_34, :Band_35, :Band_36, :Band_37, :Band_38, :Band_39, :Band_40, :Band_41, :Band_42, :Band_43, :Band_44, :Band_45, :Band_46, :Band_47, :Band_48, :Band_49, :Band_50, :Band_51, :Band_52, :Band_53, :Band_54, :Band_55, :Band_56, :Band_57, :Band_58, :Band_59, :Band_60, :Band_61, :Band_62, :Band_63, :Band_64, :Band_65, :Band_66, :Band_67, :Band_68, :Band_69, :Band_70, :Band_71, :Band_72, :Band_73, :Band_74, :Band_75, :Band_76, :Band_77, :Band_78, :Band_79, :Band_80, :Band_81, :Band_82, :Band_83, :Band_84, :Band_85, :Band_86, :Band_87, :Band_88, :Band_89, :Band_90, :Band_91, :Band_92, :Band_93, :Band_94, :Band_95, :Band_96, :Band_97, :Band_98, :Band_99, :Band_100), NTuple{100, Int16}}}, ::Int64)
  from getindex(s::AbstractDimStack, i::Int64, I::Int64...) @ DimensionalData ~/.julia/packages/DimensionalData/pS9IE/src/stack/indexing.jl:15
Arguments
  #self#::Core.Const(getindex)
  s::RasterStack{NamedTuple{(:Band_1, :Band_2, :Band_3, :Band_4, :Band_5, :Band_6, :Band_7, :Band_8, :Band_9, :Band_10, :Band_11, :Band_12, :Band_13, :Band_14, :Band_15, :Band_16, :Band_17, :Band_18, :Band_19, :Band_20, :Band_21, :Band_22, :Band_23, :Band_24, :Band_25, :Band_26, :Band_27, :Band_28, :Band_29, :Band_30, :Band_31, :Band_32, :Band_33, :Band_34, :Band_35, :Band_36, :Band_37, :Band_38, :Band_39, :Band_40, :Band_41, :Band_42, :Band_43, :Band_44, :Band_45, :Band_46, :Band_47, :Band_48, :Band_49, :Band_50, :Band_51, :Band_52, :Band_53, :Band_54, :Band_55, :Band_56, :Band_57, :Band_58, :Band_59, :Band_60, :Band_61, :Band_62, :Band_63, :Band_64, :Band_65, :Band_66, :Band_67, :Band_68, :Band_69, :Band_70, :Band_71, :Band_72, :Band_73, :Band_74, :Band_75, :Band_76, :Band_77, :Band_78, :Band_79, :Band_80, :Band_81, :Band_82, :Band_83, :Band_84, :Band_85, :Band_86, :Band_87, :Band_88, :Band_89, :Band_90, :Band_91, :Band_92, :Band_93, :Band_94, :Band_95, :Band_96, :Band_97, :Band_98, :Band_99, :Band_100), NTuple{100, Matrix{Int16}}}, Tuple{X{Projected{Float64, LinRange{Float64, Int64}, DimensionalData.Dimensions.LookupArrays.ForwardOrdered, DimensionalData.Dimensions.LookupArrays.Regular{Float64}, DimensionalData.Dimensions.LookupArrays.Intervals{DimensionalData.Dimensions.LookupArrays.Start}, DimensionalData.Dimensions.LookupArrays.Metadata{Rasters.GDALsource, Dict{String, Any}}, WellKnownText{GeoFormatTypes.CRS}, Nothing, X{Colon}}}, Y{Projected{Float64, LinRange{Float64, Int64}, DimensionalData.Dimensions.LookupArrays.ReverseOrdered, DimensionalData.Dimensions.LookupArrays.Regular{Float64}, DimensionalData.Dimensions.LookupArrays.Intervals{DimensionalData.Dimensions.LookupArrays.Start}, DimensionalData.Dimensions.LookupArrays.Metadata{Rasters.GDALsource, Dict{String, Any}}, WellKnownText{GeoFormatTypes.CRS}, Nothing, Y{Colon}}}}, Tuple{}, NamedTuple{(:Band_1, :Band_2, :Band_3, :Band_4, :Band_5, :Band_6, :Band_7, :Band_8, :Band_9, :Band_10, :Band_11, :Band_12, :Band_13, :Band_14, :Band_15, :Band_16, :Band_17, :Band_18, :Band_19, :Band_20, :Band_21, :Band_22, :Band_23, :Band_24, :Band_25, :Band_26, :Band_27, :Band_28, :Band_29, :Band_30, :Band_31, :Band_32, :Band_33, :Band_34, :Band_35, :Band_36, :Band_37, :Band_38, :Band_39, :Band_40, :Band_41, :Band_42, :Band_43, :Band_44, :Band_45, :Band_46, :Band_47, :Band_48, :Band_49, :Band_50, :Band_51, :Band_52, :Band_53, :Band_54, :Band_55, :Band_56, :Band_57, :Band_58, :Band_59, :Band_60, :Band_61, :Band_62, :Band_63, :Band_64, :Band_65, :Band_66, :Band_67, :Band_68, :Band_69, :Band_70, :Band_71, :Band_72, :Band_73, :Band_74, :Band_75, :Band_76, :Band_77, :Band_78, :Band_79, :Band_80, :Band_81, :Band_82, :Band_83, :Band_84, :Band_85, :Band_86, :Band_87, :Band_88, :Band_89, :Band_90, :Band_91, :Band_92, :Band_93, :Band_94, :Band_95, :Band_96, :Band_97, :Band_98, :Band_99, :Band_100), NTuple{100, Tuple{X{Colon}, Y{Colon}}}}, DimensionalData.Dimensions.LookupArrays.NoMetadata, NamedTuple{(:Band_1, :Band_2, :Band_3, :Band_4, :Band_5, :Band_6, :Band_7, :Band_8, :Band_9, :Band_10, :Band_11, :Band_12, :Band_13, :Band_14, :Band_15, :Band_16, :Band_17, :Band_18, :Band_19, :Band_20, :Band_21, :Band_22, :Band_23, :Band_24, :Band_25, :Band_26, :Band_27, :Band_28, :Band_29, :Band_30, :Band_31, :Band_32, :Band_33, :Band_34, :Band_35, :Band_36, :Band_37, :Band_38, :Band_39, :Band_40, :Band_41, :Band_42, :Band_43, :Band_44, :Band_45, :Band_46, :Band_47, :Band_48, :Band_49, :Band_50, :Band_51, :Band_52, :Band_53, :Band_54, :Band_55, :Band_56, :Band_57, :Band_58, :Band_59, :Band_60, :Band_61, :Band_62, :Band_63, :Band_64, :Band_65, :Band_66, :Band_67, :Band_68, :Band_69, :Band_70, :Band_71, :Band_72, :Band_73, :Band_74, :Band_75, :Band_76, :Band_77, :Band_78, :Band_79, :Band_80, :Band_81, :Band_82, :Band_83, :Band_84, :Band_85, :Band_86, :Band_87, :Band_88, :Band_89, :Band_90, :Band_91, :Band_92, :Band_93, :Band_94, :Band_95, :Band_96, :Band_97, :Band_98, :Band_99, :Band_100), NTuple{100, DimensionalData.Dimensions.LookupArrays.Metadata{Rasters.GDALsource, Dict{String, Any}}}}, NamedTuple{(:Band_1, :Band_2, :Band_3, :Band_4, :Band_5, :Band_6, :Band_7, :Band_8, :Band_9, :Band_10, :Band_11, :Band_12, :Band_13, :Band_14, :Band_15, :Band_16, :Band_17, :Band_18, :Band_19, :Band_20, :Band_21, :Band_22, :Band_23, :Band_24, :Band_25, :Band_26, :Band_27, :Band_28, :Band_29, :Band_30, :Band_31, :Band_32, :Band_33, :Band_34, :Band_35, :Band_36, :Band_37, :Band_38, :Band_39, :Band_40, :Band_41, :Band_42, :Band_43, :Band_44, :Band_45, :Band_46, :Band_47, :Band_48, :Band_49, :Band_50, :Band_51, :Band_52, :Band_53, :Band_54, :Band_55, :Band_56, :Band_57, :Band_58, :Band_59, :Band_60, :Band_61, :Band_62, :Band_63, :Band_64, :Band_65, :Band_66, :Band_67, :Band_68, :Band_69, :Band_70, :Band_71, :Band_72, :Band_73, :Band_74, :Band_75, :Band_76, :Band_77, :Band_78, :Band_79, :Band_80, :Band_81, :Band_82, :Band_83, :Band_84, :Band_85, :Band_86, :Band_87, :Band_88, :Band_89, :Band_90, :Band_91, :Band_92, :Band_93, :Band_94, :Band_95, :Band_96, :Band_97, :Band_98, :Band_99, :Band_100), NTuple{100, Int16}}}
  i::Int64
  I::Tuple{}
Locals
  #169::DimensionalData.var"#169#170"{Int64, Tuple{}}
Body::NamedTuple{(:Band_1, :Band_2, :Band_3, :Band_4, :Band_5, :Band_6, :Band_7, :Band_8, :Band_9, :Band_10, :Band_11, :Band_12, :Band_13, :Band_14, :Band_15, :Band_16, :Band_17, :Band_18, :Band_19, :Band_20, :Band_21, :Band_22, :Band_23, :Band_24, :Band_25, :Band_26, :Band_27, :Band_28, :Band_29, :Band_30, :Band_31, :Band_32, :Band_33, :Band_34, :Band_35, :Band_36, :Band_37, :Band_38, :Band_39, :Band_40, :Band_41, :Band_42, :Band_43, :Band_44, :Band_45, :Band_46, :Band_47, :Band_48, :Band_49, :Band_50, :Band_51, :Band_52, :Band_53, :Band_54, :Band_55, :Band_56, :Band_57, :Band_58, :Band_59, :Band_60, :Band_61, :Band_62, :Band_63, :Band_64, :Band_65, :Band_66, :Band_67, :Band_68, :Band_69, :Band_70, :Band_71, :Band_72, :Band_73, :Band_74, :Band_75, :Band_76, :Band_77, :Band_78, :Band_79, :Band_80, :Band_81, :Band_82, :Band_83, :Band_84, :Band_85, :Band_86, :Band_87, :Band_88, :Band_89, :Band_90, :Band_91, :Band_92, :Band_93, :Band_94, :Band_95, :Band_96, :Band_97, :Band_98, :Band_99, :Band_100)}
1 ─      nothing
│   %2 = DimensionalData.:(var"#169#170")::Core.Const(DimensionalData.var"#169#170")
│   %3 = Core.typeof(i)::Core.Const(Int64)
│   %4 = Core.typeof(I)::Core.Const(Tuple{})
│   %5 = Core.apply_type(%2, %3, %4)::Core.Const(DimensionalData.var"#169#170"{Int64, Tuple{}})
│        (#169 = %new(%5, i, I))
│   %7 = #169::DimensionalData.var"#169#170"{Int64, Tuple{}}
│   %8 = DimensionalData.data(s)::NamedTuple{(:Band_1, :Band_2, :Band_3, :Band_4, :Band_5, :Band_6, :Band_7, :Band_8, :Band_9, :Band_10, :Band_11, :Band_12, :Band_13, :Band_14, :Band_15, :Band_16, :Band_17, :Band_18, :Band_19, :Band_20, :Band_21, :Band_22, :Band_23, :Band_24, :Band_25, :Band_26, :Band_27, :Band_28, :Band_29, :Band_30, :Band_31, :Band_32, :Band_33, :Band_34, :Band_35, :Band_36, :Band_37, :Band_38, :Band_39, :Band_40, :Band_41, :Band_42, :Band_43, :Band_44, :Band_45, :Band_46, :Band_47, :Band_48, :Band_49, :Band_50, :Band_51, :Band_52, :Band_53, :Band_54, :Band_55, :Band_56, :Band_57, :Band_58, :Band_59, :Band_60, :Band_61, :Band_62, :Band_63, :Band_64, :Band_65, :Band_66, :Band_67, :Band_68, :Band_69, :Band_70, :Band_71, :Band_72, :Band_73, :Band_74, :Band_75, :Band_76, :Band_77, :Band_78, :Band_79, :Band_80, :Band_81, :Band_82, :Band_83, :Band_84, :Band_85, :Band_86, :Band_87, :Band_88, :Band_89, :Band_90, :Band_91, :Band_92, :Band_93, :Band_94, :Band_95, :Band_96, :Band_97, :Band_98, :Band_99, :Band_100), NTuple{100, Matrix{Int16}}}
│   %9 = DimensionalData.map(%7, %8)::NamedTuple{(:Band_1, :Band_2, :Band_3, :Band_4, :Band_5, :Band_6, :Band_7, :Band_8, :Band_9, :Band_10, :Band_11, :Band_12, :Band_13, :Band_14, :Band_15, :Band_16, :Band_17, :Band_18, :Band_19, :Band_20, :Band_21, :Band_22, :Band_23, :Band_24, :Band_25, :Band_26, :Band_27, :Band_28, :Band_29, :Band_30, :Band_31, :Band_32, :Band_33, :Band_34, :Band_35, :Band_36, :Band_37, :Band_38, :Band_39, :Band_40, :Band_41, :Band_42, :Band_43, :Band_44, :Band_45, :Band_46, :Band_47, :Band_48, :Band_49, :Band_50, :Band_51, :Band_52, :Band_53, :Band_54, :Band_55, :Band_56, :Band_57, :Band_58, :Band_59, :Band_60, :Band_61, :Band_62, :Band_63, :Band_64, :Band_65, :Band_66, :Band_67, :Band_68, :Band_69, :Band_70, :Band_71, :Band_72, :Band_73, :Band_74, :Band_75, :Band_76, :Band_77, :Band_78, :Band_79, :Band_80, :Band_81, :Band_82, :Band_83, :Band_84, :Band_85, :Band_86, :Band_87, :Band_88, :Band_89, :Band_90, :Band_91, :Band_92, :Band_93, :Band_94, :Band_95, :Band_96, :Band_97, :Band_98, :Band_99, :Band_100)}
└──      return %9

In particular, the issue seems to be caused by the type of %9, which is missing the NTuple{100, Int16} at the end.

The result is that the first example consists of only a single allocation with 48 bytes while the second is 62 allocations with 3.77 KB according to @btime.

rafaqz commented 1 year ago

Can you just use a Raster??? Then you are indexing a single array.

Stacks with that many layers dont make sense, the tuple is too big for the compiler. I am contemplating adding a different kind of stack that uses a vector, but it wont be type stable for mixed layer types, and will still be much slower than a Raster.

You really only need a RasterStack when you have bands of different types, like in a netcdf, or with multuple separate tif files.

JoshuaBillson commented 1 year ago

I cannot use a Raster because I need to extract the spectral signatures and corresponding landcover classes into a DataFrame. That is, the RasterStack contains both remotely sensed imagery and land cover data, which needs to be harmonized for further analysis.

Interestingly, I've found that sinking the RasterStack directly into a DataFrame via the Tables.jl interface seems to solve the problem (it was taking around 30 seconds and 14GB of allocations to iterate over the data manually).

julia> @btime desis.stack |> replace_missing |>  DataFrame |> dropmissing
  721.353 ms (13647 allocations: 3.03 GiB)
1086341×237 DataFrame
     Row │ X         Y          Band_1  Band_2  Band_3  Band_4  Band_5  Band_6  Band_7  Band_8  Band_9  Band_10  Band_11  Band_12  Band_13  Band_14 ⋯
         │ Float64   Float64    Int16   Int16   Int16   Int16   Int16   Int16   Int16   Int16   Int16   Int16    Int16    Int16    Int16    Int16   ⋯
─────────┼───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
       1 │ 705630.0  5.75541e6       0       0       0       0       0       0       8      68     151      104      171      212      224      219 ⋯
       2 │ 705660.0  5.75541e6       0       0       0       0       0       0      12      43     131      175      191      222      265      272
       3 │ 705690.0  5.75541e6       0       0       0       0       0       0      23      66     166      210      190      209      288      254
       4 │ 705720.0  5.75541e6       0       0       0       0       0       0      17     136     180      210      205      237      247      254
       5 │ 705750.0  5.75541e6       0       0       0       0       0      30      39     128     189      198      196      197      305      256 ⋯
       6 │ 705780.0  5.75541e6       0       0       0       0       0       0      52      87     154      201      195      225      280      282
       7 │ 705810.0  5.75541e6       0       0       0       0       0      12      61     131     166      184      200      228      285      276
       8 │ 705840.0  5.75541e6       0       0       0       0       0       0      33      79     143      212      196      226      303      256
       9 │ 705870.0  5.75541e6       0       0       0       0       0       0      15      91     174      185      182      201      291      277 ⋯
    ⋮    │    ⋮          ⋮        ⋮       ⋮       ⋮       ⋮       ⋮       ⋮       ⋮       ⋮       ⋮        ⋮        ⋮        ⋮        ⋮        ⋮    ⋱
 1086334 │ 679890.0  5.72187e6       0       0       0       0       0      38     240     274     297      275      271      274      249      286
 1086335 │ 679920.0  5.72187e6       0       0       0       0       0      74     252     249     269      286      284      254      268      254
 1086336 │ 679950.0  5.72187e6       0       0       0       0       0     159     247     239     249      272      267      237      282      270
 1086337 │ 679980.0  5.72187e6       0       0       0       0       0     117     170     220     279      266      244      262      296      276 ⋯
 1086338 │ 680010.0  5.72187e6       0       0       0       0       0     111     139     195     260      211      201      219      269      215
 1086339 │ 680040.0  5.72187e6       0       0       0       0       0      43     149     216     234      204      174      196      263      199
 1086340 │ 679320.0  5.72184e6       0       0       0       0       0      81     129     171     246      265      259      200      233      225
 1086341 │ 679350.0  5.72184e6       0       0       0       0       0     153     228     241     238      246      213      234      259      252 ⋯
                                                                                                                 221 columns and 1086324 rows omitted

However, this behaviour does not seem to be documented anywhere in the official docs. Indeed, the only reason I know about this functionality is because I happened to stumble across this comment on the Julia Discourse. I feel that this feature is extremely useful for conducting further analysis of rasterized data and that it would be a good idea to advertise it more publicly. Perhaps we could add something to the docs?

rafaqz commented 1 year ago

Ok, that makes sense. I guess I need to handle these many-layered stack use cases. The problem is the current design is extremely fast for small stacks and generalising to large stacks will break that. So we will need a different stack type in DimensionslData.jl that you can choose to use for these workloads.

The Tables.jl interface is just kind of assumed, it could be better described.

If you would like to add documentation that would be very helpful, I dont have much time and it mostly goes to bug fixes and feature requests.

JoshuaBillson commented 1 year ago

Yes, I agree that breaking optimization for small stacks, which likely represent the majority of use cases, would be a bad idea.

I can think of two ways forward for this issue:

  1. Have methods detect when a stack contains more than a certain number of layers and dispatch a different implementation for those cases.
  2. Introduce a second data type for users working with many layers, which could either be a replacement for RasterStack (FatStack?) or a wrapper around a standard RasterStack.

One trick that I found was to break large stacks up into smaller chunks, process each chunk separately, then merge the results. I'm guessing that this avoids the issue with large NamedTuples.

Here's an example of a naive implementation that runs in 143.159 ms and produces 490833 allocations at 658.48 MB.

function _extract_signatures(stack, shp, row)
    return extract(stack, shp[row,:geometry]) |> collect
end

And this is an optimized example, which runs in 19.426 ms and produces 13608 allocations at 73.26 MiB.

function _extract_signatures(stack, shp, row)
    # Large Stacks
    if length(names(stack)) > 25 
        # Partition Stack Layers Into Chunks of 25
        stacks = [RasterStack([stack[c] for c in part]...) for part in Iterators.partition(names(stack), 25)]

        # Extract Signatures From Each Chunk
        ops = [_extract_signatures(stack, shp, row) for stack in stacks]

        # Merge Results
        return map(x -> reduce(merge, x), zip(ops...))

    # Small Stacks
    else
        return extract(stack, shp[row,:geometry]) |> collect
    end
end

The following method generalizes the above logic:

split_op_merge(stack, x -> map(a -> reduce(merge, a), zip(x...))) do x
    return extract(x, shp[row,:geometry]) |> collect
end
rafaqz commented 1 year ago

Yes totally, those are the two options.

  1. may be a little fragile, and could be broken by changes in Julia versions.

I think there are more a usecases for 2. in DimensionalData.jl, as it can also reduce compile time.

An idea I have is to make an MDimStack (where M stands for mutable), where you can modify both layers and dimensions and indexing is moderately fast in all cases, but of course much slower for small stacks than DimStack/RasterStack

Its like DataFrames.jl vs TypedTables.jl, where currently DimStack/RasterStack are more like TypedTables, and MDimStack would be like DataFrames.jl.

We would also make it easy to switch between them

JoshuaBillson commented 1 year ago

That sounds like the best idea. Do you think it's best to make users explicitly choose between the two, or are we planning on providing a common interface where one of MDimStack or RasterStack is automatically chosen behind the scenes? In the second case, I imagine something like a RasterStack constructor that uses an MDimStack for its underlying operations if the number of layers exceeds some threshold. Otherwise, it just uses the default RasterStack as its data representation.

rafaqz commented 1 year ago

Im not sure really, and "planning" might mean after my phd in a year unfortunately, unless someone else writes it...