meggart / DiskArrays.jl

Other
73 stars 13 forks source link

Error in plot of `DiskArray` after `cat` #185

Open danlooo opened 3 weeks ago

danlooo commented 3 weeks ago

Plotting of a concatenated YAXArray returns a wrong pattern in its plot. However, everything is fine on in-memory arrays and using comparison operations (e.g. all(a .== b)).

Tested on Julia 1.10.4 with YAXArrays v0.5.10, DimensionalData v0.27.7

using DimensionalData
using YAXArrays
using Zarr
using GLMakie

lon_range = X(-180:180)
lat_range = Y(-90:90)
data = [exp(cosd(lon)) + 3 * (lat / 90) for lon in lon_range, lat in lat_range]
a = YAXArray((lon_range, lat_range), data)
ds_ram = Dataset(; properties=Dict(), a)
path = tempname()
savedataset(ds_ram; path=path)
ds_disk = open_dataset(path)
a_ram = cat(ds_ram.a[X=1:100], ds_ram.a[X=101:200], dims=:X)
a_disk = cat(ds_disk.a[X=1:100], ds_disk.a[X=101:200], dims=:X)

all(a_disk .== a_ram) # ok
heatmap(a_ram) # ok
heatmap(a_disk) # bug
heatmap(collect(a_disk)) # ok

image

rafaqz commented 3 weeks ago

Maybe it's because DD is permuting the dims or something in the plot recipe, and something breaks ConcatatDiskArray inside other wrapper disk arrays?

Probably editing the DD heatmap implementation to show the array type and store it to a global will help debugging

rafaqz commented 3 weeks ago

So this may be the actual bug:

julia> collect(x for x in a_disk) == collect(x for x in a_ram)
false

Generators are broken on the ConCatDiskArray when its wrapped in a YAXArray. It could be from a map call as well as an explicit generator - I cant find exactly where its happening.

But this works:

julia> collect(x for x in parent(a_disk)) == collect(x for x in a_ram)
true

Probably DimensionalData should collect arrays before passing them to Makie as a failsafe, but this also needs to be fixed here in DiskArrays.

rafaqz commented 3 weeks ago

This is the fix if someone wants to PR it to YAX:

Base.Generator(f, A::YAXArray) = Base.Generator(f, parent(A))

And we need the same in Rasters.jl.

Unfortunately this is going to clash with the DimensionalData.jl Generator behavior, which in a lot of cases will return a DimArray because of the DimUnitRanges returned from axes.