rafaqz / Rasters.jl

Raster manipulation for the Julia language
MIT License

reading in a lazy DiskArrays.ConcatDiskArray is slow #717

Open tiemvanderdeure opened 3 weeks ago

tiemvanderdeure commented 3 weeks ago

MWE:

using Rasters, RasterDataSources
ser =  RasterSeries(WorldClim{Climate}, :tavg; month = 1:12, res = "2.5m", lazy = true) 
ras = Rasters.combine(ser; lazy = true)

@time ras[X = 1:100, Y = 1:100, month = 1] # 1.6 seconds!!
@time ser[1][X = 1:100, Y = 1:100] # 0.06 seconds

This must be some issue with chunks and getindex.

Rasters.DA.eachchunk(ser[1]) |> size # (1, 4320)
Rasters.DA.eachchunk(ras) |> size # (1, 1, 12)

All the time is being spent in gdalopen, rasterio! and gdalclose: [profile screenshot]

felixcremer commented 3 weeks ago

What is tas in your example?

tiemvanderdeure commented 3 weeks ago

Sorry, tas should be ser.

rafaqz commented 3 weeks ago

Ok, looks like the problem is that every single cell is being read separately... it should be just one chunk, looking at your eachchunk output. So the ConcatDiskArray isn't doing its job properly.

It might be fixed on DiskArrays main, but we aren't using it yet because of NCDatasets.
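The cost of that per-cell fallback can be illustrated with a minimal mock (pure Julia, not Rasters or DiskArrays internals): a "file-backed" array that counts simulated opens shows how scalar getindex multiplies the open/close overhead that a single block read would pay once. The `MockFileArray` type and `readblock` function here are hypothetical, for illustration only.

```julia
# Illustrative mock, NOT Rasters/DiskArrays internals: a "file-backed"
# array where every access that is not batched into a block read pays
# a simulated open cost (standing in for gdalopen/gdalclose).
mutable struct MockFileArray <: AbstractArray{Float64,2}
    data::Matrix{Float64}
    opens::Int   # counts simulated file opens
end
MockFileArray(data) = MockFileArray(data, 0)

Base.size(A::MockFileArray) = size(A.data)

# Scalar fallback: every element access "opens the file" again.
function Base.getindex(A::MockFileArray, i::Int, j::Int)
    A.opens += 1
    A.data[i, j]
end

# Block read: one open per contiguous request.
function readblock(A::MockFileArray, I::UnitRange, J::UnitRange)
    A.opens += 1
    A.data[I, J]
end

A = MockFileArray(rand(100, 100))
A[1:10, 1:10]                 # generic indexing hits scalar getindex 100 times
scalar_opens = A.opens
A.opens = 0
readblock(A, 1:10, 1:10)      # one batched read: 1 open
@show scalar_opens A.opens    # 100 vs 1
```

Scaled up to a 100×100 slice of a GDAL-backed layer, that is the difference between one rasterio! call and thousands of open/read/close round trips.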

tiemvanderdeure commented 3 weeks ago

This is on DiskArrays v0.4.4. I don't have NCDatasets in the environment.

rafaqz commented 3 weeks ago

Ok... so the problem could be that we need to special-case ConcatDiskArray and just open the files that it needs to read from? They won't be found by Flatten.flatten because they're objects in an array.

I had assumed readblock! would handle that fine, because each block is a whole file and it would only be opened once... but something is not working and we are reading into a FileArray many times instead of opening it once.
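One possible shape for that special case, as a hedged sketch with mock types rather than the real Rasters/ConcatDiskArray machinery: group the requested indices by component array, open each backing file once, and read its whole sub-block. The `MockFile`, `with_open`, `read_naive` and `read_batched` names are all made up for this sketch.

```julia
# Hedged sketch, NOT the real API: the actual fix would live in
# Rasters' handling of ConcatDiskArray. The point is one open per
# component file touched, not one open per element.
struct MockFile
    data::Matrix{Float64}
end

const open_count = Ref(0)
function with_open(f, file::MockFile)
    open_count[] += 1      # stands in for gdalopen/gdalclose
    f(file.data)
end

# A concatenation of files along dim 3, like 12 monthly layers.
files = [MockFile(rand(20, 20)) for _ in 1:12]

# Naive path: open a file for every element of the requested slab.
read_naive(files, I, J, K) =
    [with_open(d -> d[i, j], files[k]) for i in I, j in J, k in K]

# Special-cased path: one open per file, reading its block in one go.
read_batched(files, I, J, K) =
    cat((with_open(d -> d[I, J], files[k]) for k in K)...; dims = 3)

open_count[] = 0; read_naive(files, 1:10, 1:10, 1:1)
naive = open_count[]       # 100 opens for a 10×10×1 slab
open_count[] = 0; read_batched(files, 1:10, 1:10, 1:1)
batched = open_count[]     # 1 open
@show naive batched
```

Grouping by component file is exactly what a per-file readblock! dispatch would give for free if it were being hit; the sketch just makes the batching explicit.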