Closed by BG-AIMS 1 week ago
Quick comment about this: I don't know how long I'm supposed to wait, but I'm trying to use `ADRIA.metrics.loc_trajectory` with 64 scenarios and only the Moore cluster locations, and it never finishes running. That's actually a problem with `ADRIA.metrics.summarize`.

(Update: it finished running after more than 10 minutes.)
But I also had a weird experience with `mapslices`. If, inside `summarize`, I do `mapslices(metric, data; dims=(:scenarios,))` it works fine, but if I try `mapslices(metric, read(data); dims=(3,))` it also never finishes running.
The reason I was considering the latter is that when I do the former and `data` is a `YAXArray{Float64}`, the result is actually a `YAXArray{Union{Float64,Missing}}` (even though there are no missing values). I don't know if the problem is with JuliennedArrays, but using `mapslices` with `YAXArray`s seems to work fine (the only downside is that we need to convert the result to the right type at the end, which is annoying).
> The reason I was considering the latter is that when I do the former and `data` is a `YAXArray{Float64}`, the result is actually a `YAXArray{Union{Float64,Missing}}` (even though there are no missing values). I don't know if the problem is with JuliennedArrays, but using `mapslices` with `YAXArray`s seems to work fine (the only downside is that we need to convert the result to the right type at the end, which is annoying).
Yeah, that is true for `mapslices()` and I'm not sure why. I have been doing `Float64.(mapslices(function, YAXArray, dims=[:dim]))` to create a `YAXArray{Float64}` output in one line.
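To show that conversion step in a self-contained way, here is the same eltype-narrowing trick on a plain Base `Array` (assumption: the `Union{Float64,Missing}` widening behaves the same as it does on the YAXArray side; no YAXArrays code is used here):

```julia
# A matrix whose eltype is Union{Float64,Missing} even though it holds
# no actual missing values (mirroring the mapslices result described above).
A = Union{Float64,Missing}[1.0 2.0; 3.0 4.0]

# Broadcasting the Float64 constructor materialises a copy with a concrete
# eltype; it would throw if a real `missing` were present.
B = Float64.(A)

eltype(B)  # Float64
```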
I wonder if it was the change to `YAXArray` that caused the performance regression - I don't recall having an issue.

That said, one of the reasons `JuliennedArrays` was being used, if I remember correctly, was to lazily process higher-dimensional data (e.g., 4 or 5 dimensions). I think `mapslices` can do this as well, but I just wanted to confirm that this is true.
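As a quick sanity check of that question: Base's `mapslices` does accept higher-dimensional input. A minimal 4-D sketch (dimension sizes are arbitrary); note it is eager rather than lazy, so the full output is materialised:

```julia
# 4-D array, e.g. timesteps × species × locations × scenarios.
data = rand(2, 3, 4, 5)

# Reduce over the last dimension only; mapslices keeps the reduced
# dimension with size 1.
summarised = mapslices(sum, data; dims=4)

size(summarised)  # (2, 3, 4, 1)
```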
> I wonder if it was the change to `YAXArray` that caused the performance regression - I don't recall having an issue.
Actually, I think you might be right. The current approach (which is not working) is:

```julia
data_slices = JuliennedArrays.Slices(data.data, alongs...)
summarized_data = map(metric, data_slices)
```

But if we change to:

```julia
data_slices = JuliennedArrays.Slices(read(data), alongs...)
summarized_data = map(metric, data_slices)
```

it's actually way better than the `mapslices` approach!
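For a self-contained picture of why the eager variant helps, here is the same read-then-map pattern sketched with Base-only pieces - `eachslice` standing in for `JuliennedArrays.Slices`, and a plain `Array` standing in for the result of `read(data)` (assumptions: Julia ≥ 1.9 for tuple-dims `eachslice`; `metric` reduces a vector of scenario values to a scalar):

```julia
# Stand-in for read(data): a fully in-memory timesteps × locations × scenarios array.
data = rand(10, 5, 64)
metric = maximum  # any scalar-valued reduction over scenarios

# One slice per (timestep, location) pair, each running along scenarios.
slices = eachslice(data; dims=(1, 2))

# Mapping over eager, in-memory slices avoids the per-element lazy-read cost.
summarized = map(metric, slices)

size(summarized)  # (10, 5)
```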
> `mapslices` with `YAXArray`s seems to work fine (the only thing is that we need to convert the result to the right type in the end, which is annoying).
There's a `mapslices` method defined for YAXArray types, so it's likely optimised:
https://juliadatacubes.github.io/YAXArrays.jl/dev/UserGuide/compute.html#mapslices
> it's actually way better than the `mapslices` approach!
Hmmmm... the issue then is if we happen to be working with a very large dataset. It might take longer (or crash with an out-of-memory error) because you're reading the entire dataset into memory. Is this ever likely to be an issue?
An alternative is to make the code switch intelligently between `read()` + `map` and plain `mapslices`: get the estimated data size from the YAXArray and `read()` it if it fits below available memory, or use `mapslices` on `data.data` if not...
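A rough sketch of that switch, with made-up names and plain `Array`s so it runs stand-alone (the real version would take the size estimate and the lazy path from the YAXArray itself):

```julia
# Hypothetical helper: pick an eager or lazy summarisation path based on
# how much of free memory the materialised data would consume.
function summarize_slices(metric, data, dim::Int; headroom=0.5)
    est_bytes = sizeof(eltype(data)) * length(data)
    if est_bytes < headroom * Sys.free_memory()
        # Fits comfortably: materialise, then map over eager slices
        # (Julia ≥ 1.9 for tuple-dims eachslice).
        keep = Tuple(d for d in 1:ndims(data) if d != dim)
        return map(metric, eachslice(collect(data); dims=keep))
    else
        # Too large: fall back to mapslices on the (possibly lazy) data.
        return dropdims(mapslices(metric, data; dims=dim); dims=dim)
    end
end

summarize_slices(maximum, rand(4, 3, 8), 3)  # 4 × 3 result
```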
Currently the `ADRIA.loc_trajectory()` function takes a long time to run, particularly for GBR-wide analyses. A possible speed-up is to use the `mapslices()` function and specify the dimension to be reduced (`:scenarios`).