SorooshMani-NOAA opened this issue 1 year ago
@FariborzDaneshvar-NOAA, since you started exploring this item, could you either link an existing ticket or use this ticket to document your progress and impediments (like https://github.com/noaa-ocs-modeling/EnsemblePerturbation/issues/128)?
With the stacking suggestion in https://github.com/noaa-ocs-modeling/EnsemblePerturbation/issues/129#issuecomment-1885667131, I was able to execute the `subset_dataset()` function with stacked time & node! But the step that converts the KL surrogate model to the overall surrogate for each node (execution of the `surrogate_from_karhunen_loeve()` function) failed with `MemoryError`!
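One plausible reason for the `MemoryError` is the size of the per-node coefficient array. The sketch below assumes the conversion follows the standard KL + polynomial chaos composition, y(s, x) ≈ ȳ(s) + Σ_k √λ_k φ_k(s) ξ_k(x) with ξ_k(x) = Σ_j c_kj Ψ_j(x) and Ψ_0 = 1. We have not checked that `surrogate_from_karhunen_loeve()` is implemented exactly this way, and `kl_to_pointwise_pce` is a hypothetical name. The output has one row per point, so stacking time and node multiplies the number of rows by the number of time steps.

```python
import numpy as np


def kl_to_pointwise_pce(mean, eigenvalues, eigenvectors, kl_pce_coeffs):
    """Hypothetical helper: fold KL modes into per-point PCE coefficients.

    mean:          (n_points,)          ensemble mean at each point
    eigenvalues:   (n_modes,)           KL eigenvalues lambda_k
    eigenvectors:  (n_points, n_modes)  KL modes phi_k
    kl_pce_coeffs: (n_modes, n_terms)   PCE coefficients c_kj of each KL variable xi_k
    returns:       (n_points, n_terms)  PCE coefficients for every point
    Assumes basis term j = 0 is the constant Psi_0 = 1.
    """
    # Scale each mode by sqrt(lambda_k), then combine with the PCE of xi_k.
    coeffs = (eigenvectors * np.sqrt(eigenvalues)) @ kl_pce_coeffs
    # The mean enters through the constant basis term (j = 0).
    coeffs[:, 0] += mean
    return coeffs
```

The result needs 8 × n_points × n_terms bytes in float64, which is why the full stacked time-node set can exceed memory even when the KL step itself succeeds.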
One suggestion was to use a chunk of time steps. I will post updates on that here.
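Building the surrogate model for the first 100 time steps:
```python
# Select the first 100 time steps
time_chunk = elev_timeseries.sel(time=slice("2018-08-30T13:00:00.000000000", "2018-09-03T16:00:00.000000000"))
# Stack time and node into one dimension, then name that dimension 'node'
# so each (time, node) pair is treated as a separate point
time_chunk_stack = (
    time_chunk.rename(nSCHISM_hgrid_node="node")
    .stack(stacked=("time", "node"), create_index=False)
    .swap_dims(stacked="node")
)
subset = subset_dataset(ds=time_chunk_stack, ...)
```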
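To reason about reshaping later, it may help to see the layout the stack produces. This is a NumPy illustration with toy sizes (not xarray itself), assuming that stacking `('time', 'node')` flattens in row-major order with `node` varying fastest, as `xarray.Dataset.stack` orders the new dimension. The `time_idx`/`node_idx` label arrays are illustrative names.

```python
import numpy as np

# Toy sizes standing in for the real time and node dimensions.
n_time, n_node = 3, 4
elev = np.arange(n_time * n_node, dtype=float).reshape(n_time, n_node)

# Stacking ('time', 'node') flattens in C (row-major) order: node varies fastest.
stacked = elev.reshape(-1)

# Labels carried along the stacked dimension, one (time, node) pair per point.
time_idx, node_idx = np.meshgrid(np.arange(n_time), np.arange(n_node), indexing="ij")
time_idx, node_idx = time_idx.ravel(), node_idx.ravel()

# Every stacked value can be traced back to its original (time, node) cell.
assert np.array_equal(stacked, elev[time_idx, node_idx])
```

Once `subset_dataset()` drops or reorders points, only these labels (not the position in the array) reliably identify each point's time and node.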
It went through, and here are the plots:
[Figures: KL eigenvalues | KL fit | KL-surrogate fit | validation boxplots | sensitivities | model vs surrogate]
These results look weird, and it seems to me the KL fit didn't work correctly! One possibility is that the first 100 time steps used here are long before landfall, so there might be minimal variation between them. It also reveals the issue in the plotting function I mentioned earlier in https://github.com/noaa-ocs-modeling/EnsemblePerturbation/issues/132
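One way to test the "minimal variation" hypothesis before fitting is to look at the ensemble spread at each time step. A minimal NumPy sketch, assuming the elevations are available as a plain array of shape (run, time, node); `spread_per_time_step` is a hypothetical helper, not part of EnsemblePerturbation.

```python
import numpy as np


def spread_per_time_step(elev):
    """Hypothetical check: largest ensemble std across nodes, per time step.

    elev: array of shape (n_run, n_time, n_node)
    returns: array of shape (n_time,)
    """
    # Std across ensemble members (axis 0), ignoring NaN (e.g. dry) values.
    std = np.nanstd(elev, axis=0)
    # Largest spread over all nodes at each time step.
    return np.nanmax(std, axis=1)
```

Time steps whose spread is near zero carry almost no information for the KL decomposition, so a window made up mostly of such steps would be expected to give a poor fit.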
Despite these results, I couldn't make the percentile and probability plots due to `MemoryError: Unable to allocate 1.15 TiB for an array with shape (15772912, 10000) and data type float64`
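The requested allocation matches a dense float64 array of that shape, as a quick check confirms. The 8 GiB budget below is an arbitrary example for sizing blocks, not a value from our setup.

```python
import numpy as np

n_rows, n_cols = 15772912, 10000  # shape from the MemoryError message
itemsize = np.dtype(np.float64).itemsize  # 8 bytes per value
size_tib = n_rows * n_cols * itemsize / 2**40
print(f"{size_tib:.2f} TiB")  # prints "1.15 TiB"

# Rows that fit in a hypothetical 8 GiB block, keeping all columns per row
budget_bytes = 8 * 2**30
rows_per_block = budget_bytes // (n_cols * itemsize)
print(rows_per_block)  # prints 107374
```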
I also tried opening `subset.nc` with dask (`chunks="auto"`), but it didn't change the outcome: I still get the same memory error for the percentile and probability plots!
Interestingly, though, the sensitivity plots for along-track were different (see below)! @SorooshMani-NOAA, how might that be possible?!
@FariborzDaneshvar-NOAA about the memory issue: the function you showed me the other day calls a `numpy` function directly, which (as far as I understand) loads all values into memory before executing. So you also need to change the function where the `numpy` method is called.
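One way to keep the `numpy` call but avoid materializing the whole array is to apply it block by block over points. A sketch, assuming the values for any range of points can be produced or loaded on demand (for example, by evaluating the surrogate for just those points, or slicing a dask-backed variable and calling `.values`); `chunked_percentiles` and `get_block` are hypothetical names.

```python
import numpy as np


def chunked_percentiles(get_block, n_points, q, block_size):
    """Hypothetical helper: compute percentiles block by block.

    get_block(start, stop) must return an array of shape (stop - start, n_samples)
    for the points in [start, stop); only one block is held in memory at a time.
    q: percentiles in [0, 100].
    returns: array of shape (len(q), n_points)
    """
    q = np.atleast_1d(q)
    out = np.empty((q.size, n_points))
    for start in range(0, n_points, block_size):
        stop = min(start + block_size, n_points)
        block = get_block(start, stop)
        # Percentiles along the sample axis for this block of points only.
        out[:, start:stop] = np.percentile(block, q, axis=1)
    return out
```

Peak memory is then roughly one (block_size, n_samples) block plus the output, which holds only `len(q)` values per point.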
I'm not sure what is happening in the plots. Are you sure the mapping back to physical space is done correctly? In the stacked time-node dimension, neither times nor nodes are necessarily aligned, so we have to be very careful when reshaping.
I'm not sure if the plots we get are actually meaningful!
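On the reshaping question, a plain `reshape` back to (time, node) is only valid if every (time, node) pair is present in the original order. A sketch that avoids that assumption by scattering each value into place using its own time and node labels (for example, integer positions derived from the `time` and `node` coordinates carried along the stacked dimension); `unstack_to_grid` is a hypothetical helper.

```python
import numpy as np


def unstack_to_grid(values, time_idx, node_idx, n_time, n_node):
    """Hypothetical helper: scatter stacked values back onto a (time, node) grid.

    Uses the (time, node) label of every stacked point instead of a plain
    reshape, so it stays correct when points were subset or reordered.
    Cells with no stacked point are left as NaN.
    """
    grid = np.full((n_time, n_node), np.nan)
    grid[time_idx, node_idx] = values
    return grid
```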
@SorooshMani-NOAA thanks for your comment; you brought up a good point about the results! I didn't reshape the output back to time/node, which might explain these plots, but it's not clear to me at which step it should be reshaped!
This new memory issue is different from the one I mentioned before (for the `numpy` function in the surrogate expansion, when I used all time steps), but you are right, it should be addressed separately.
Currently, only the maximum water elevation is used to train the surrogate model. We'd like to use the whole time series to see how it affects the surrogate output.
Tasks:
- #129
@saeed-moghimi-noaa @WPringle @SorooshMani-NOAA