memory and mbind problems in convGDX2MIF / reportFE

orichters commented 6 months ago

Running output.R -> single -> reporting.R runs out of memory with standard --mem=8000 setting, even --mem=12000 fails.
Reason seems to be that reportFE requires more memory
Using output.R -> single -> reporting, @orichters started output generation for several AMT testOneRegi runs. Result: Until 2023-12-22, it works, afterwards, it fails, see cd /p/projects/remind/modeltests/remind/output; grep oom-kill testOne*/log_output.txt

@0UmfHxcvx5J7JoaOhFSs5mncnisTJJ6q found out it is gdx::readGDX(), triggered by vm_demFENonEnergySector:

> foo <- gdx::readGDX('./output/testOneRegi-Base-AMT_2024-02-16_22.08.20/fulldata.gdx', 'vm_demFENonEnergySector', field = 'l', restore_zeros = TRUE, react = 'silent')
> format(object.size(foo), units = 'auto')
[1] "3 Gb"
> bar <- gdx::readGDX('./output/testOneRegi-Base-AMT_2024-02-16_22.08.20/fulldata.gdx', 'vm_demFENonEnergySector', field = 'l', restore_zeros = FALSE, react = 'silent')
> format(object.size(bar), units = 'auto')
[1] "138.9 Kb"

According to @0UmfHxcvx5J7JoaOhFSs5mncnisTJJ6q, one cannot simply set restore_zeros = FALSE because we declare all variables over the general supersets (e.g., tall, all_enty), but usually define them only over subsets (t, entyFE). restore_zeros = FALSE reads in only data that is not zero, which is usually what we want. But if some axis of the data (e.g. all values in 2010, or everything fehos) happens to be entirely zero (usually that happens for specific gdxes only), that dimension of the resulting magpie object differs (e.g. c(2005, 2015, …) instead of c(2005, 2010, 2015, …)) and it is incompatible with all the rest of the objects we have. What would be needed is a function to expands a variable read in with restore_zeros = FALSE to the actual sets it should have. But that is a moving target, as it depends on all the other variables/parameters used in conjunction with it.
magpie_expand is not the solution

Some code to start with:

library(magclass)
t <- c(1995, 2005, 2015)
s <- c('AFR', 'CPA')
A <- dimSums(maxample('pop')[s,t,'A2'], 3)
B <- dimSums(maxample('pop')[s,t[-2],'B1'], 3)

The question is how to be able to sum, mbind etc. A and B.

The problem is that the dimensions each variable is defined (not declared) over in the GAMS code (e.g. t instead of tall, se2fe(entySE,entyFE,te) instead of all all_enty/all_enty/all_te combinations) is not pre-defined, and the problem depends on which other variables the variables is combined with (in magpie operations) and what data those variables have.

orichters commented 6 months ago

code suggestion by @0UmfHxcvx5J7JoaOhFSs5mncnisTJJ6q to expand a to the joint size of a and b:

somebody_name_this_function <- function(a, b, fill = 0) {
r <- new.magpie(cells_and_regions = union(getRegions(a), getRegions(b)),
                years = sort(union(getYears(a, as.integer = TRUE), 
                                   getYears(b, as.integer = TRUE))),
                names = union(getNames(a), getNames(b)),
                fill = fill,
                sets = names(dimnames(b)))

r[getRegions(a),getYears(a),getNames(a)] <- a
return(r)
}
B_ <- B[,t[-2],] %>% somebody_name_this_function(A)
A + B_

which returns:

An object of class "magpie"
, , 1
 t
i        y1995   y2005   y2015
AFR 1105.333  696.44 1821.22
CPA 2561.270 1429.53 3018.20

> system.time({
+     x <- vm_demFENonEnergySector <- readGDX(gdx, "vm_demFENonEnergySector",
+                                             field = "l", restore_zeros = TRUE,
+                                             react = "silent")[,t,]})
   user  system elapsed 
 12.134   4.380  16.553 
> format(object.size(x), units = 'auto')
[1] "1.5 Gb"

> system.time({
+     ref <- new.magpie(
+         cells_and_regions = readGDX(gdx, 'regi'),
+         years = readGDX(gdx, 't'),
+         names = quitte::cartesian(
+             readGDX(gdx, 'se2fe') %>% 
+                 mutate(x = paste(.data$all_enty, .data$all_enty1,
+                                  sep = '.')) %>% 
+                 pull('x'),
+             'indst',
+             readGDX(gdx, 'all_emiMkt')),
+         fill = 0,
+         sets = c('t', 'regi', 'entySE', 'entyFE', 'sector', 'emiMkt'))
+     
+     
+     
+     y <- vm_demFENonEnergySector <- readGDX(gdx, "vm_demFENonEnergySector",
+                                             field = "l", restore_zeros = FALSE,
+                                             react = "silent") %>% 
+         somebody_name_this_function(ref)})
   user  system elapsed 
  0.133   0.000   0.129 
> format(object.size(y), units = 'auto')
[1] "257.8 Kb"

orichters commented 6 months ago

I rather think of a one-sided function that restricts and expands x to the size of ref:

somebody_name_this_function <- function(x, ref, fill = 0) {
  r <- new.magpie(cells_and_regions = getRegions(ref),
                  years = getYears(ref),
                  names = getNames(x),
                  fill = fill,
                  sets = names(dimnames(ref)))
  r[intersect(getRegions(x), getRegions(r)), intersect(getYears(x), getYears(r)), getNames(x)] <- x
  return(r)
}

0UmfHxcvx5J7JoaOhFSs5mncnisTJJ6q commented 6 months ago

"Nicht mein Zirkus, nicht meine Affen."

0UmfHxcvx5J7JoaOhFSs5mncnisTJJ6q commented 6 months ago

region <- c("AAA", "BBB", "CCC")
t      <- paste0("y", c(2000, 2005, 2010))
name   <- c("foo", "bar", "bazz")

A <- new.magpie(cells_and_regions = region, years = t, names = name, fill = 1,
                sets = c("region", "t", "name"))

# spatial ----
## x smaller then ref ----
expect_equal(somebody_name_this_function(A[region[-2],,], A),
             `mselect<-`(A, region = region[2], value = 0))

## x larger then ref ----
expect_equal(somebody_name_this_function(A, A[region[-2],,]),
             A[region[-2],,])

# temporal ----
## x smaller then ref ----
expect_equal(somebody_name_this_function(A[,t[-2],], A),
             `mselect<-`(A, t = t[2], value = 0))

## x larger then ref ----
expect_equal(somebody_name_this_function(x = A, ref = A[,t[-2],]),
             A[,t[-2],])

# names ----
## x smaller then ref ----
expect_equal(somebody_name_this_function(A[,,name[-2]], A),
             `mselect<-`(A, name = name[2], value = 0))

## x larger then ref ----
expect_equal(somebody_name_this_function(A, A[,,name[-2]]),
             A[,,name[-2]])

0UmfHxcvx5J7JoaOhFSs5mncnisTJJ6q commented 6 months ago

I found a (somewhat) reliable way to assess the size of the problem.

/p/projects/remind/modeltests/remind/output/SSP2EU-PkBudg1050-AMT_2024-02-24_01.16.37 (develop|?? M)$ /usr/bin/time -v Rscript -e 'format(object.size(gdx::readGDX("fulldata.gdx", "vm_demFENonEnergySector", field = "l", restore_zeros = FALSE)), units = "auto")' 2>&1 | grep -E '(\[|Maximum resident set size)'
[1] "66 Kb"
    Maximum resident set size (kbytes): 131152
/p/projects/remind/modeltests/remind/output/SSP2EU-PkBudg1050-AMT_2024-02-24_01.16.37 (develop|?? M)$ /usr/bin/time -v Rscript -e 'format(object.size(gdx::readGDX("fulldata.gdx", "vm_demFENonEnergySector", field = "l", restore_zeros = TRUE)), units = "auto")' 2>&1 | grep 
-E '(\[|Maximum resident set size)'
[1] "3 Gb"
    Maximum resident set size (kbytes): 9496012

So the returned object goes from 66 kb to 3 GB (with filtering for t it is 1.5 GB I believe), while peak memory demand increases by 9.3 GB.

Peak memory demand of convGDX2MIF() seems to be 16 GB currently:

$ sbatch --wrap "/usr/bin/time -v Rscript -e 'x <- remind2::convGDX2MIF(\"/p/projects/remind/modeltests/remind/output/SSP2EU-
PkBudg1050-AMT_2024-02-24_01.16.37/fulldata.gdx\")' 2>&1 | grep 'Maximum resident set size'" --output=foo.log --qos="priority" --mem=480
00 --mail-type=END
$ cat foo.log 
    Maximum resident set size (kbytes): 15919132

orichters commented 6 months ago

Peak memory demand of convGDX2MIF() seems to be 16 GB currently:

That fits my experience that using --mem=16000 was sufficient.

fbenke-pik commented 6 months ago

Will work on this next week (KW10)

fbenke-pik commented 6 months ago

I rather think of a one-sided function that restricts and expands x to the size of ref:

Keeping the use case of this helper in remind2 reporting in mind, I'd assume it should

expand x to ref, but not restrict it to ref
~~only be applied to dim 1 and 2 (region and year)~~ Nvm, thinking about it more, I guess it won't hurt to expand to dim 3

fbenke-pik commented 6 months ago

@0UmfHxcvx5J7JoaOhFSs5mncnisTJJ6q could you point me to gdxes with irregularities that could make reportEmi and reportFE crash if I don't restore zeros?

0UmfHxcvx5J7JoaOhFSs5mncnisTJJ6q commented 6 months ago

Nope. @mellamoSimon worked on this, I only surmised why he did what he did. I have no idea if that is a problem that crops up regularly, or if he encountered it at all, or if he just did as the Romans do. But in theory you should be able to test this with every instance of readGDX(restore_zeros = TRUE):

$ ag 'restore_zeros *= *T'
R/reportFE.R
82:  vm_demFENonEnergySector <- readGDX(gdx, "vm_demFENonEnergySector", field = "l", restore_zeros = T, react = "silent")[,t,]*TWa_2_EJ

R/reportSE.R
77:    storLoss <- readGDX(gdx, name = c("v32_storloss", "v_storloss"), field = "l", restore_zeros = TRUE, format = "first_found") * pm_conv_TWa_EJ
434:  pm_fuExtrOwnCons <- readGDX(gdx, "pm_fuExtrOwnCons", restore_zeros = T)[,,getNames(pm_fuExtrOwnCons_reduced)]

R/reportPrices.R
991:  o01_CESderivatives <- readGDX(gdx, "o01_CESderivatives", restore_zeros = T, react = "silent")

R/reportEmi.R
149:  # note: this has to be read in with restore_zeros=T because sometimes it contains only non-zero values for "ETS" emiMkt
151:  pm_IndstCO2Captured <- readGDX(gdx, "pm_IndstCO2Captured", restore_zeros = T,
196:  vm_emiIndCCS <- readGDX(gdx, "vm_emiIndCCS", field = "l", restore_zeros = T)[, t, ]
332:  vm_plasticsCarbon <- readGDX(gdx, "vm_plasticsCarbon", field = "l", restore_zeros = T, react = "silent")[,t,]
338:    vm_feedstockEmiUnknownFate  <- readGDX(gdx, "vm_feedstockEmiUnknownFate", field = "l", restore_zeros = T, react = "silent")[,t,]
339:    vm_incinerationEmi          <- readGDX(gdx, "vm_incinerationEmi", field = "l", restore_zeros = T, react = "silent")[,t,]
340:    vm_nonIncineratedPlastics   <- readGDX(gdx, "vm_nonIncineratedPlastics", field = "l", restore_zeros = T, react = "silent")[,t,]
2704:  p47_LULUCFEmi_GrassiShift <- readGDX(gdx, "p47_LULUCFEmi_GrassiShift", restore_zeros = T, react = "silent")[getRegions(out), getYears(out),]

mellamoSimon commented 6 months ago

hey, yeah, that's a good assumption, I eventually set restore_zeros = T because the script would fail otherwise. I did that long ago though so I don't recall the exact mismatch of dimensions that would happen. I can try to reproduce it and let you know? @fbenke-pik

0UmfHxcvx5J7JoaOhFSs5mncnisTJJ6q commented 6 months ago

Keeping the use case of this helper in remind2 reporting in mind, I'd assume it should

expand x to ref, but not restrict it to ref

But the use case in remind2 does always include also a restriction. All the examples above have something like [,t,] attached to the readGDX() call (except for storLoss, which does it in a funky way a couple of lines below). I agree with @orichters on this one.

fbenke-pik commented 6 months ago

Thanks Simón. Right now I favour the following approach:

Avoid restoring zeros whenever I see no immediate need to do so
Run adjusted reporting on all the latest AMT gdxes
Deal with the crashes coming out of this individually
Merge and keep an eye on the reporting

Let me know in case there are objections. If not, then nothing to do on your end, Simón.

mellamoSimon commented 6 months ago

Hi Falk, that sounds good, thank you!

fbenke-pik commented 5 months ago

Looks like this PR should solve the problems: https://github.com/pik-piam/remind2/pull/557

Could solve the problems mostly by introducing new options to readGDX (https://github.com/pik-piam/gdx/pull/8).

As of now, I don't see a need for somebody_name_this_function.

Validation:

I reran validation on the latest REMIND runs that actually produced a gdx /p/tmp/benke/validation
They all succeeded with mem=8000
Summation problems seem to be the same as /p/tmp/benke/validation/validate_remind2-29209181.out
Some gaps have gotten smaller actually (does not have to be due to this PR though, but other changes since original reporting ran), none have increased from what I can see

fbenke-pik commented 5 months ago

Leaving this open, as it will probably become relevant in the future again. (see https://github.com/pik-piam/remind2/issues/524)

0UmfHxcvx5J7JoaOhFSs5mncnisTJJ6q commented 5 months ago

The https://github.com/remindmodel/remind/pull/1609 problem comes down to the input.gdx, which is from the current set of CES parameters sports a whole lot of zeros for SE/FE technologies that do not actually exist.

readGDX(gdx, 'vm_demFENonEnergySector', field = 'l', restore_zeros = FALSE)[,t,] %>%
    magclass_to_tibble() %>% 
    anti_join(readGDX(gdx, 'sefe')) %>% 
    filter(!is.na(value)) %>% 
    distinct(all_enty, all_enty1, .keep_all = TRUE)
Joining with `by = join_by(all_enty, all_enty1)`
# A tibble: 19 × 7
   all_regi  ttot all_enty all_enty1 emi_sectors all_emiMkt value
   <chr>    <int> <chr>    <chr>     <chr>       <chr>      <dbl>
 1 EUR       2005 seliqbio fegas     indst       ETS            0
 2 EUR       2005 seliqbio fesos     indst       ETS            0
 3 EUR       2005 seliqsyn fegas     indst       ETS            0
 4 EUR       2005 seliqsyn fesos     indst       ETS            0
 5 EUR       2005 sesobio  fegas     indst       ETS            0
 6 EUR       2005 sesobio  fehos     indst       ETS            0
 7 EUR       2005 seel     fegas     indst       ETS            0
 8 EUR       2005 seel     fehos     indst       ETS            0
 9 EUR       2005 seel     fesos     indst       ETS            0
10 EUR       2005 seh2     fegas     indst       ETS            0
11 EUR       2005 seh2     fehos     indst       ETS            0
12 EUR       2005 seh2     fesos     indst       ETS            0
13 EUR       2005 segabio  fehos     indst       ETS            0
14 EUR       2005 segabio  fesos     indst       ETS            0
15 EUR       2005 segasyn  fehos     indst       ETS            0
16 EUR       2005 segasyn  fesos     indst       ETS            0
17 EUR       2005 sehe     fegas     indst       ETS            0
18 EUR       2005 sehe     fehos     indst       ETS            0
19 EUR       2005 sehe     fesos     indst       ETS            0

Probably some REMIND/GAMS carry-over.

ricardarosemann commented 5 months ago

I also came across the issue of the non-existing SE/FE technologies today, when trying to do reporting for a Remind Base run, which failed due to the vm_demFeSector/vm_demFENonEnergySector mismatch. Falk's merge fortunately fixed that. I tried to trace down the origin of this error and I would guess that the non-existing SE/FE technologies might come from (in Remind)

q37_FossilFeedstock_Base(t,regi,entyFe,emiMkt)$(
                         entyFE2sector2emiMkt_NonEn(entyFe,"indst",emiMkt)
                     AND cm_emiscen eq 1                                   ) ..
  sum(entySe, vm_demFENonEnergySector(t,regi,entySe,entyFe,"indst",emiMkt))
  =e=
  sum(entySeFos,
    vm_demFENonEnergySector(t,regi,entySeFos,entyFe,"indst",emiMkt)
  )
;

as there is no sefe mapping active here (and this equation is only active for Base runs). Maybe it's worth looking into this.

0UmfHxcvx5J7JoaOhFSs5mncnisTJJ6q commented 5 months ago

Thanks Ricarda. @orichters, was https://github.com/remindmodel/remind/pull/1609 a Base run?

orichters commented 5 months ago

The error that occured? That was in a testOneRegi.

ricardarosemann commented 5 months ago

testOneRegi runs should also be subject to the equation I posted - as it also has cm_emiscen = 1 (the default).

0UmfHxcvx5J7JoaOhFSs5mncnisTJJ6q commented 5 months ago

No, you can run testOneRegi for any configuration you want. Baseline, NPi, Policy, calibration. q37_FossilFeedstock_Base is standing next to the Guillotine anyhow, so …

orichters commented 5 months ago

Well, it was a testOneRegi as called by make test, so with default settings.

fbenke-pik commented 4 months ago

Closing, as this seems to be stable now, not causing any further issues.

pik-piam / remind2

memory and mbind problems in convGDX2MIF / reportFE #549