pik-piam / remind2

The remind2 package contains the REMIND-specific routines for data and model output manipulation.
0 stars 40 forks source link

memory and mbind problems in convGDX2MIF / reportFE #549

Closed orichters closed 4 months ago

orichters commented 6 months ago

Some code to start with:

library(magclass)
t <- c(1995, 2005, 2015)
s <- c('AFR', 'CPA')
A <- dimSums(maxample('pop')[s,t,'A2'], 3)
B <- dimSums(maxample('pop')[s,t[-2],'B1'], 3)

The question is how to be able to sum, mbind etc. A and B.

The problem is that the dimensions each variable is defined (not declared) over in the GAMS code (e.g. t instead of tall, se2fe(entySE,entyFE,te) instead of all all_enty/all_enty/all_te combinations) is not pre-defined, and the problem depends on which other variables the variables is combined with (in magpie operations) and what data those variables have.

orichters commented 6 months ago
  1. code suggestion by @0UmfHxcvx5J7JoaOhFSs5mncnisTJJ6q to expand a to the joint size of a and b:

    somebody_name_this_function <- function(a, b, fill = 0) {
    r <- new.magpie(cells_and_regions = union(getRegions(a), getRegions(b)),
                    years = sort(union(getYears(a, as.integer = TRUE), 
                                       getYears(b, as.integer = TRUE))),
                    names = union(getNames(a), getNames(b)),
                    fill = fill,
                    sets = names(dimnames(b)))
    
    r[getRegions(a),getYears(a),getNames(a)] <- a
    return(r)
    }
    B_ <- B[,t[-2],] %>% somebody_name_this_function(A)
    A + B_

    which returns:

    An object of class "magpie"
    , , 1
     t
    i        y1995   y2005   y2015
    AFR 1105.333  696.44 1821.22
    CPA 2561.270 1429.53 3018.20
> system.time({
+     x <- vm_demFENonEnergySector <- readGDX(gdx, "vm_demFENonEnergySector",
+                                             field = "l", restore_zeros = TRUE,
+                                             react = "silent")[,t,]})
   user  system elapsed 
 12.134   4.380  16.553 
> format(object.size(x), units = 'auto')
[1] "1.5 Gb"
> system.time({
+     ref <- new.magpie(
+         cells_and_regions = readGDX(gdx, 'regi'),
+         years = readGDX(gdx, 't'),
+         names = quitte::cartesian(
+             readGDX(gdx, 'se2fe') %>% 
+                 mutate(x = paste(.data$all_enty, .data$all_enty1,
+                                  sep = '.')) %>% 
+                 pull('x'),
+             'indst',
+             readGDX(gdx, 'all_emiMkt')),
+         fill = 0,
+         sets = c('t', 'regi', 'entySE', 'entyFE', 'sector', 'emiMkt'))
+     
+     
+     
+     y <- vm_demFENonEnergySector <- readGDX(gdx, "vm_demFENonEnergySector",
+                                             field = "l", restore_zeros = FALSE,
+                                             react = "silent") %>% 
+         somebody_name_this_function(ref)})
   user  system elapsed 
  0.133   0.000   0.129 
> format(object.size(y), units = 'auto')
[1] "257.8 Kb"
orichters commented 6 months ago

I rather think of a one-sided function that restricts and expands x to the size of ref:

somebody_name_this_function <- function(x, ref, fill = 0) {
  r <- new.magpie(cells_and_regions = getRegions(ref),
                  years = getYears(ref),
                  names = getNames(x),
                  fill = fill,
                  sets = names(dimnames(ref)))
  r[intersect(getRegions(x), getRegions(r)), intersect(getYears(x), getYears(r)), getNames(x)] <- x
  return(r)
}
0UmfHxcvx5J7JoaOhFSs5mncnisTJJ6q commented 6 months ago

"Nicht mein Zirkus, nicht meine Affen."

0UmfHxcvx5J7JoaOhFSs5mncnisTJJ6q commented 6 months ago
region <- c("AAA", "BBB", "CCC")
t      <- paste0("y", c(2000, 2005, 2010))
name   <- c("foo", "bar", "bazz")

A <- new.magpie(cells_and_regions = region, years = t, names = name, fill = 1,
                sets = c("region", "t", "name"))

# spatial ----
## x smaller then ref ----
expect_equal(somebody_name_this_function(A[region[-2],,], A),
             `mselect<-`(A, region = region[2], value = 0))

## x larger then ref ----
expect_equal(somebody_name_this_function(A, A[region[-2],,]),
             A[region[-2],,])

# temporal ----
## x smaller then ref ----
expect_equal(somebody_name_this_function(A[,t[-2],], A),
             `mselect<-`(A, t = t[2], value = 0))

## x larger then ref ----
expect_equal(somebody_name_this_function(x = A, ref = A[,t[-2],]),
             A[,t[-2],])

# names ----
## x smaller then ref ----
expect_equal(somebody_name_this_function(A[,,name[-2]], A),
             `mselect<-`(A, name = name[2], value = 0))

## x larger then ref ----
expect_equal(somebody_name_this_function(A, A[,,name[-2]]),
             A[,,name[-2]])
0UmfHxcvx5J7JoaOhFSs5mncnisTJJ6q commented 6 months ago

I found a (somewhat) reliable way to assess the size of the problem.

/p/projects/remind/modeltests/remind/output/SSP2EU-PkBudg1050-AMT_2024-02-24_01.16.37 (develop|?? M)$ /usr/bin/time -v Rscript -e 'format(object.size(gdx::readGDX("fulldata.gdx", "vm_demFENonEnergySector", field = "l", restore_zeros = FALSE)), units = "auto")' 2>&1 | grep -E '(\[|Maximum resident set size)'
[1] "66 Kb"
    Maximum resident set size (kbytes): 131152
/p/projects/remind/modeltests/remind/output/SSP2EU-PkBudg1050-AMT_2024-02-24_01.16.37 (develop|?? M)$ /usr/bin/time -v Rscript -e 'format(object.size(gdx::readGDX("fulldata.gdx", "vm_demFENonEnergySector", field = "l", restore_zeros = TRUE)), units = "auto")' 2>&1 | grep 
-E '(\[|Maximum resident set size)'
[1] "3 Gb"
    Maximum resident set size (kbytes): 9496012

So the returned object goes from 66 kb to 3 GB (with filtering for t it is 1.5 GB I believe), while peak memory demand increases by 9.3 GB.

Peak memory demand of convGDX2MIF() seems to be 16 GB currently:

$ sbatch --wrap "/usr/bin/time -v Rscript -e 'x <- remind2::convGDX2MIF(\"/p/projects/remind/modeltests/remind/output/SSP2EU-
PkBudg1050-AMT_2024-02-24_01.16.37/fulldata.gdx\")' 2>&1 | grep 'Maximum resident set size'" --output=foo.log --qos="priority" --mem=480
00 --mail-type=END
$ cat foo.log 
    Maximum resident set size (kbytes): 15919132
orichters commented 6 months ago

Peak memory demand of convGDX2MIF() seems to be 16 GB currently:

That fits my experience that using --mem=16000 was sufficient.

fbenke-pik commented 6 months ago

Will work on this next week (KW10)

fbenke-pik commented 6 months ago

I rather think of a one-sided function that restricts and expands x to the size of ref:

Keeping the use case of this helper in remind2 reporting in mind, I'd assume it should

fbenke-pik commented 6 months ago

@0UmfHxcvx5J7JoaOhFSs5mncnisTJJ6q could you point me to gdxes with irregularities that could make reportEmi and reportFE crash if I don't restore zeros?

0UmfHxcvx5J7JoaOhFSs5mncnisTJJ6q commented 6 months ago

Nope. @mellamoSimon worked on this, I only surmised why he did what he did. I have no idea if that is a problem that crops up regularly, or if he encountered it at all, or if he just did as the Romans do. But in theory you should be able to test this with every instance of readGDX(restore_zeros = TRUE):

$ ag 'restore_zeros *= *T'
R/reportFE.R
82:  vm_demFENonEnergySector <- readGDX(gdx, "vm_demFENonEnergySector", field = "l", restore_zeros = T, react = "silent")[,t,]*TWa_2_EJ

R/reportSE.R
77:    storLoss <- readGDX(gdx, name = c("v32_storloss", "v_storloss"), field = "l", restore_zeros = TRUE, format = "first_found") * pm_conv_TWa_EJ
434:  pm_fuExtrOwnCons <- readGDX(gdx, "pm_fuExtrOwnCons", restore_zeros = T)[,,getNames(pm_fuExtrOwnCons_reduced)]

R/reportPrices.R
991:  o01_CESderivatives <- readGDX(gdx, "o01_CESderivatives", restore_zeros = T, react = "silent")

R/reportEmi.R
149:  # note: this has to be read in with restore_zeros=T because sometimes it contains only non-zero values for "ETS" emiMkt
151:  pm_IndstCO2Captured <- readGDX(gdx, "pm_IndstCO2Captured", restore_zeros = T,
196:  vm_emiIndCCS <- readGDX(gdx, "vm_emiIndCCS", field = "l", restore_zeros = T)[, t, ]
332:  vm_plasticsCarbon <- readGDX(gdx, "vm_plasticsCarbon", field = "l", restore_zeros = T, react = "silent")[,t,]
338:    vm_feedstockEmiUnknownFate  <- readGDX(gdx, "vm_feedstockEmiUnknownFate", field = "l", restore_zeros = T, react = "silent")[,t,]
339:    vm_incinerationEmi          <- readGDX(gdx, "vm_incinerationEmi", field = "l", restore_zeros = T, react = "silent")[,t,]
340:    vm_nonIncineratedPlastics   <- readGDX(gdx, "vm_nonIncineratedPlastics", field = "l", restore_zeros = T, react = "silent")[,t,]
2704:  p47_LULUCFEmi_GrassiShift <- readGDX(gdx, "p47_LULUCFEmi_GrassiShift", restore_zeros = T, react = "silent")[getRegions(out), getYears(out),]
mellamoSimon commented 6 months ago

hey, yeah, that's a good assumption, I eventually set restore_zeros = T because the script would fail otherwise. I did that long ago though so I don't recall the exact mismatch of dimensions that would happen. I can try to reproduce it and let you know? @fbenke-pik

0UmfHxcvx5J7JoaOhFSs5mncnisTJJ6q commented 6 months ago

Keeping the use case of this helper in remind2 reporting in mind, I'd assume it should

  • expand x to ref, but not restrict it to ref

But the use case in remind2 does always include also a restriction. All the examples above have something like [,t,] attached to the readGDX() call (except for storLoss, which does it in a funky way a couple of lines below). I agree with @orichters on this one.

fbenke-pik commented 6 months ago

Thanks Simón. Right now I favour the following approach:

Let me know in case there are objections. If not, then nothing to do on your end, Simón.

mellamoSimon commented 6 months ago

Hi Falk, that sounds good, thank you!

fbenke-pik commented 5 months ago

Looks like this PR should solve the problems: https://github.com/pik-piam/remind2/pull/557

Could solve the problems mostly by introducing new options to readGDX (https://github.com/pik-piam/gdx/pull/8).

As of now, I don't see a need for somebody_name_this_function.

Validation:

fbenke-pik commented 5 months ago

Leaving this open, as it will probably become relevant in the future again. (see https://github.com/pik-piam/remind2/issues/524)

0UmfHxcvx5J7JoaOhFSs5mncnisTJJ6q commented 5 months ago

The https://github.com/remindmodel/remind/pull/1609 problem comes down to the input.gdx, which is from the current set of CES parameters sports a whole lot of zeros for SE/FE technologies that do not actually exist.

readGDX(gdx, 'vm_demFENonEnergySector', field = 'l', restore_zeros = FALSE)[,t,] %>%
    magclass_to_tibble() %>% 
    anti_join(readGDX(gdx, 'sefe')) %>% 
    filter(!is.na(value)) %>% 
    distinct(all_enty, all_enty1, .keep_all = TRUE)
Joining with `by = join_by(all_enty, all_enty1)`
# A tibble: 19 × 7
   all_regi  ttot all_enty all_enty1 emi_sectors all_emiMkt value
   <chr>    <int> <chr>    <chr>     <chr>       <chr>      <dbl>
 1 EUR       2005 seliqbio fegas     indst       ETS            0
 2 EUR       2005 seliqbio fesos     indst       ETS            0
 3 EUR       2005 seliqsyn fegas     indst       ETS            0
 4 EUR       2005 seliqsyn fesos     indst       ETS            0
 5 EUR       2005 sesobio  fegas     indst       ETS            0
 6 EUR       2005 sesobio  fehos     indst       ETS            0
 7 EUR       2005 seel     fegas     indst       ETS            0
 8 EUR       2005 seel     fehos     indst       ETS            0
 9 EUR       2005 seel     fesos     indst       ETS            0
10 EUR       2005 seh2     fegas     indst       ETS            0
11 EUR       2005 seh2     fehos     indst       ETS            0
12 EUR       2005 seh2     fesos     indst       ETS            0
13 EUR       2005 segabio  fehos     indst       ETS            0
14 EUR       2005 segabio  fesos     indst       ETS            0
15 EUR       2005 segasyn  fehos     indst       ETS            0
16 EUR       2005 segasyn  fesos     indst       ETS            0
17 EUR       2005 sehe     fegas     indst       ETS            0
18 EUR       2005 sehe     fehos     indst       ETS            0
19 EUR       2005 sehe     fesos     indst       ETS            0

Probably some REMIND/GAMS carry-over.

ricardarosemann commented 5 months ago

I also came across the issue of the non-existing SE/FE technologies today, when trying to do reporting for a Remind Base run, which failed due to the vm_demFeSector/vm_demFENonEnergySector mismatch. Falk's merge fortunately fixed that. I tried to trace down the origin of this error and I would guess that the non-existing SE/FE technologies might come from (in Remind)

q37_FossilFeedstock_Base(t,regi,entyFe,emiMkt)$(
                         entyFE2sector2emiMkt_NonEn(entyFe,"indst",emiMkt)
                     AND cm_emiscen eq 1                                   ) ..
  sum(entySe, vm_demFENonEnergySector(t,regi,entySe,entyFe,"indst",emiMkt))
  =e=
  sum(entySeFos,
    vm_demFENonEnergySector(t,regi,entySeFos,entyFe,"indst",emiMkt)
  )
;

as there is no sefe mapping active here (and this equation is only active for Base runs). Maybe it's worth looking into this.

0UmfHxcvx5J7JoaOhFSs5mncnisTJJ6q commented 5 months ago

Thanks Ricarda. @orichters, was https://github.com/remindmodel/remind/pull/1609 a Base run?

orichters commented 5 months ago

The error that occured? That was in a testOneRegi.

ricardarosemann commented 5 months ago

testOneRegi runs should also be subject to the equation I posted - as it also has cm_emiscen = 1 (the default).

0UmfHxcvx5J7JoaOhFSs5mncnisTJJ6q commented 5 months ago

No, you can run testOneRegi for any configuration you want. Baseline, NPi, Policy, calibration. q37_FossilFeedstock_Base is standing next to the Guillotine anyhow, so …

orichters commented 5 months ago

Well, it was a testOneRegi as called by make test, so with default settings.

fbenke-pik commented 4 months ago

Closing, as this seems to be stable now, not causing any further issues.