Closed orichters closed 4 months ago
code suggestion by @0UmfHxcvx5J7JoaOhFSs5mncnisTJJ6q to expand a
to the joint size of a
and b
:
somebody_name_this_function <- function(a, b, fill = 0) {
r <- new.magpie(cells_and_regions = union(getRegions(a), getRegions(b)),
years = sort(union(getYears(a, as.integer = TRUE),
getYears(b, as.integer = TRUE))),
names = union(getNames(a), getNames(b)),
fill = fill,
sets = names(dimnames(b)))
r[getRegions(a),getYears(a),getNames(a)] <- a
return(r)
}
B_ <- B[,t[-2],] %>% somebody_name_this_function(A)
A + B_
which returns:
An object of class "magpie"
, , 1
t
i y1995 y2005 y2015
AFR 1105.333 696.44 1821.22
CPA 2561.270 1429.53 3018.20
> system.time({
+ x <- vm_demFENonEnergySector <- readGDX(gdx, "vm_demFENonEnergySector",
+ field = "l", restore_zeros = TRUE,
+ react = "silent")[,t,]})
user system elapsed
12.134 4.380 16.553
> format(object.size(x), units = 'auto')
[1] "1.5 Gb"
> system.time({
+ ref <- new.magpie(
+ cells_and_regions = readGDX(gdx, 'regi'),
+ years = readGDX(gdx, 't'),
+ names = quitte::cartesian(
+ readGDX(gdx, 'se2fe') %>%
+ mutate(x = paste(.data$all_enty, .data$all_enty1,
+ sep = '.')) %>%
+ pull('x'),
+ 'indst',
+ readGDX(gdx, 'all_emiMkt')),
+ fill = 0,
+ sets = c('t', 'regi', 'entySE', 'entyFE', 'sector', 'emiMkt'))
+
+
+
+ y <- vm_demFENonEnergySector <- readGDX(gdx, "vm_demFENonEnergySector",
+ field = "l", restore_zeros = FALSE,
+ react = "silent") %>%
+ somebody_name_this_function(ref)})
user system elapsed
0.133 0.000 0.129
> format(object.size(y), units = 'auto')
[1] "257.8 Kb"
I rather think of a one-sided function that restricts and expands x
to the size of ref
:
somebody_name_this_function <- function(x, ref, fill = 0) {
r <- new.magpie(cells_and_regions = getRegions(ref),
years = getYears(ref),
names = getNames(x),
fill = fill,
sets = names(dimnames(ref)))
r[intersect(getRegions(x), getRegions(r)), intersect(getYears(x), getYears(r)), getNames(x)] <- x
return(r)
}
"Nicht mein Zirkus, nicht meine Affen."
region <- c("AAA", "BBB", "CCC")
t <- paste0("y", c(2000, 2005, 2010))
name <- c("foo", "bar", "bazz")
A <- new.magpie(cells_and_regions = region, years = t, names = name, fill = 1,
sets = c("region", "t", "name"))
# spatial ----
## x smaller then ref ----
expect_equal(somebody_name_this_function(A[region[-2],,], A),
`mselect<-`(A, region = region[2], value = 0))
## x larger then ref ----
expect_equal(somebody_name_this_function(A, A[region[-2],,]),
A[region[-2],,])
# temporal ----
## x smaller then ref ----
expect_equal(somebody_name_this_function(A[,t[-2],], A),
`mselect<-`(A, t = t[2], value = 0))
## x larger then ref ----
expect_equal(somebody_name_this_function(x = A, ref = A[,t[-2],]),
A[,t[-2],])
# names ----
## x smaller then ref ----
expect_equal(somebody_name_this_function(A[,,name[-2]], A),
`mselect<-`(A, name = name[2], value = 0))
## x larger then ref ----
expect_equal(somebody_name_this_function(A, A[,,name[-2]]),
A[,,name[-2]])
I found a (somewhat) reliable way to assess the size of the problem.
/p/projects/remind/modeltests/remind/output/SSP2EU-PkBudg1050-AMT_2024-02-24_01.16.37 (develop|?? M)$ /usr/bin/time -v Rscript -e 'format(object.size(gdx::readGDX("fulldata.gdx", "vm_demFENonEnergySector", field = "l", restore_zeros = FALSE)), units = "auto")' 2>&1 | grep -E '(\[|Maximum resident set size)'
[1] "66 Kb"
Maximum resident set size (kbytes): 131152
/p/projects/remind/modeltests/remind/output/SSP2EU-PkBudg1050-AMT_2024-02-24_01.16.37 (develop|?? M)$ /usr/bin/time -v Rscript -e 'format(object.size(gdx::readGDX("fulldata.gdx", "vm_demFENonEnergySector", field = "l", restore_zeros = TRUE)), units = "auto")' 2>&1 | grep
-E '(\[|Maximum resident set size)'
[1] "3 Gb"
Maximum resident set size (kbytes): 9496012
So the returned object goes from 66 kb to 3 GB (with filtering for t
it is 1.5 GB I believe), while peak memory demand increases by 9.3 GB.
Peak memory demand of convGDX2MIF()
seems to be 16 GB currently:
$ sbatch --wrap "/usr/bin/time -v Rscript -e 'x <- remind2::convGDX2MIF(\"/p/projects/remind/modeltests/remind/output/SSP2EU-
PkBudg1050-AMT_2024-02-24_01.16.37/fulldata.gdx\")' 2>&1 | grep 'Maximum resident set size'" --output=foo.log --qos="priority" --mem=480
00 --mail-type=END
$ cat foo.log
Maximum resident set size (kbytes): 15919132
Peak memory demand of
convGDX2MIF()
seems to be 16 GB currently:
That fits my experience that using --mem=16000 was sufficient.
Will work on this next week (KW10)
I rather think of a one-sided function that restricts and expands
x
to the size ofref
:
Keeping the use case of this helper in remind2 reporting in mind, I'd assume it should
x
to ref
, but not restrict it to ref
@0UmfHxcvx5J7JoaOhFSs5mncnisTJJ6q could you point me to gdxes with irregularities that could make reportEmi and reportFE crash if I don't restore zeros?
Nope.
@mellamoSimon worked on this, I only surmised why he did what he did. I have no idea if that is a problem that crops up regularly, or if he encountered it at all, or if he just did as the Romans do.
But in theory you should be able to test this with every instance of readGDX(restore_zeros = TRUE)
:
$ ag 'restore_zeros *= *T'
R/reportFE.R
82: vm_demFENonEnergySector <- readGDX(gdx, "vm_demFENonEnergySector", field = "l", restore_zeros = T, react = "silent")[,t,]*TWa_2_EJ
R/reportSE.R
77: storLoss <- readGDX(gdx, name = c("v32_storloss", "v_storloss"), field = "l", restore_zeros = TRUE, format = "first_found") * pm_conv_TWa_EJ
434: pm_fuExtrOwnCons <- readGDX(gdx, "pm_fuExtrOwnCons", restore_zeros = T)[,,getNames(pm_fuExtrOwnCons_reduced)]
R/reportPrices.R
991: o01_CESderivatives <- readGDX(gdx, "o01_CESderivatives", restore_zeros = T, react = "silent")
R/reportEmi.R
149: # note: this has to be read in with restore_zeros=T because sometimes it contains only non-zero values for "ETS" emiMkt
151: pm_IndstCO2Captured <- readGDX(gdx, "pm_IndstCO2Captured", restore_zeros = T,
196: vm_emiIndCCS <- readGDX(gdx, "vm_emiIndCCS", field = "l", restore_zeros = T)[, t, ]
332: vm_plasticsCarbon <- readGDX(gdx, "vm_plasticsCarbon", field = "l", restore_zeros = T, react = "silent")[,t,]
338: vm_feedstockEmiUnknownFate <- readGDX(gdx, "vm_feedstockEmiUnknownFate", field = "l", restore_zeros = T, react = "silent")[,t,]
339: vm_incinerationEmi <- readGDX(gdx, "vm_incinerationEmi", field = "l", restore_zeros = T, react = "silent")[,t,]
340: vm_nonIncineratedPlastics <- readGDX(gdx, "vm_nonIncineratedPlastics", field = "l", restore_zeros = T, react = "silent")[,t,]
2704: p47_LULUCFEmi_GrassiShift <- readGDX(gdx, "p47_LULUCFEmi_GrassiShift", restore_zeros = T, react = "silent")[getRegions(out), getYears(out),]
hey, yeah, that's a good assumption, I eventually set restore_zeros = T
because the script would fail otherwise. I did that long ago though so I don't recall the exact mismatch of dimensions that would happen. I can try to reproduce it and let you know? @fbenke-pik
Keeping the use case of this helper in remind2 reporting in mind, I'd assume it should
- expand
x
toref
, but not restrict it toref
But the use case in remind2
does always include also a restriction. All the examples above have something like [,t,]
attached to the readGDX()
call (except for storLoss
, which does it in a funky way a couple of lines below).
I agree with @orichters on this one.
Thanks Simón. Right now I favour the following approach:
Let me know in case there are objections. If not, then nothing to do on your end, Simón.
Hi Falk, that sounds good, thank you!
Looks like this PR should solve the problems: https://github.com/pik-piam/remind2/pull/557
Could solve the problems mostly by introducing new options to readGDX (https://github.com/pik-piam/gdx/pull/8).
As of now, I don't see a need for somebody_name_this_function
.
Validation:
/p/tmp/benke/validation
/p/tmp/benke/validation/validate_remind2-29209181.out
Leaving this open, as it will probably become relevant in the future again. (see https://github.com/pik-piam/remind2/issues/524)
The https://github.com/remindmodel/remind/pull/1609 problem comes down to the input.gdx
, which is from the current set of CES parameters sports a whole lot of zeros for SE/FE technologies that do not actually exist.
readGDX(gdx, 'vm_demFENonEnergySector', field = 'l', restore_zeros = FALSE)[,t,] %>%
magclass_to_tibble() %>%
anti_join(readGDX(gdx, 'sefe')) %>%
filter(!is.na(value)) %>%
distinct(all_enty, all_enty1, .keep_all = TRUE)
Joining with `by = join_by(all_enty, all_enty1)`
# A tibble: 19 × 7
all_regi ttot all_enty all_enty1 emi_sectors all_emiMkt value
<chr> <int> <chr> <chr> <chr> <chr> <dbl>
1 EUR 2005 seliqbio fegas indst ETS 0
2 EUR 2005 seliqbio fesos indst ETS 0
3 EUR 2005 seliqsyn fegas indst ETS 0
4 EUR 2005 seliqsyn fesos indst ETS 0
5 EUR 2005 sesobio fegas indst ETS 0
6 EUR 2005 sesobio fehos indst ETS 0
7 EUR 2005 seel fegas indst ETS 0
8 EUR 2005 seel fehos indst ETS 0
9 EUR 2005 seel fesos indst ETS 0
10 EUR 2005 seh2 fegas indst ETS 0
11 EUR 2005 seh2 fehos indst ETS 0
12 EUR 2005 seh2 fesos indst ETS 0
13 EUR 2005 segabio fehos indst ETS 0
14 EUR 2005 segabio fesos indst ETS 0
15 EUR 2005 segasyn fehos indst ETS 0
16 EUR 2005 segasyn fesos indst ETS 0
17 EUR 2005 sehe fegas indst ETS 0
18 EUR 2005 sehe fehos indst ETS 0
19 EUR 2005 sehe fesos indst ETS 0
Probably some REMIND/GAMS carry-over.
I also came across the issue of the non-existing SE/FE technologies today, when trying to do reporting for a Remind Base run, which failed due to the vm_demFeSector/vm_demFENonEnergySector mismatch. Falk's merge fortunately fixed that. I tried to trace down the origin of this error and I would guess that the non-existing SE/FE technologies might come from (in Remind)
q37_FossilFeedstock_Base(t,regi,entyFe,emiMkt)$(
entyFE2sector2emiMkt_NonEn(entyFe,"indst",emiMkt)
AND cm_emiscen eq 1 ) ..
sum(entySe, vm_demFENonEnergySector(t,regi,entySe,entyFe,"indst",emiMkt))
=e=
sum(entySeFos,
vm_demFENonEnergySector(t,regi,entySeFos,entyFe,"indst",emiMkt)
)
;
as there is no sefe mapping active here (and this equation is only active for Base runs). Maybe it's worth looking into this.
Thanks Ricarda. @orichters, was https://github.com/remindmodel/remind/pull/1609 a Base run?
The error that occured? That was in a testOneRegi
.
testOneRegi
runs should also be subject to the equation I posted - as it also has cm_emiscen = 1
(the default).
No, you can run testOneRegi
for any configuration you want. Baseline, NPi, Policy, calibration.
q37_FossilFeedstock_Base
is standing next to the Guillotine anyhow, so …
Well, it was a testOneRegi
as called by make test
, so with default settings.
Closing, as this seems to be stable now, not causing any further issues.
cd /p/projects/remind/modeltests/remind/output; grep oom-kill testOne*/log_output.txt
restore_zeros = FALSE
because we declare all variables over the general supersets (e.g.,tall
,all_enty
), but usually define them only over subsets (t
,entyFE
).restore_zeros = FALSE
reads in only data that is not zero, which is usually what we want. But if some axis of the data (e.g. all values in 2010, or everythingfehos
) happens to be entirely zero (usually that happens for specific gdxes only), that dimension of the resulting magpie object differs (e.g. c(2005, 2015, …) instead of c(2005, 2010, 2015, …)) and it is incompatible with all the rest of the objects we have. What would be needed is a function to expands a variable read in withrestore_zeros = FALSE
to the actual sets it should have. But that is a moving target, as it depends on all the other variables/parameters used in conjunction with it.magpie_expand
is not the solutionSome code to start with:
The question is how to be able to sum, mbind etc.
A
andB
.The problem is that the dimensions each variable is defined (not declared) over in the GAMS code (e.g. t instead of tall, se2fe(entySE,entyFE,te) instead of all all_enty/all_enty/all_te combinations) is not pre-defined, and the problem depends on which other variables the variables is combined with (in magpie operations) and what data those variables have.