Closed dbantje closed 1 year ago
- Setting up `testOneRegi` runs is an OK idea, but won't get you anywhere in this case, since `REF` is infeasible only from iteration 13 on, and `testOneRegi` will yield different results than `nash` in the first twelve iterations. So you are not in fact debugging the same 13th iteration …
Things to try:
- Put `SSP2EU-Base` into the `path_gdx` column of the NDC run. (This should find a .gdx file automatically. If it doesn't, replace `SSP2EU-Base` with the path to a `fulldata.gdx` from a finished `SSP2EU-Base` run in all `path_` columns. No point in running identical scenarios over and over.)
- If that doesn't work … do a `debug` run. Set `cm_nash_mode` to `debug`, but leave `optimization` on `nash`. That will cause a run that is almost identical to the failing `parallel` one, but with information on where the infeasibility actually is. Subtle differences we don't understand also mean that `serial`/`debug` runs might "just work" where `parallel` ones fail.
- Set `c_keep_iteration_gdxes`, which might make it easier to debug specific iterations should they fail.
FYI: Finding an old gdx from a `SSP2EU-Base` run automatically works if you set `start` to `0` for `SSP2EU-Base` such that this run will not be started.
And it then selects the most recent directory or `fulldata.gdx` it can find?
I'm never sure about these newfangled options …
The most recent `fulldata.gdx`. But it should tell which one it takes once you run `./start.R config/whatever.csv`. Or check before by running `./start.R --test config/whatever.csv`. And if you put `SSP2EU-AMT-Base` into `path_gdx`, it should even find the most recent AMT run automatically, but also warn you about that…
Using a different `.gdx` file doesn't help, still crashing in iteration 17. Doing a nash debug run now ...
General note: cluster time is cheaper than model operator (your) time. Especially on the weekend, where operator time has an infinite price. :chart_with_upwards_trend: Therefore, set up stuff like that in parallel. If one fails, you save a day. If both succeed, you have options.
The `full.lst` of the nash debug run is here: `/p/tmp/davidba/remind/output/SSP2EU-NDC_1_nash_debug_2022-12-06_15.20.45`.
From searching it, I find that most errors come from `q39_EqualSecShare_BioSyn`, e.g.
**** ERRORS/WARNINGS IN EQUATION q39_EqualSecShare_BioSyn(2020,REF,fedie,trans,ES)
1 error(s): All Jacobian elements in the row are very small.
for several regions, years, etc.
Well, those errors/warnings in and of themselves are not a problem. Infeasibilities are further down.
I updated the `listinfes` tool to work with `nash` .lst files, too. But this one is 2.4 GB, so running it takes about two minutes. I did
$ listinfes full.lst > infes.txt
so you can just look at `infes.txt`.
So, let's walk through this:
$ nashstat -F
itr region solvestat modelstat resusd objval
7 SSA NORMAL COMPLETION LOCALLY INFEASIBLE 128.502 23.8755340771311 F
13 REF NORMAL COMPLETION LOCALLY INFEASIBLE 2017.205 -42.3708212465084 F
14 REF NORMAL COMPLETION LOCALLY INFEASIBLE 873.393 -42.3714435312192 F
15 REF NORMAL COMPLETION LOCALLY INFEASIBLE 535.461 -42.3718120713121 F
16 REF NORMAL COMPLETION LOCALLY INFEASIBLE 6.617 -42.3717927205915 F
17 REF NORMAL COMPLETION LOCALLY INFEASIBLE 28.429 -42.3717796840736 F
The `q39_EqualSecShare_BioSyn` infeasibility in `SSA` actually disappears on its own again. Interesting is iteration 13, because that one stays.
$ less -p "iteration = 13" infes.txt
iteration = 13, sol_itr = 1, regi = 'REF', equ = 'qm_budget'
LOWER LEVEL UPPER MARGINAL
2030.REF 0.2103 -0.5591 +INF 1.0000
2035.REF 0.2134 0.0584 +INF 1.0000
iteration = 13, sol_itr = 2, regi = 'REF', equ = 'qm_budget'
LOWER LEVEL UPPER MARGINAL
2030.REF 0.2103 -0.6443 +INF 0.5000
…
That's the stuff that stays around and breaks the run.
`qm_budget` is kind of a black box to me. You might ask in the REMIND channel whether somebody has an idea about this. If not, I would suggest this:
Run the debug scenario again, but activate the Equation Listing, to see what actually goes on in `qm_budget(2030,REF)`.
To do so, add this
+if (ord(iteration) ge 13 AND sameas(regi,"REF"),
+ option
+ limrow = 2147483647
+ limcol = 2147483647
+ solprint = on
+ ;
+else
+ option
+ limrow = 0
+ limcol = 0
+ solprint = off
+ ;
+);
+
solve hybrid using nlp maximizing vm_welfareGlob;
right before `solve hybrid using nlp maximizing vm_welfareGlob;` in `./modules/80_optimization/nash/solve.gms`. That should print more information for iteration 13 and `REF` (but cut down on the size of the .lst file otherwise).
Hmmm, this does not compile:
2933567 if (ord(iteration) ge 13 AND sameas(regi,"REF"),
**** $149
**** LINE 46 INCLUDE /p/tmp/davidba/remind/modules/80_optimization/nash/solve.gms
**** LINE 32 INCLUDE /p/tmp/davidba/remind/modules/80_optimization/nash/realization.gms
**** LINE 17 INCLUDE /p/tmp/davidba/remind/modules/80_optimization/module.gms
**** LINE 43 BATINCLUDE /p/tmp/davidba/remind/modules/include.gms
%1 solve
**** LINE 81 INCLUDE /p/tmp/davidba/remind/core/loop.gms
**** LINE 1626 INPUT /p/tmp/davidba/remind/output/gamscompile/main_SSP2EU-NDC_1_eqlisting.gms
**** 149 Uncontrolled set entered as constant
Crap. Should be `all_regi`.
There seems to be another compilation problem:
cd /p/projects/remind/modeltests/output/SSP2EU-AMT-NDC_2022-12-10_08.51.08
less -j 4 --pattern='^\*\*\*\*' main.lst
May be caused by this PR by @fschreyer, but I'm not sure…
Certainly was. Remove the `AND` and go again.
The compilation error is fixed: https://github.com/remindmodel/remind/pull/1124
Here's the `full.lst` with the equation listing: `/p/tmp/davidba/remind/output/SSP2EU-NDC_1_eqlisting_2022-12-12_10.10.57/full.lst`.
But I don't really know enough about the model to make anything from it ... I'll ask in the REMIND channel.
And for my understanding:
- Is `listinfes` a tool in the REMIND repo?
- How do you know that the `qm_budget(2030,REF)` infeasibility is the root?
- Is `listinfes` a tool in the REMIND repo?
It's a shell script on the cluster doing the filtering of the `.lst` files for you.
$ which listinfes
/p/projects/rd3mod/tools/listinfes
- How do you know that the `qm_budget(2030,REF)` infeasibility is the root?
$ grep "sol_itr = 2, regi = 'REF'" infes.txt
iteration = 13, sol_itr = 2, regi = 'REF', equ = 'qm_budget'
iteration = 14, sol_itr = 2, regi = 'REF', equ = 'qm_budget'
iteration = 15, sol_itr = 2, regi = 'REF', equ = 'qm_budget'
iteration = 16, sol_itr = 2, regi = 'REF', equ = 'qm_budget'
iteration = 17, sol_itr = 2, regi = 'REF', equ = 'qm_budget'
The "original" debug run has infeasibilities only in `qm_budget` in the second solver try. Everything else disappears on its own.
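The `grep` check above can be generalized. The following is a minimal sketch, assuming the header lines of `infes.txt` always have the `iteration = …, sol_itr = …, regi = '…', equ = '…'` shape shown in this thread. It collects the (region, equation) pairs that are still infeasible in the last solver try of every iteration from a given one on. The function name `persistent_infes` and the sample text are made up for illustration; this is not part of `listinfes`.

```python
import re
from collections import defaultdict

# Header lines as they appear in the infes.txt excerpts in this thread
# (assumed format, not a documented interface).
HEADER = re.compile(
    r"iteration = (\d+), sol_itr = (\d+), regi = '(\w+)', equ = '(\w+)'"
)

def persistent_infes(text, first_iteration):
    """Return (regi, equ) pairs that are infeasible in the last solver try
    of every iteration from first_iteration up to the last one seen."""
    last_try = defaultdict(int)   # (iteration, regi) -> highest sol_itr seen
    hits = defaultdict(set)       # (iteration, regi, sol_itr) -> {equ, ...}
    for m in HEADER.finditer(text):
        itr, sol_itr, regi, equ = int(m[1]), int(m[2]), m[3], m[4]
        last_try[(itr, regi)] = max(last_try[(itr, regi)], sol_itr)
        hits[(itr, regi, sol_itr)].add(equ)
    if not last_try:
        return set()
    last_iteration = max(itr for itr, _ in last_try)
    result = None
    for itr in range(first_iteration, last_iteration + 1):
        pairs = set()
        for (i, regi), s in last_try.items():
            if i == itr:
                pairs |= {(regi, equ) for equ in hits[(i, regi, s)]}
        # Keep only what is infeasible in every iteration so far.
        result = pairs if result is None else result & pairs
    return result or set()

# Hypothetical sample; the q39 line is invented to show a transient entry.
sample = """
iteration = 13, sol_itr = 1, regi = 'REF', equ = 'qm_budget'
iteration = 13, sol_itr = 1, regi = 'REF', equ = 'q39_EqualSecShare_BioSyn'
iteration = 13, sol_itr = 2, regi = 'REF', equ = 'qm_budget'
iteration = 14, sol_itr = 2, regi = 'REF', equ = 'qm_budget'
"""
print(persistent_infes(sample, 13))
```

Entries that only show up in an earlier solver try, or that vanish in a later iteration, are dropped, which mirrors the "everything else disappears on its own" reasoning.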
Since I'm not sure where we will get with `qm_budget`, and not sure how much I'll get to work next week (a spectre is haunting Europe: the spectre of Kindergarten cold), here's another approach:
I checked out all merge requests between #1045 (last working `SSP2EU-AMT-NDC`) and #1066 (first non-working `SSP2EU-AMT-NDC`) and started `SSP2EU-AMT-Base` and `SSP2EU-AMT-NDC` runs for them.
/p/tmp/pehl/bughunt/NDC_infes/Remind_01_a919793
/p/tmp/pehl/bughunt/NDC_infes/Remind_02_4f9dfc8
/p/tmp/pehl/bughunt/NDC_infes/Remind_03_81084f4
/p/tmp/pehl/bughunt/NDC_infes/Remind_04_823d8c8
/p/tmp/pehl/bughunt/NDC_infes/Remind_05_55025e4
/p/tmp/pehl/bughunt/NDC_infes/Remind_06_678ecae
/p/tmp/pehl/bughunt/NDC_infes/Remind_07_64f0d71
/p/tmp/pehl/bughunt/NDC_infes/Remind_08_b4dbf2e
/p/tmp/pehl/bughunt/NDC_infes/Remind_09_0b69047
/p/tmp/pehl/bughunt/NDC_infes/Remind_10_8e182a5
/p/tmp/pehl/bughunt/NDC_infes/Remind_11_09bb41e
/p/tmp/pehl/bughunt/NDC_infes/Remind_12_aa1d6a8
/p/tmp/pehl/bughunt/NDC_infes/Remind_13_52b68c9
/p/tmp/pehl/bughunt/NDC_infes/Remind_14_91bd70f
/p/tmp/pehl/bughunt/NDC_infes/Remind_15_c45fb19
/p/tmp/pehl/bughunt/NDC_infes/Remind_16_7583cc0
/p/tmp/pehl/bughunt/NDC_infes/Remind_17_32c8444
/p/tmp/pehl/bughunt/NDC_infes/Remind_18_407ab64
Those will take a while, but in the end the NDC run from `Remind_01_a919793` (and possibly others) should succeed, and that of `Remind_18_407ab64` (and possibly others) should fail, and we should get one merge request that introduced changes that made NDC fail. That should narrow down what to look at.
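Once the runs are finished, picking out the offending merge request is a simple scan over the results in merge order. The sketch below is only an illustration under stated assumptions: the status labels (`"ok"`, `"infes"`, `"unknown"`), the function name `first_failing` and the example data are hypothetical, and `"unknown"` stands for runs that broke for unrelated reasons (for example a compilation error), which cannot be ruled out as the culprit.

```python
def first_failing(results):
    """results: list of (label, status) in merge order, with status one of
    "ok", "infes" or "unknown".
    Returns (last_ok_label, candidates), where candidates are the labels
    after the last successful run up to and including the first infeasible
    one. An empty candidate list means no infeasible run was found."""
    last_ok = None
    candidates = []
    for label, status in results:
        if status == "ok":
            # Everything up to here is cleared.
            last_ok, candidates = label, []
        else:
            candidates.append(label)
            if status == "infes":
                return last_ok, candidates
    return last_ok, []

# Hypothetical outcome of a batch of test runs (not real results).
runs = [
    ("commit_01", "ok"),
    ("commit_02", "ok"),
    ("commit_03", "unknown"),   # e.g. did not compile
    ("commit_04", "infes"),
    ("commit_05", "infes"),
]
print(first_failing(runs))
```

If the candidate list holds more than one entry, the runs in between were inconclusive and would need fixing or re-running before the change can be pinned to a single merge request.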
Thanks, Michaja, for your continuous efforts. I set up the NGFS runs, which include an NDC run that went through without problems (`/p/tmp/oliverr/NGFS_v3_tests/2022-12-15/remind/output`), although with 10.7 hours and 38 iterations, it was slower than before (5–7 hours, 30–33 iterations). On the other hand, the `h_cpol` with Current Policies failed twice with two different gdx input files, but in different regions. :/
Curious.
Modeltest `NDC` failed again, but differently.
/p/projects/remind/modeltests/output/SSP2EU-AMT-NDC_2022-12-17_05.27.10 (develop|?? M)$ nashstat -F
itr region solvestat modelstat resusd objval
2 REF NORMAL COMPLETION LOCALLY INFEASIBLE 375.755 8.24329876304596 F
3 MEA NORMAL COMPLETION LOCALLY INFEASIBLE 858.3 29.7301971790709 F
15 MEA NORMAL COMPLETION LOCALLY INFEASIBLE 261.304 29.5850612625419 F
15 SSA NORMAL COMPLETION LOCALLY INFEASIBLE 689.894 17.0756978845647 F
16 MEA NORMAL COMPLETION LOCALLY INFEASIBLE 119.993 29.5527431092611 F
16 SSA NORMAL COMPLETION LOCALLY INFEASIBLE 5.481 -329.746903738767 F
17 SSA NORMAL COMPLETION LOCALLY INFEASIBLE 54.814 -331.892744569579 F
18 SSA TERMINATED BY SOLVER INTERMEDIATE INFEASIBLE 163.785 -333.428109270105 F
19 CAZ TERMINATED BY SOLVER INTERMEDIATE INFEASIBLE 845.425 -8.08066610393547 F
19 CHA NORMAL COMPLETION LOCALLY INFEASIBLE 1073.26 -254.204360029201 F
19 EUR TERMINATED BY SOLVER INTERMEDIATE INFEASIBLE 213.742 -72.9722534374444 F
19 IND NORMAL COMPLETION LOCALLY INFEASIBLE 376.77 -302.463432040631 F
19 JPN NORMAL COMPLETION LOCALLY INFEASIBLE 552.323 -10.3379325073368 F
19 LAM NORMAL COMPLETION LOCALLY INFEASIBLE 470.555 -116.139039138215 F
19 MEA NORMAL COMPLETION LOCALLY INFEASIBLE 71.247 -104.497618743203 F
19 NEU NORMAL COMPLETION LOCALLY INFEASIBLE 990.524 -11.6234521188096 F
19 OAS TERMINATED BY SOLVER INTERMEDIATE INFEASIBLE 404.45 -265.654693938257 F
19 REF TERMINATED BY SOLVER INTERMEDIATE INFEASIBLE 325.48 -42.9602459925745 F
19 SSA NORMAL COMPLETION LOCALLY INFEASIBLE 8.744 -333.426828849519 F
19 USA NORMAL COMPLETION LOCALLY INFEASIBLE 646.963 -51.0161196546982 F
`SSA` from iteration 15 on …
Thanks, Michaja, for your test runs for the different REMIND versions. Surprisingly, the NDC scenario also converged for `Remind_18_407ab64`. Did you make any changes in the settings for those test runs?
`407ab64` failed in Base on some R error, not in NDC.
I set up a further batch of tests.
/p/tmp/pehl/bughunt/NDC_infes/Remind_101_6c10530
/p/tmp/pehl/bughunt/NDC_infes/Remind_102_368849e
/p/tmp/pehl/bughunt/NDC_infes/Remind_103_9fe2def
/p/tmp/pehl/bughunt/NDC_infes/Remind_104_ba96f73
/p/tmp/pehl/bughunt/NDC_infes/Remind_105_92b255a
/p/tmp/pehl/bughunt/NDC_infes/Remind_106_1380bc1
/p/tmp/pehl/bughunt/NDC_infes/Remind_107_9cfc2c2
/p/tmp/pehl/bughunt/NDC_infes/Remind_108_ec04597
/p/tmp/pehl/bughunt/NDC_infes/Remind_109_14d7b12
/p/tmp/pehl/bughunt/NDC_infes/Remind_110_4191ff5
/p/tmp/pehl/bughunt/NDC_infes/Remind_111_2c6f0d4
/p/tmp/pehl/bughunt/NDC_infes/Remind_112_00da511
/p/tmp/pehl/bughunt/NDC_infes/Remind_113_888bbd8
/p/tmp/pehl/bughunt/NDC_infes/Remind_114_83170fc
9cfc2c2 (#1084), 4191ff5 (#1000), 2c6f0d4 (#1046), and 00da511 (#1093) have compilation errors. If it is not clearly another merge request introducing the infeasibility, it's either fixing those or this is a dead end.
https://github.com/remindmodel/remind/commit/9cfc2c2d7b09d3bf2b20db74fdfd1edf00082874 (https://github.com/remindmodel/remind/pull/1084) introduced a bug (compilation error) which was fixed in #1087
The errors in #1046 and #1093 were both fixed/reset in #1094. I don't think these compilation error runs help to solve the NDC challenge, these were simply bugs that were fixed later.
And, as I said earlier, the NDC run (`h_ndc_bIT`) of `scenario_config_NGFS_v3.csv` works (using commit `49ad60690f9f8a30e20c81525c0808aa4e957a13`, last merge is #1130). So I don't think that the problem is easy to find :(
You are welcome to come up with a more sophisticated plan.
Just to add to the evidence here, I had SSP2EU-NDC failing too, last merge #1128 . Tried with several GDXs and same problem, irredeemable INFES in MEA from the first iteration. But using Oliver's NGFS run as a starting point actually worked:
/p/projects/piam/abrahao/scratch/lowen_calibrate/output/SSP2EU-NDC_2022-12-22_18.48.07
However, starting from the same point but with a different calibration (some weird project stuff) made the first iteration OK but immediately reverted to the same problem on iteration 2. Maybe there's something making CONOPT get trapped? Are there any CONOPT parameters we can tweak to make it widen the search or something? Still investigating what those INFES actually are.
/p/projects/piam/abrahao/scratch/lowen_calibrate/output/SSP2EU_lowEn-NDC_2022-12-22_18.47.11
Whatever happened in PR
that all seem completely unrelated, the most recent AMT NDC run was successful.
I debugged this a bit using last week's run (SSP2EU-AMT-NDC_2022-12-17_05.27.10).
This is what I could find:
Equation `q32_flexAdj` is using the marginal prices (`pm_SEprice` for `seel`) from the previous iteration in its calculations.
(1) If `pm_SEprice` is negative for a given region (this could happen, for example, in initial or terminal years due to quantity bounds), then in the next iteration: (2) `vm_flexAdj` will act in the opposite direction to what it should (as far as I understood), (3) `v21_taxrevFlex` will blow up in value due to this, (4) `vm_taxrev` will blow up in value as a consequence, (5) the budget equation will have a very hard time compensating for this tax imbalance, (6) the model will show infeasibilities as a consequence.
I disabled the flex tax (cm_flex_tax = 0) and restarted the infeasible run:
/p/projects/remind/users/renatoro/Debug_trunk/2022_12_27/output/SSP2EU-AMT-NDC_2022-12-17_05.27.10_rerun_noFlexTax
The new run converged without issues.
(the same run with cm_flex_tax = 1 for reference: `/p/projects/remind/users/renatoro/Debug_trunk/2022_12_27/output/SSP2EU-AMT-NDC_2022-12-17_05.27.10_rerun`)
First of all, I am also not familiar with the flex tax formulation, so maybe @robertpietzcker @fschreyer should check whether I am saying something completely off here. I also cannot say whether this is the only problem without further tests and runs.
Anyway, the use of marginals in a model equation can potentially explain why the model is unstable, sometimes working and sometimes not. Marginals are solver-dependent, and small changes in the initial point or in other variables can avoid corner solutions or over-bounded situations that could cause negative marginals. Even memory garbage and numeric approximations can influence that. I would try to avoid, as much as possible, directly using marginals from previous iterations in model equations, unless you explicitly guard against these edge cases.
A real solution for this would be to reformulate the FlexTax code to not rely on marginals. If it is really necessary to use marginal values from previous iterations, I would rely instead on `pm_FEPrice_by_SE_Sector` to provide the prices, as this should be more stable than `pm_SEprice` values, and either force a minimum price value to avoid the negative price problem or force `vm_flexAdj` to be a positive variable if this is really the intention behind the formulation (I am not 100% sure about that as I do not control the flex tax code).
on a related point, @robertpietzcker @fschreyer is it really necessary to have the flex tax dynamics entirely endogenous to the model?
From the equations I could look at, the `vm_flexAdj` dynamics is very non-linear. As it is using the tax formulation framework, maybe this could be potentially defined in between iterations reducing the complexity, and potentially the solution time, of the model.
Thanks, @Renato-Rodrigues, I'm very grateful for you digging into this problem.
Hey all, thanks for raising these points!
I myself wasn't involved in the flextax stuff, but as Felix is on parental leave and I did most of the other power sector integration stuff, I guess it falls in my responsibility :-)
From a first thinking-through of Felix's equations, I was a bit surprised about their complexity, but then I realized that as long as `cm_FlexTaxFeedback` is off (which it luckily is in default setting),
- `vm_capFac` gets fixed again in IntC/bounds.gms, so that `q32_flexPriceShareMin` collapses so that `v32_flexPriceShareMin` is fixed and not a variable anymore, but rather has the value of ~0.5. This means `v32_flexPriceShare` is simply 1 - (0.5 times the total VRE share), so 1 at low VRE shares, 0.5 at high VRE shares
- `q32_flexPriceBalance` is turned off.
so what remains is
- `vm_flexAdj` is equal to (1 - `v32_flexPriceShare`) times the electricity price from last iteration, so 0 at low VRE shares, and 0.5 at high VRE shares; and
- `v21_taxrevFlex` equal to -`vm_flexAdj` times `vm_demSE` minus the value from the last iteration.
So principally there should not be any fundamental problem with the electricity price becoming negative - this should only make `vm_flexAdj` move from positive to negative, thus instead of giving an incentive to use the technology, it should give a disincentive to the technology - not really problematic. (Not perfect for convergence, likely, but a negative electricity price will anyway create some weird incentives).
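To make the sign flip concrete, here is a small numeric sketch of the collapsed relations as described in this comment. It is not the REMIND code: the Python function names, the constant 0.5 (the "~0.5" mentioned above) and all input numbers are illustrative assumptions in arbitrary units.

```python
def flex_price_share(vre_share):
    """v32_flexPriceShare as described: 1 - (0.5 times the total VRE share)."""
    return 1.0 - 0.5 * vre_share

def flex_adj(vre_share, se_price_prev):
    """vm_flexAdj as described: (1 - v32_flexPriceShare) times the
    electricity price from the last iteration."""
    return (1.0 - flex_price_share(vre_share)) * se_price_prev

def taxrev_flex(vre_share, se_price_prev, dem_se, taxrev_prev=0.0):
    """v21_taxrevFlex as described: -vm_flexAdj times vm_demSE, minus the
    value from the last iteration."""
    return -flex_adj(vre_share, se_price_prev) * dem_se - taxrev_prev

# Illustrative numbers only (arbitrary units, not model output).
for price in (2.0, -2.0, -2000.0):
    adj = flex_adj(1.0, price)
    rev = taxrev_flex(1.0, price, dem_se=10.0)
    print(f"price={price:>8}: flexAdj={adj:>8}, taxrevFlex={rev:>8}")
```

A negative price from the previous iteration flips the sign of the adjustment, and since the adjustment scales linearly with the price, an extreme marginal translates directly into an equally extreme tax revenue term.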
Still, it would likely be better to remove any effect at prices below 0.
I guess one could simply change `q32_flexAdj` to use `=g=` instead of `=e=` and make `vm_flexAdj` be positive, but maybe that would lead to some unnecessary freedom for the model.
So maybe a cleaner way would be to create a parameter `pm_SEPrice_noNegatives` that contains `pm_SEPrice` but with all values <0 set to 0, and use this in the equation.
What do you think, @Renato-Rodrigues?
oh, and if anyone of you has a run where it seems that q32_flexAdj really is the culprit for creating an infeasible solution, please send it over so I can have a look. The one that Renato checked (that was infeasible with flexTax on and feasible when it was turned off) also became feasible with flextax on once it was run in debug mode (as REMIND does so often...) /p/projects/remind/users/renatoro/Debug_trunk/2022_12_27/output/SSA_SSP2EU-AMT-NDC_2022-12-27_16.36.56
From the equations I could look at, the `vm_flexAdj` dynamics is very non-linear. As it is using the tax formulation framework, maybe this could be potentially defined in between iterations reducing the complexity, and potentially the solution time, of the model.
I agree, Renato.
At least for the parts that collapse when `cm_FlexTaxFeedback` is off (concretely `q32_flexPriceShareMin`) it might make sense to not only let them collapse in runtime, but simply turn them off and instead fix the variable `v32_flexPriceShareMin` in bounds.gms or so when `cm_FlexTaxFeedback` is off.
I second limiting `pm_SEPrice` to only non-negative values in the equation as the simpler workaround for the issue.
My suggestion in the IEA-Update channel was along these lines, but instead of creating an additional parameter, I suggested adding a dollar condition to the right-hand side of the equation so it would be zero when prices are negative. Both should theoretically work in the same way.
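The equivalence of the two workarounds can be checked on a toy example. This is a sketch in Python rather than GAMS, with made-up prices and a made-up factor standing in for the rest of the right-hand side; only the name `pm_SEPrice_noNegatives` comes from this thread, everything else is a hypothetical illustration.

```python
def clamp_prices(prices):
    """Variant 1: a separate parameter (pm_SEPrice_noNegatives in the thread)
    holding the prices with all negative values set to 0."""
    return {key: max(0.0, p) for key, p in prices.items()}

def adj_with_condition(price, factor):
    """Variant 2: keep the original price, but make the right-hand side
    zero whenever the price is not positive (the dollar-condition idea)."""
    return factor * price if price > 0 else 0.0

# Made-up prices per (year, region); factor stands in for the rest of the RHS.
prices = {("2020", "REF"): -3.0, ("2030", "REF"): 0.0, ("2035", "REF"): 4.0}
factor = 0.5

clamped = clamp_prices(prices)
variant1 = {k: factor * p for k, p in clamped.items()}
variant2 = {k: adj_with_condition(p, factor) for k, p in prices.items()}

print(variant1 == variant2)
```

Both variants give a zero adjustment for negative prices and leave positive prices untouched. The practical difference is only where the guard lives: in a precomputed parameter or in the equation itself.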
The one that Renato checked (that was infeasible with flexTax on and feasible when it was turned off) also became feasible with flextax on once it was run in debug mode (as REMIND does so often...)
Did you try to set up a normal scenario run and save the iteration gdxs instead? If you don't set it to run in debug mode, you minimize the disruption to the way the solver searches for a solution. The infeasible iteration gdxs should contain all the info necessary to debug the run.
not really problematic. (Not perfect for convergence, likely, but a negative electricity price will anyway create some weird incentives)
I only disagree with this part of your analysis. If you have any minimal bound applied to the affected capacities, no matter how small it is, the marginal equation values ("fake negative prices") could be high enough to cause an infeasibility in the budget equation balance due to the resulting extreme cost levels applied to the technology.
The problem is that the solver marginals can be huge due to computational solver artifacts instead of economic reasons. This happens especially in situations where the model is overbounded. If you have an overbounded situation in the model, the solver will set an extreme value for the marginal of the affected equation, as it would do anything possible to get rid of any quantity from the affected equation. This causes crazy marginal values that have no economic reason to exist; they are a computational artifact of the way the solver searches for the solution of the problem. These marginals should never be considered to have an economic foundation or to be a reasonable model outcome to be used in later iterations. The size of these marginals is also quite volatile. Even memory garbage in floating point assignments used to store variables could cause near-zero approximations that would cause the solver to overbound or cull equations in some cases, and do nothing in other cases. This is a known issue that affects most integer and non-linear computational solvers. It is also one of the main reasons why using .up and .lo bounds is recommended whenever possible instead of defining literal equations, as variable bounds are considered strictly (with no tolerance), whereas equation-defined bounds are subject to tolerances and can be culled by the solver due to numerical approximations.
In summary, you cannot say a priori whether extreme marginal values provided by the solver are economically based or a computationally assigned solver value without analyzing whether their ranges make sense in the first place, especially if they occur close to initial and terminal years in the model, as these tend to be more overbounded. Any automated algorithm that does not take that into consideration is subject to unstable and unreliable results. As the solution proposed here disregards the negative marginal values entirely, I would say that most probably we could get away with using the marginal values directly in this equation. Nevertheless, I would be extra careful about using solver marginals directly in model equations in any automated way if you don't give enough attention to the issues I explained above.
Currently, the SSP2EU-NDC run crashes with an execution error. See for example the automated test results:
or these two runs I started (also during iteration 17):
Here's the abort from the second run:
In that run, first infeasibilities already show up in iteration 2:
and then again in iteration 13: