SSP2EU-NDC runs crash

dbantje commented 1 year ago

Currently, the SSP2EU-NDC runs crash with an execution error. See for example the automated test results:

Run                                 Runtime      inSlurm  RunType      RunStatus           Iter              Conv                   modelstat            Mif       inAppResults
default-AMT-_2022-11-26_00.05.49    6.3 hours    FALSE    nash         Normal completion   40/100            converged              2: Locally Optimal   TRUE      TRUE          
SDP-AMT-Base_2022-11-26_00.09.30    NA           FALSE    nash         Execution error     1/100             NA                     NA                   FALSE     NA            
SSP1-AMT-Base_2022-11-26_00.08.32   6.2 hours    FALSE    nash         Normal completion   39/100            converged              2: Locally Optimal   TRUE      TRUE          
SSP2EU-AMT-Base_2022-11-26_00.08.5  7.9 hours    FALSE    nash         Normal completion   54/100            converged              2: Locally Optimal   TRUE      TRUE          
SSP2EU-AMT-calibrate_2022-11-26_00  2 days       FALSE    Calib_nash   Normal completion   38/100 Clb: 10    converged              2: Locally Optimal   TRUE      TRUE          
SSP2EU-AMT-NDC_2022-11-26_08.15.06  NA           FALSE    nash         Execution error     17/100            722222527222           6: Intermed Infes    FALSE     NA            
SSP5-AMT-Base_2022-11-26_00.09.11   NA           FALSE    nash         Execution error     1/100             NA                     NA                   FALSE     NA            
testOneRegi-AMT-Base_2022-11-26_00  44.3 mins    FALSE    testOneRegi  Normal completion   1/1               NA                     2: Locally Optimal   TRUE      TRUE

or these two runs I started (which also crashed during iteration 17):

/p/tmp/davidba/remind/output/SSP2EU-NDC_off_2022-11-24_19.04.11
/p/tmp/davidba/remind/output/SSP2EU-NDC_on_2022-11-24_17.34.03

Here's the abort from the second run:

----3518138 Run was aborted because the maximum number of consecutive failures was reached in at least one region!
**** Exec Error at line 3518138: Execution halted: abort$5 'Run was aborted because the maximum number of consecutive failures was reached in at least one region!'

In that run, first infeasibilities already show up in iteration 2:

  2   solvestat              modelstat                 resusd     objval
CAZ      normal completion           locally optimal    221.577     9.08869601426056
CHA   terminated by solver         feasible solution   1316.109     51.5540838280981
EUR      normal completion           locally optimal    1040.13     60.1587487884284
IND      normal completion           locally optimal    708.188    0.417823683904527
JPN      normal completion           locally optimal      2.628     14.1853542844219
LAM      normal completion           locally optimal     42.112     36.0860451172938
MEA      NORMAL COMPLETION        LOCALLY INFEASIBLE   1469.185     30.3120428441084   F
NEU      normal completion           locally optimal     43.688     11.4700754206826
OAS      normal completion           locally optimal     627.66     37.5607604726356
REF   TERMINATED BY SOLVER   INTERMEDIATE INFEASIBLE    372.451     11.4539205649284   F
SSA   terminated by solver         feasible solution    207.206     24.0837757596486
USA      normal completion           locally optimal      711.2     47.4312437990165

and then again in iteration 13:

 13   solvestat              modelstat                 resusd     objval
CAZ      normal completion           locally optimal      1.383      8.6394651067668
CHA      normal completion           locally optimal      2.288     45.1684505887798
EUR      normal completion           locally optimal    260.686     59.8735383791675
IND      normal completion           locally optimal      194.1    -24.1267322694784
JPN      normal completion           locally optimal      4.236     14.0584582408219
LAM      normal completion           locally optimal    163.628     33.7003066745678
MEA      normal completion           locally optimal      1.457     21.5224930795179
NEU      normal completion           locally optimal     27.234     11.2806295768437
OAS      normal completion           locally optimal    112.604     26.5767834997304
REF      NORMAL COMPLETION        LOCALLY INFEASIBLE     98.182    -42.3685062477493   F
SSA      normal completion           locally optimal     89.992     25.0999527920814
USA      normal completion           locally optimal       2.13     47.1284138316188
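
Per-iteration summaries like the two above can be scanned automatically. The following is only a minimal sketch in Python, assuming the columns are separated by runs of two or more spaces as in the pasted output; the function name `infeasible_regions` is hypothetical and not part of REMIND.

```python
import re

def infeasible_regions(block: str) -> list[tuple[str, str]]:
    """Return (region, modelstat) pairs whose model status mentions 'infeasible'."""
    flagged = []
    for line in block.splitlines():
        # Columns in the summary are separated by runs of two or more spaces.
        parts = re.split(r"\s{2,}", line.strip())
        if len(parts) >= 5 and "infeasible" in parts[2].lower():
            flagged.append((parts[0], parts[2]))
    return flagged

# Three rows copied from the iteration 2 summary above.
sample = """\
LAM      normal completion           locally optimal     42.112     36.0860451172938
MEA      NORMAL COMPLETION        LOCALLY INFEASIBLE   1469.185     30.3120428441084   F
REF   TERMINATED BY SOLVER   INTERMEDIATE INFEASIBLE    372.451     11.4539205649284   F
"""
print(infeasible_regions(sample))
```

For these three rows the sketch reports MEA and REF and skips LAM.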

0UmfHxcvx5J7JoaOhFSs5mncnisTJJ6q commented 1 year ago

Setting up testOneRegi runs is an OK idea, but won't get you anywhere in this case, since REF is infeasible only from iteration 13 on, and testOneRegi will yield different results than nash in the first twelve iterations. So you are not in fact debugging the same 13th iteration …

Things to try:

Use a different .gdx file. Put SSP2EU-Base into the path_gdx column of the NDC run. (This should find a .gdx file automatically. If it doesn't, replace SSP2EU-Base with the path to a fulldata.gdx from a finished SSP2EU-Base run in all path_ columns. No point in running identical scenarios over and over.) If that doesn't work …
Set up a debug run. Set cm_nash_mode to debug, but leave optimization on nash. That will cause a run that is almost identical to the failing parallel one, but with information on where the infeasibility actually is. Subtle differences we don't understand also mean that serial/debug runs might "just work" where parallel ones fail.
For both options above, enable c_keep_iteration_gdxes, which might make it easier debugging specific iterations should they fail.

orichters commented 1 year ago

Put SSP2EU-Base into the path_gdx column of the NDC run. (This should find a .gdx file automatically. If it doesn't, replace SSP2EU-Base with the path to a fulldata.gdx from a finished SSP2EU-Base run in all path_ columns. No point in running identical scenarios over and over.)

FYI: Finding an old gdx from an SSP2EU-Base run automatically works if you set start to 0 for SSP2EU-Base, so that this run will not be started.

0UmfHxcvx5J7JoaOhFSs5mncnisTJJ6q commented 1 year ago

FYI: Finding an old gdx from an SSP2EU-Base run automatically works if you set start to 0 for SSP2EU-Base, so that this run will not be started.

And it then selects the most recent directory or fulldata.gdx it can find? I'm never sure about these newfangled options …

orichters commented 1 year ago

The most recent fulldata.gdx. But it should tell you which one it takes once you run ./start.R config/whatever.csv. Or check beforehand by running ./start.R --test config/whatever.csv. And if you put SSP2EU-AMT-Base into path_gdx, it should even find the most recent AMT run automatically, but also warn you about that…

dbantje commented 1 year ago

Using a different .gdx file doesn't help, still crashing in iteration 17. Doing a nash debug run now ...

0UmfHxcvx5J7JoaOhFSs5mncnisTJJ6q commented 1 year ago

Doing a nash debug run now ...

General note: cluster time is cheaper than model operator (your) time. Especially on the weekend, where operator time has an infinite price. :chart_with_upwards_trend: Therefore, set up stuff like that in parallel. If one fails, you save a day. If both succeed, you have options.

dbantje commented 1 year ago

The full.lst of the nash debug run is here: /p/tmp/davidba/remind/output/SSP2EU-NDC_1_nash_debug_2022-12-06_15.20.45.

From searching it, I find that most errors come from q39_EqualSecShare_BioSyn, e.g.

**** ERRORS/WARNINGS IN EQUATION q39_EqualSecShare_BioSyn(2020,REF,fedie,trans,ES)
     1 error(s): All Jacobian elements in the row are very small.

for several regions, years, etc.

0UmfHxcvx5J7JoaOhFSs5mncnisTJJ6q commented 1 year ago

From searching it, I find that most errors come from q39_EqualSecShare_BioSyn, e.g.
**** ERRORS/WARNINGS IN EQUATION q39_EqualSecShare_BioSyn(2020,REF,fedie,trans,ES)
     1 error(s): All Jacobian elements in the row are very small.
for several regions, years, etc.

Well, those errors/warnings in and of themselves are not a problem. Infeasibilities are further down. I updated the listinfes tool to work with nash .lst files, too. But this one is 2.4 GB, so running it takes about two minutes. I did

$ listinfes full.lst > infes.txt

you can just look at infes.txt.

So, let's walk through this:

$ nashstat -F
itr   region   solvestat           modelstat              resusd     objval
  7   SSA      NORMAL COMPLETION   LOCALLY INFEASIBLE    128.502    23.8755340771311   F
 13   REF      NORMAL COMPLETION   LOCALLY INFEASIBLE   2017.205   -42.3708212465084   F
 14   REF      NORMAL COMPLETION   LOCALLY INFEASIBLE    873.393   -42.3714435312192   F
 15   REF      NORMAL COMPLETION   LOCALLY INFEASIBLE    535.461   -42.3718120713121   F
 16   REF      NORMAL COMPLETION   LOCALLY INFEASIBLE      6.617   -42.3717927205915   F
 17   REF      NORMAL COMPLETION   LOCALLY INFEASIBLE     28.429   -42.3717796840736   F

The q39_EqualSecShare_BioSyn infeasibility in SSA actually disappears on its own again. Interesting is iteration 13, because that one stays.
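
Since the abort message refers to consecutive failures in one region, it can help to count such streaks directly. This is only an illustrative sketch in Python; `longest_streaks` is a hypothetical helper, and the (iteration, region) pairs are typed in by hand from the nashstat -F output above rather than parsed from it.

```python
from collections import defaultdict

def longest_streaks(failures):
    """Map each region to its longest run of consecutive infeasible iterations."""
    by_region = defaultdict(list)
    for iteration, region in failures:
        by_region[region].append(iteration)
    streaks = {}
    for region, iterations in by_region.items():
        best = current = 0
        previous = None
        for itr in sorted(set(iterations)):
            # Extend the streak only if this iteration directly follows the last one.
            current = current + 1 if previous is not None and itr == previous + 1 else 1
            best = max(best, current)
            previous = itr
        streaks[region] = best
    return streaks

# (iteration, region) pairs taken from the nashstat -F output above.
failures = [(7, "SSA"), (13, "REF"), (14, "REF"), (15, "REF"), (16, "REF"), (17, "REF")]
print(longest_streaks(failures))
```

Here SSA has a streak of one iteration, while REF fails in five consecutive iterations (13 to 17).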

$ less -p "iteration = 13" infes.txt
iteration = 13, sol_itr = 1, regi = 'REF', equ = 'qm_budget'
              LOWER          LEVEL          UPPER         MARGINAL
2030.REF         0.2103        -0.5591        +INF            1.0000
2035.REF         0.2134         0.0584        +INF            1.0000

iteration = 13, sol_itr = 2, regi = 'REF', equ = 'qm_budget'
              LOWER          LEVEL          UPPER         MARGINAL
2030.REF         0.2103        -0.6443        +INF            0.5000

…

That's the stuff that stays around and breaks the run.

qm_budget is kind of a black box to me. You might ask in the REMIND channel whether somebody has an idea about this. If not, I would suggest this:

Run the debug scenario again, but activate the Equation Listing, to see what actually goes on in qm_budget(2030,REF). To do so, add this

+if (ord(iteration) ge 13 AND sameas(regi,"REF"),
+  option
+    limrow = 2147483647
+    limcol = 2147483647
+    solprint = on
+  ;
+else
+  option
+    limrow = 0
+    limcol = 0
+    solprint = off
+  ;
+);
+
 solve hybrid using nlp maximizing vm_welfareGlob;

right before solve hybrid using nlp maximizing vm_welfareGlob; in ./modules/80_optimization/nash/solve.gms. That should print more information for iteration 13 and REF (but cut down on the size of the .lst file otherwise).

dbantje commented 1 year ago

Hmmm, this does not compile:

2933567  if (ord(iteration) ge 13 AND sameas(regi,"REF"),
****                                             $149
**** LINE     46 INCLUDE     /p/tmp/davidba/remind/modules/80_optimization/nash/solve.gms
**** LINE     32 INCLUDE     /p/tmp/davidba/remind/modules/80_optimization/nash/realization.gms
**** LINE     17 INCLUDE     /p/tmp/davidba/remind/modules/80_optimization/module.gms
**** LINE     43 BATINCLUDE  /p/tmp/davidba/remind/modules/include.gms
                             %1  solve
**** LINE     81 INCLUDE     /p/tmp/davidba/remind/core/loop.gms
**** LINE   1626 INPUT       /p/tmp/davidba/remind/output/gamscompile/main_SSP2EU-NDC_1_eqlisting.gms
**** 149  Uncontrolled set entered as constant

0UmfHxcvx5J7JoaOhFSs5mncnisTJJ6q commented 1 year ago

Crap. Should be all_regi.

orichters commented 1 year ago

There seems to be another compilation problem:

cd /p/projects/remind/modeltests/output/SSP2EU-AMT-NDC_2022-12-10_08.51.08
less -j 4 --pattern='^\*\*\*\*' main.lst

May be caused by this PR by @fschreyer, but I'm not sure…

0UmfHxcvx5J7JoaOhFSs5mncnisTJJ6q commented 1 year ago

Certainly was. Remove the AND and go again.

orichters commented 1 year ago

The compilation error is fixed: https://github.com/remindmodel/remind/pull/1124

dbantje commented 1 year ago

Here's the full.lst with the equation listing: /p/tmp/davidba/remind/output/SSP2EU-NDC_1_eqlisting_2022-12-12_10.10.57/full.lst. But I don't really know enough about the model to make anything of it ... I'll ask in the REMIND channel.

dbantje commented 1 year ago

And for my understanding:

Is listinfes a tool in the REMIND repo?
How do you know that the qm_budget(2030,REF) infeasibility is the root?

0UmfHxcvx5J7JoaOhFSs5mncnisTJJ6q commented 1 year ago

Is listinfes a tool in the REMIND repo?

It's a shell script on the cluster doing the filtering of the .lst files for you.

$ which listinfes 
/p/projects/rd3mod/tools/listinfes

How do you know that the qm_budget(2030,REF) infeasibility is the root?

$ grep "sol_itr = 2, regi = 'REF'" infes.txt 
iteration = 13, sol_itr = 2, regi = 'REF', equ = 'qm_budget'
iteration = 14, sol_itr = 2, regi = 'REF', equ = 'qm_budget'
iteration = 15, sol_itr = 2, regi = 'REF', equ = 'qm_budget'
iteration = 16, sol_itr = 2, regi = 'REF', equ = 'qm_budget'
iteration = 17, sol_itr = 2, regi = 'REF', equ = 'qm_budget'

The "original" debug run has infeasibilities only in qm_budget in the second solver try. Everything else disappears on its own.
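
The grep above can also be expressed as a small grouping step. The following Python sketch assumes that the header lines in infes.txt always look like the ones shown in this thread; `infeasibilities_by_equation` is a hypothetical name, not part of listinfes.

```python
import re
from collections import defaultdict

# Header format as it appears in the infes.txt excerpts in this thread.
HEADER = re.compile(
    r"iteration = (\d+), sol_itr = (\d+), regi = '([^']+)', equ = '([^']+)'"
)

def infeasibilities_by_equation(lines, sol_itr=2):
    """Group iteration numbers by (region, equation) for one solver attempt."""
    found = defaultdict(list)
    for line in lines:
        match = HEADER.search(line)
        if match and int(match.group(2)) == sol_itr:
            found[(match.group(3), match.group(4))].append(int(match.group(1)))
    return dict(found)

sample = [
    "iteration = 13, sol_itr = 1, regi = 'REF', equ = 'qm_budget'",
    "iteration = 13, sol_itr = 2, regi = 'REF', equ = 'qm_budget'",
    "iteration = 14, sol_itr = 2, regi = 'REF', equ = 'qm_budget'",
]
print(infeasibilities_by_equation(sample))
```

An equation that shows up for the same region in every late iteration of the second solver attempt is a candidate for the persistent infeasibility.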

0UmfHxcvx5J7JoaOhFSs5mncnisTJJ6q commented 1 year ago

Not sure where we will get with qm_budget, and not sure how much I get to work next week (A spectre is haunting Europe — the spectre of Kindergarten cold), here's another approach: I checked out all merge requests between #1045 (last working SSP2EU-AMT-NDC) and #1066 (first non-working SSP2EU-AMT-NDC) and started SSP2EU-AMT-Base and SSP2EU-AMT-NDC runs for them.

/p/tmp/pehl/bughunt/NDC_infes/Remind_01_a919793
/p/tmp/pehl/bughunt/NDC_infes/Remind_02_4f9dfc8
/p/tmp/pehl/bughunt/NDC_infes/Remind_03_81084f4
/p/tmp/pehl/bughunt/NDC_infes/Remind_04_823d8c8
/p/tmp/pehl/bughunt/NDC_infes/Remind_05_55025e4
/p/tmp/pehl/bughunt/NDC_infes/Remind_06_678ecae
/p/tmp/pehl/bughunt/NDC_infes/Remind_07_64f0d71
/p/tmp/pehl/bughunt/NDC_infes/Remind_08_b4dbf2e
/p/tmp/pehl/bughunt/NDC_infes/Remind_09_0b69047
/p/tmp/pehl/bughunt/NDC_infes/Remind_10_8e182a5
/p/tmp/pehl/bughunt/NDC_infes/Remind_11_09bb41e
/p/tmp/pehl/bughunt/NDC_infes/Remind_12_aa1d6a8
/p/tmp/pehl/bughunt/NDC_infes/Remind_13_52b68c9
/p/tmp/pehl/bughunt/NDC_infes/Remind_14_91bd70f
/p/tmp/pehl/bughunt/NDC_infes/Remind_15_c45fb19
/p/tmp/pehl/bughunt/NDC_infes/Remind_16_7583cc0
/p/tmp/pehl/bughunt/NDC_infes/Remind_17_32c8444
/p/tmp/pehl/bughunt/NDC_infes/Remind_18_407ab64

Those will take a while, but in the end the NDC run from Remind_01_a919793 (and possibly others) should succeed, and that of Remind_18_407ab64 (and possibly others) should fail, and we should get one merge request that introduced changes that made NDC fail. That should narrow down what to look at.
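
Once the runs have finished, narrowing down the suspect merge requests is a simple scan over the ordered results. The sketch below is in Python and uses made-up labels and outcomes purely for illustration (they are not results from the runs listed above); `suspect_commits` is a hypothetical helper.

```python
def suspect_commits(results):
    """Return the commits between the last good and the first bad run (inclusive of the bad one).

    results: list of (label, outcome), ordered oldest to newest.
    outcome is True (NDC converged), False (NDC infeasible),
    or None (inconclusive, e.g. a compilation error).
    """
    first_bad = next((i for i, (_, ok) in enumerate(results) if ok is False), None)
    if first_bad is None:
        return []  # nothing failed, so nothing to narrow down
    last_good = max(
        (i for i, (_, ok) in enumerate(results[:first_bad]) if ok is True),
        default=-1,
    )
    return [label for label, _ in results[last_good + 1 : first_bad + 1]]

# Hypothetical outcomes, NOT taken from the actual test runs.
example = [("c01", True), ("c02", True), ("c03", None), ("c04", False), ("c05", False)]
print(suspect_commits(example))
```

Inconclusive runs between the last good and the first bad commit stay in the suspect list, which is why compilation errors in the middle of the range widen the search.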

orichters commented 1 year ago

Thanks, Michaja, for your continuous efforts. I set up the NGFS runs, which include an NDC run that went through without problems (/p/tmp/oliverr/NGFS_v3_tests/2022-12-15/remind/output), although with 10.7 hours and 38 iterations, it was slower than before (5–7 hours, 30–33 iterations). On the other hand, the h_cpol run with Current Policies failed twice with two different gdx input files, but in different regions. :/

0UmfHxcvx5J7JoaOhFSs5mncnisTJJ6q commented 1 year ago

Curious.

Modeltest NDC failed again, but differently.

/p/projects/remind/modeltests/output/SSP2EU-AMT-NDC_2022-12-17_05.27.10 (develop|?? M)$ nashstat -F
itr   region   solvestat              modelstat                 resusd     objval           
  2   REF      NORMAL COMPLETION        LOCALLY INFEASIBLE    375.755    8.24329876304596   F
  3   MEA      NORMAL COMPLETION        LOCALLY INFEASIBLE      858.3    29.7301971790709   F
 15   MEA      NORMAL COMPLETION        LOCALLY INFEASIBLE    261.304    29.5850612625419   F
 15   SSA      NORMAL COMPLETION        LOCALLY INFEASIBLE    689.894    17.0756978845647   F
 16   MEA      NORMAL COMPLETION        LOCALLY INFEASIBLE    119.993    29.5527431092611   F
 16   SSA      NORMAL COMPLETION        LOCALLY INFEASIBLE      5.481   -329.746903738767   F
 17   SSA      NORMAL COMPLETION        LOCALLY INFEASIBLE     54.814   -331.892744569579   F
 18   SSA   TERMINATED BY SOLVER   INTERMEDIATE INFEASIBLE    163.785   -333.428109270105   F
 19   CAZ   TERMINATED BY SOLVER   INTERMEDIATE INFEASIBLE    845.425   -8.08066610393547   F
 19   CHA      NORMAL COMPLETION        LOCALLY INFEASIBLE    1073.26   -254.204360029201   F
 19   EUR   TERMINATED BY SOLVER   INTERMEDIATE INFEASIBLE    213.742   -72.9722534374444   F
 19   IND      NORMAL COMPLETION        LOCALLY INFEASIBLE     376.77   -302.463432040631   F
 19   JPN      NORMAL COMPLETION        LOCALLY INFEASIBLE    552.323   -10.3379325073368   F
 19   LAM      NORMAL COMPLETION        LOCALLY INFEASIBLE    470.555   -116.139039138215   F
 19   MEA      NORMAL COMPLETION        LOCALLY INFEASIBLE     71.247   -104.497618743203   F
 19   NEU      NORMAL COMPLETION        LOCALLY INFEASIBLE    990.524   -11.6234521188096   F
 19   OAS   TERMINATED BY SOLVER   INTERMEDIATE INFEASIBLE     404.45   -265.654693938257   F
 19   REF   TERMINATED BY SOLVER   INTERMEDIATE INFEASIBLE     325.48   -42.9602459925745   F
 19   SSA      NORMAL COMPLETION        LOCALLY INFEASIBLE      8.744   -333.426828849519   F
 19   USA      NORMAL COMPLETION        LOCALLY INFEASIBLE    646.963   -51.0161196546982   F

SSA from iteration 15 on …

LaviniaBaumstark commented 1 year ago

Thanks, Michaja, for your test runs for the different REMIND versions. Surprisingly, the NDC scenario also converged for Remind_18_407ab64. Did you make any changes to the settings for those test runs?

0UmfHxcvx5J7JoaOhFSs5mncnisTJJ6q commented 1 year ago

407ab64 failed in Base on some R error, not in NDC. I set up a further batch of tests.

/p/tmp/pehl/bughunt/NDC_infes/Remind_101_6c10530
/p/tmp/pehl/bughunt/NDC_infes/Remind_102_368849e
/p/tmp/pehl/bughunt/NDC_infes/Remind_103_9fe2def
/p/tmp/pehl/bughunt/NDC_infes/Remind_104_ba96f73
/p/tmp/pehl/bughunt/NDC_infes/Remind_105_92b255a
/p/tmp/pehl/bughunt/NDC_infes/Remind_106_1380bc1
/p/tmp/pehl/bughunt/NDC_infes/Remind_107_9cfc2c2
/p/tmp/pehl/bughunt/NDC_infes/Remind_108_ec04597
/p/tmp/pehl/bughunt/NDC_infes/Remind_109_14d7b12
/p/tmp/pehl/bughunt/NDC_infes/Remind_110_4191ff5
/p/tmp/pehl/bughunt/NDC_infes/Remind_111_2c6f0d4
/p/tmp/pehl/bughunt/NDC_infes/Remind_112_00da511
/p/tmp/pehl/bughunt/NDC_infes/Remind_113_888bbd8
/p/tmp/pehl/bughunt/NDC_infes/Remind_114_83170fc

0UmfHxcvx5J7JoaOhFSs5mncnisTJJ6q commented 1 year ago

9cfc2c2 (#1084), 4191ff5 (#1000), 2c6f0d4 (#1046), and 00da511 (#1093) have compilation errors. If it is not clearly another merge request introducing the infeasibility, it's either fixing those or this is a dead end.

LaviniaBaumstark commented 1 year ago

https://github.com/remindmodel/remind/commit/9cfc2c2d7b09d3bf2b20db74fdfd1edf00082874 (https://github.com/remindmodel/remind/pull/1084) introduced a bug (compilation error) which was fixed in #1087

orichters commented 1 year ago

#1000 was reverted: https://github.com/remindmodel/remind/pull/1094

orichters commented 1 year ago

The errors in #1046 and #1093 were both fixed/reset in #1094. I don't think these compilation error runs help to solve the NDC challenge, these were simply bugs that were fixed later.

orichters commented 1 year ago

And, as I said earlier, the NDC run (h_ndc_bIT) of scenario_config_NGFS_v3.csv works (using commit 49ad60690f9f8a30e20c81525c0808aa4e957a13, last merge is #1130). So I don't think that the problem is easy to find :(

0UmfHxcvx5J7JoaOhFSs5mncnisTJJ6q commented 1 year ago

So I don't think that the problem is easy to find :(

You are welcome to come up with a more sophisticated plan.

gabriel-abrahao commented 1 year ago

And, as I said earlier, the NDC run (h_ndc_bIT) of scenario_config_NGFS_v3.csv works (using commit 49ad60690f9f8a30e20c81525c0808aa4e957a13, last merge is #1130). So I don't think that the problem is easy to find :(

Just to add to the evidence here, I had SSP2EU-NDC failing too, last merge #1128. I tried several GDXs and got the same problem: irredeemable INFES in MEA from the first iteration. But using Oliver's NGFS run as a starting point actually worked:

/p/projects/piam/abrahao/scratch/lowen_calibrate/output/SSP2EU-NDC_2022-12-22_18.48.07

However, starting from the same point but with a different calibration (some weird project stuff) made the first iteration OK but immediately reverted to the same problem on iteration 2. Maybe there's something making CONOPT get trapped? Are there any CONOPT parameters we can tweak to make it widen the search or something? Still investigating what those INFES actually are.

/p/projects/piam/abrahao/scratch/lowen_calibrate/output/SSP2EU_lowEn-NDC_2022-12-22_18.47.11

orichters commented 1 year ago

Whatever happened in PRs #1135, #1134, and #1133, which all seem completely unrelated, the most recent AMT NDC run was successful.

Renato-Rodrigues commented 1 year ago

I debugged this a bit using last week's run (SSP2EU-AMT-NDC_2022-12-17_05.27.10).

This is what I could find:

Issue:

Equation q32_flexAdj uses the marginal prices (pm_SEprice for seel) from the previous iteration in its calculations. (1) If pm_SEprice is negative for a given region (this can happen, for example, in initial or terminal years due to quantity bounds), then in the next iteration (2) vm_flexAdj acts in the opposite direction to what it should (as far as I understood), (3) v21_taxrevFlex blows up in value because of this, (4) vm_taxrev blows up in value as a consequence, (5) the budget equation has a very hard time compensating for this tax imbalance, and (6) the model shows infeasibilities as a consequence.

Temporary workaround:

I disabled the flex tax (cm_flex_tax = 0) and restarted the infeasible run. The new run converged without issues: /p/projects/remind/users/renatoro/Debug_trunk/2022_12_27/output/SSP2EU-AMT-NDC_2022-12-17_05.27.10_rerun_noFlexTax (the same run with cm_flex_tax = 1 for reference: /p/projects/remind/users/renatoro/Debug_trunk/2022_12_27/output/SSP2EU-AMT-NDC_2022-12-17_05.27.10_rerun)

Details:

First of all, I am also not familiar with the flex tax formulation, so maybe @robertpietzcker @fschreyer should check whether I am saying something completely off here. I also cannot say whether this is the only problem without further tests and runs.

Anyway, the use of marginals in a model equation can potentially explain why the model is unstable, sometimes working and sometimes not. Marginals are solver-dependent, and small changes in the initial point or in other variables can avoid corner solutions or overbounded situations that could cause negative marginals. Even memory garbage and numerical approximations can influence that. I would avoid, as far as possible, using marginals from previous iterations directly in model equations, unless you explicitly guard against these edge cases.

A real solution for this would be to reformulate the FlexTax code so that it does not rely on marginals. If it is really necessary to use marginal values from previous iterations, I would rely instead on pm_FEPrice_by_SE_Sector to provide the prices, as this should be more stable than the pm_SEprice values, and either force a minimum price value to avoid the negative price problem or force vm_flexAdj to be a positive variable if this is really the intention behind the formulation (I am not 100% sure about that as I do not control the flex tax code).

Renato-Rodrigues commented 1 year ago

on a related point, @robertpietzcker @fschreyer is it really necessary to have the flex tax dynamics entirely endogenous to the model? From the equations I could look at, the vm_flexAdj dynamics is very non-linear. As it is using the tax formulation framework, maybe this could be potentially defined in between iterations reducing the complexity, and potentially the solution time, of the model.

orichters commented 1 year ago

Thanks, @Renato-Rodrigues, I'm very grateful for you digging into this problem.

robertpietzcker commented 1 year ago

Hey all, thanks for raising these points!

I myself wasn't involved in the flextax stuff, but as Felix is on parental leave and I did most of the other power sector integration stuff, I guess it falls in my responsibility :-)

From a first thinking-through of Felix's equations, I was a bit surprised about their complexity, but then I realized that as long as cm_FlexTaxFeedback is off (which it luckily is in the default settings),

vm_capFac gets fixed again in IntC/bounds.gms, so that
q32_flexPriceShareMin collapses so that
v32_flexPriceShareMin is fixed and not a variable anymore, but rather has the value of ~0.5. This means
v32_flexPriceShare is simply 1 - (0.5 times the total VRE share), so 1 at low VRE shares, 0.5 at high VRE shares
q32_flexPriceBalance is turned off.

so what remains is

vm_flexAdj is equal to (1 - v32_flexPriceShare) times the electricity price from the last iteration, so 0 at low VRE shares and 0.5 times that price at high VRE shares; and
v21_taxrevFlex equal to -vm_flexAdj times vm_demSE minus the value from the last iteration.
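
To make the sign behaviour concrete, here is a rough numeric sketch in Python of the collapsed relations exactly as described in words above. It is not the REMIND code: the function names are hypothetical, the VRE share is assumed to lie between 0 and 1, and the reading of "minus the value from the last iteration" as a simple subtraction is an assumption.

```python
def flex_price_share(vre_share: float) -> float:
    """1 at low VRE shares, 0.5 at high VRE shares (vre_share assumed in [0, 1])."""
    return 1.0 - 0.5 * vre_share

def flex_adj(vre_share: float, price_prev: float) -> float:
    """(1 - flex price share) times the electricity price of the previous iteration."""
    return (1.0 - flex_price_share(vre_share)) * price_prev

def tax_rev_flex(vre_share: float, price_prev: float, dem_se: float, tax_rev_prev: float) -> float:
    """-flex_adj times demand, minus the previous iteration's value (assumed reading)."""
    return -flex_adj(vre_share, price_prev) * dem_se - tax_rev_prev

# Illustrative numbers only: a high VRE share with a positive and a negative price.
print(flex_adj(1.0, 0.2), flex_adj(1.0, -0.2))
```

With a high VRE share, a negative price from the previous iteration flips the sign of the adjustment, and the tax revenue term flips with it.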

So in principle there should not be any fundamental problem with the electricity price becoming negative - this should only make vm_flexAdj move from positive to negative, so that instead of giving an incentive to use the technology, it gives a disincentive - not really problematic. (Not perfect for convergence, likely, but a negative electricity price will anyway create some weird incentives).

Still, it would likely be better to remove any effect at prices below 0. I guess one could simply change q32_flexAdj to use =g= instead of =e= and make vm_flexAdj a positive variable,

but maybe that would lead to some unnecessary freedom for the model.

So maybe a cleaner way would be to create a parameter pm_SEPrice_noNegatives that contains pm_SEPrice but with all values <0 set to 0, and use this in the equation.

What do you think, @Renato-Rodrigues?

Oh, and if any of you has a run where q32_flexAdj really seems to be the culprit for creating an infeasible solution, please send it over so I can have a look. The one that Renato checked (that was infeasible with flexTax on and feasible when it was turned off) also became feasible with flextax on once it was run in debug mode (as REMIND does so often...) /p/projects/remind/users/renatoro/Debug_trunk/2022_12_27/output/SSA_SSP2EU-AMT-NDC_2022-12-27_16.36.56

robertpietzcker commented 1 year ago

From the equations I could look at, the vm_flexAdj dynamics is very non-linear. As it is using the tax formulation framework, maybe this could be potentially defined in between iterations reducing the complexity, and potentially the solution time, of the model.

I agree, Renato. At least for the parts that collapse when cm_FlexTaxFeedback is off (concretely q32_flexPriceShareMin) it might make sense to not only let them collapse in runtime, but simply turn them off and instead fix the variable v32_flexPriceShareMin in bounds.gms or so when cm_FlexTaxFeedback is off.

Renato-Rodrigues commented 1 year ago

I second limiting pm_SEPrice to non-negative values in the equation as the simpler workaround for the issue. My suggestion in the IEA-Update channel was along these lines, but instead of creating an additional parameter, I suggested adding a dollar condition to the right-hand side of the equation so that it would be zero when prices are negative. Both should theoretically work in the same way.
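
The claimed equivalence of the two variants can be illustrated with a few numbers. This Python sketch is only an analogy to the GAMS formulation: `factor` stands in for whatever multiplies the price in the equation, and both function names are hypothetical.

```python
def flex_adj_clamped_parameter(factor: float, price: float) -> float:
    """Variant 1: precompute a price with all negative values set to 0, then use it."""
    price_no_negatives = max(price, 0.0)
    return factor * price_no_negatives

def flex_adj_conditional_term(factor: float, price: float) -> float:
    """Variant 2: keep the raw price, but zero the whole term when the price is not positive."""
    return factor * price if price > 0.0 else 0.0

# Illustrative prices only: negative, zero and positive.
for price in (-3.0, 0.0, 2.5):
    print(price, flex_adj_clamped_parameter(0.5, price), flex_adj_conditional_term(0.5, price))
```

Both variants give zero for non-positive prices and the unchanged product otherwise, so they differ only in where the guard lives.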

The one that Renato checked (that was infeasible with flexTax on and feasible when it was turned off) also became feasible with flextax on once it was run in debug mode (as REMIND does so often...)

Did you try setting up a normal scenario run and saving the iteration gdxs instead? If you don't run it in debug mode, you minimize the disruption to the way the solver searches for a solution. The infeasible iteration gdxs should contain all the info necessary to debug the run.

not really problematic. (Not perfect for convergence, likely, but a negative electricity price will anyway create some weird incentives)

I only disagree with this part of your analysis. If you have any minimal bound applied to the affected capacities, no matter how small it is, the marginal equation values ("fake negative prices") could be high enough to cause an infeasibility in the budget equation balance due to resulting extreme cost levels applied to the technology.

The problem is that the solver marginals can be huge due to computational solver artifacts rather than economic reasons. This happens especially in situations where the model is overbounded. In an overbounded situation, the solver will set an extreme value for the marginal of the affected equation, as it would do anything possible to get rid of any quantity from the affected equation. This causes crazy marginal values that have no economic reason to exist and are merely a computational artifact of the way the solver searches for the solution. These marginals should never be considered to have an economic foundation, or to be a reasonable model outcome to be used in later iterations. The size of these marginals is also quite volatile. Even memory garbage in the floating-point storage of variables could cause near-zero approximations that lead the solver to overbound or cull equations in some cases and do nothing in others. This is a known issue that affects most integer and non-linear solvers. It is also one of the main reasons why using .up and .lo bounds is recommended whenever possible instead of defining literal equations: variable bounds are enforced strictly (with no tolerance), while bounds defined through equations are subject to tolerances and can be culled by the solver due to numerical approximations.

In summary, you cannot say a priori whether extreme marginal values provided by the solver are economically based or computationally assigned solver values without first analyzing whether their ranges make sense. This holds especially if they occur close to the initial and terminal years of the model, as these tend to be more overbounded. Any automated algorithm that does not take this into consideration is subject to unstable and unreliable results. As the solution proposed here disregards negative marginal values entirely, I would say that we could most probably get away with using the marginal values directly in this equation. Nevertheless, I would be extra careful about using solver marginals directly in model equations in any automated way without paying enough attention to the issues I explained above.

remindmodel / remind

SSP2EU-NDC runs crash #1115
