iiasa / ixmp

The ix modeling platform for integrated and cross-cutting scenario analysis
https://docs.messageix.org/ixmp
Apache License 2.0
37 stars 110 forks source link

Improve performance in add_par() #441

Closed khaeru closed 2 years ago

khaeru commented 2 years ago

This applies some simple optimizations in Scenario.add_par() and underlying backend code by using pandas' (fast) internals rather than (slower) utilities/code in ixmp.utils.

Anecdotally (using pytest-profiling) there is about a 50% speed-up in message_data test cases where the runtime is dominated by add_par().

The branch also adds .tests.test_perf.test_add_par() using pytest-benchmark; these are invoked with pytest -k perf --benchmark-only. While I am not experienced with performance profiling, these results seem to show consistently better performance.

## Code on `main`

-------------------------------------------------------------------------------------------- benchmark: 4 tests -------------------------------------------------------------------------------------------
Name (time in ms)                   Min                    Max                   Mean            StdDev                 Median               IQR            Outliers      OPS            Rounds  Iterations
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_add_par[1000.0]            73.9765 (1.0)          73.9765 (1.0)          73.9765 (1.0)      0.0000 (1.0)          73.9765 (1.0)      0.0000 (1.0)           0;0  13.5178 (1.0)           1           1
test_add_par[100.0]            150.6392 (2.04)        150.6392 (2.04)        150.6392 (2.04)     0.0000 (1.0)         150.6392 (2.04)     0.0000 (1.0)           0;0   6.6384 (0.49)          1           1
test_add_par[10000.0]          432.8315 (5.85)        432.8315 (5.85)        432.8315 (5.85)     0.0000 (1.0)         432.8315 (5.85)     0.0000 (1.0)           0;0   2.3104 (0.17)          1           1
test_add_par[1000000.0]     39,330.8795 (531.67)   39,330.8795 (531.67)   39,330.8795 (531.67)   0.0000 (1.0)      39,330.8795 (531.67)   0.0000 (1.0)           0;0   0.0254 (0.00)          1           1
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

## This branch

-------------------------------------------------------------------------------------------- benchmark: 4 tests -------------------------------------------------------------------------------------------
Name (time in ms)                   Min                    Max                   Mean            StdDev                 Median               IQR            Outliers      OPS            Rounds  Iterations
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_add_par[1000.0]            66.0219 (1.0)          66.0219 (1.0)          66.0219 (1.0)      0.0000 (1.0)          66.0219 (1.0)      0.0000 (1.0)           0;0  15.1465 (1.0)           1           1
test_add_par[100.0]             67.3930 (1.02)         67.3930 (1.02)         67.3930 (1.02)     0.0000 (1.0)          67.3930 (1.02)     0.0000 (1.0)           0;0  14.8383 (0.98)          1           1
test_add_par[10000.0]          326.6322 (4.95)        326.6322 (4.95)        326.6322 (4.95)     0.0000 (1.0)         326.6322 (4.95)     0.0000 (1.0)           0;0   3.0615 (0.20)          1           1
test_add_par[1000000.0]     32,624.5220 (494.15)   32,624.5220 (494.15)   32,624.5220 (494.15)   0.0000 (1.0)      32,624.5220 (494.15)   0.0000 (1.0)           0;0   0.0307 (0.00)          1           1
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Legend:
  Outliers: 1 Standard Deviation from Mean; 1.5 IQR (InterQuartile Range) from 1st Quartile and 3rd Quartile.
  OPS: Operations Per Second, computed as 1 / Mean

How to review

PR checklist

codecov[bot] commented 2 years ago

Codecov Report

Merging #441 (cdd8aa9) into main (0c1a143) will increase coverage by 0.0%. The diff coverage is 100.0%.

@@          Coverage Diff          @@
##            main    #441   +/-   ##
=====================================
  Coverage   98.5%   98.5%           
=====================================
  Files         41      42    +1     
  Lines       4380    4382    +2     
=====================================
+ Hits        4318    4320    +2     
  Misses        62      62           
Impacted Files Coverage Δ
ixmp/backend/io.py 98.5% <ø> (-0.1%) :arrow_down:
ixmp/backend/jdbc.py 95.1% <100.0%> (-0.1%) :arrow_down:
ixmp/core/scenario.py 97.9% <100.0%> (-0.1%) :arrow_down:
ixmp/tests/test_perf.py 100.0% <100.0%> (ø)
OFR-IIASA commented 2 years ago

I have been able to test the PR, but not via the above suggested command pytest -k "transport and build_bare". Instead I ran the GLOBIOM emulator. On main running the emulator script for the global model, of which most time is taken up by populating the table land_output, takes about 7minutes. On this branch, the process time is almost identical. The main change to still be made is to adjust the logging level - currently this seems to be set to "DEBUG".

khaeru commented 2 years ago

Instead I ran the GLOBIOM emulator. On main running the emulator script for the global model, of which most time is taken up by populating the table land_output, takes about 7minutes. On this branch, the process time is almost identical.

Thanks for checking.

Some more information on investigating performance:

Above I mentioned using pytest-profiling. Specifically, I run 1 or a few tests with pytest -k "this or that" --profile-svg and then open the file prof/combined.svg that's created. An example is attached below. Here, along the central (green) chain is a box scenario:429:add_par. The info says it is run 28 times ("28×") which together take up 48.32% of the total time of this pytest run.

Using this, my process is roughly:

  1. Run slow code via its tests.
  2. Look at the profile visualization.
  3. Identify slow steps.
  4. Improve performance of those steps.
  5. Repeat from (1).

If (4) is successful, the % of total time for the slow code will decrease, and the % for other (next-slowest, unaltered) codes will correspondingly increase.

The example reflects a particular MESSAGEix-Transport test in which add_par() was the slowest piece of code. But we have a lot of code in message_data written using loops, etc.; for these, it may not be add_par() which is the slowest step, and thus improving performance of add_par() will not have a large or even noticeable impact on total runtime.

It is necessary to do these kinds of investigations (again, I am not an expert) to identify whether ixmp or downstream code is responsible for poor performance. Also, when ixmp is responsible, sometimes this is because of the Java code in JDBCBackend or JPype used to interact with it, and so there we cannot make any improvements either. In short, the scope for these optimizations is quite narrow.

combined

LauWien commented 2 years ago

I have been able to test the PR, but not via the above suggested command pytest -k "transport and build_bare". Instead I ran the GLOBIOM emulator. On main running the emulator script for the global model, of which most time is taken up by populating the table land_output, takes about 7minutes. On this branch, the process time is almost identical. The main change to still be made is to adjust the logging level - currently this seems to be set to "DEBUG".

Thanks for checking Oliver! I requested you as a reviewer and removed me.

OFR-IIASA commented 2 years ago

pytest -k "transport and build_bare" on main took 8.72s while on this branch it took 7.87s. The only change before merging is to set the logging level back, so that DEBUG messages are not shown.

khaeru commented 2 years ago

@OFR-IIASA with https://github.com/iiasa/ixmp/pull/441#discussion_r840491256 addressed this should be ready for ✅.