khaeru commented 2 years ago

This applies some simple optimizations in Scenario.add_par() and underlying backend code by using pandas' (fast) internals rather than (slower) utilities/code in ixmp.utils.

Anecdotally (using pytest-profiling) there is about a 50% speed-up in message_data test cases where the runtime is dominated by add_par().

The branch also adds .tests.test_perf.test_add_par() using pytest-benchmark; these are invoked with pytest -k perf --benchmark-only. While I am not experienced with performance profiling, these results seem to show consistently better performance.

I am not sure if the first two cases are erroneously displayed out of order.
For adding 10⁶ data points, the improvement is about 20%.

## Code on `main`

-------------------------------------------------------------------------------------------- benchmark: 4 tests -------------------------------------------------------------------------------------------
Name (time in ms)                   Min                    Max                   Mean            StdDev                 Median               IQR            Outliers      OPS            Rounds  Iterations
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_add_par[1000.0]            73.9765 (1.0)          73.9765 (1.0)          73.9765 (1.0)      0.0000 (1.0)          73.9765 (1.0)      0.0000 (1.0)           0;0  13.5178 (1.0)           1           1
test_add_par[100.0]            150.6392 (2.04)        150.6392 (2.04)        150.6392 (2.04)     0.0000 (1.0)         150.6392 (2.04)     0.0000 (1.0)           0;0   6.6384 (0.49)          1           1
test_add_par[10000.0]          432.8315 (5.85)        432.8315 (5.85)        432.8315 (5.85)     0.0000 (1.0)         432.8315 (5.85)     0.0000 (1.0)           0;0   2.3104 (0.17)          1           1
test_add_par[1000000.0]     39,330.8795 (531.67)   39,330.8795 (531.67)   39,330.8795 (531.67)   0.0000 (1.0)      39,330.8795 (531.67)   0.0000 (1.0)           0;0   0.0254 (0.00)          1           1
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

## This branch

-------------------------------------------------------------------------------------------- benchmark: 4 tests -------------------------------------------------------------------------------------------
Name (time in ms)                   Min                    Max                   Mean            StdDev                 Median               IQR            Outliers      OPS            Rounds  Iterations
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_add_par[1000.0]            66.0219 (1.0)          66.0219 (1.0)          66.0219 (1.0)      0.0000 (1.0)          66.0219 (1.0)      0.0000 (1.0)           0;0  15.1465 (1.0)           1           1
test_add_par[100.0]             67.3930 (1.02)         67.3930 (1.02)         67.3930 (1.02)     0.0000 (1.0)          67.3930 (1.02)     0.0000 (1.0)           0;0  14.8383 (0.98)          1           1
test_add_par[10000.0]          326.6322 (4.95)        326.6322 (4.95)        326.6322 (4.95)     0.0000 (1.0)         326.6322 (4.95)     0.0000 (1.0)           0;0   3.0615 (0.20)          1           1
test_add_par[1000000.0]     32,624.5220 (494.15)   32,624.5220 (494.15)   32,624.5220 (494.15)   0.0000 (1.0)      32,624.5220 (494.15)   0.0000 (1.0)           0;0   0.0307 (0.00)          1           1
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Legend:
  Outliers: 1 Standard Deviation from Mean; 1.5 IQR (InterQuartile Range) from 1st Quartile and 3rd Quartile.
  OPS: Operations Per Second, computed as 1 / Mean

How to review

Read the diff of the following files:
- ixmp/core/scenario.py
- ixmp/backend/jdbc.py
Optional:
- Check out/install the branch.
- Run code that is heavy on adding data to scenarios; e.g. in message_data pytest -k "transport and build_bare".
- Note the runtime/performance versus main or the last release.

PR checklist

[x] Continuous integration checks all ✅
[x] Add or expand tests; coverage checks both ✅
[x] Add, expand, or update documentation.
[x] Update release notes.

codecov[bot] commented 2 years ago

Codecov Report

Merging #441 (cdd8aa9) into main (0c1a143) will increase coverage by 0.0%. The diff coverage is 100.0%.

@@          Coverage Diff          @@
##            main    #441   +/-   ##
=====================================
  Coverage   98.5%   98.5%           
=====================================
  Files         41      42    +1     
  Lines       4380    4382    +2     
=====================================
+ Hits        4318    4320    +2     
  Misses        62      62

Impacted Files	Coverage Δ
ixmp/backend/io.py	`98.5% <ø> (-0.1%)`	:arrow_down:
ixmp/backend/jdbc.py	`95.1% <100.0%> (-0.1%)`	:arrow_down:
ixmp/core/scenario.py	`97.9% <100.0%> (-0.1%)`	:arrow_down:
ixmp/tests/test_perf.py	`100.0% <100.0%> (ø)`

OFR-IIASA commented 2 years ago

I have been able to test the PR, but not via the above suggested command pytest -k "transport and build_bare". Instead I ran the GLOBIOM emulator. On main running the emulator script for the global model, of which most time is taken up by populating the table land_output, takes about 7minutes. On this branch, the process time is almost identical. The main change to still be made is to adjust the logging level - currently this seems to be set to "DEBUG".

khaeru commented 2 years ago

Instead I ran the GLOBIOM emulator. On main running the emulator script for the global model, of which most time is taken up by populating the table land_output, takes about 7minutes. On this branch, the process time is almost identical.

Thanks for checking.

Some more information on investigating performance:

Above I mentioned using pytest-profiling. Specifically, I run 1 or a few tests with pytest -k "this or that" --profile-svg and then open the file prof/combined.svg that's created. An example is attached below. Here, along the central (green) chain is a box scenario:429:add_par. The info says it is run 28 times ("28×") which together take up 48.32% of the total time of this pytest run.

Using this, my process is roughly:

Run slow code via its tests.
Look at the profile visualization.
Identify slow steps.
Improve performance of those steps.
Repeat from (1).

If (4) is successful, the % of total time for the slow code will decrease, and the % for other (next-slowest, unaltered) codes will correspondingly increase.

The example reflects a particular MESSAGEix-Transport test in which add_par() was the slowest piece of code. But we have a lot of code in message_data written using loops, etc.; for these, it may not be add_par() which is the slowest step, and thus improving performance of add_par() will not have a large or even noticeable impact on total runtime.

It is necessary to do these kinds of investigations (again, I am not an expert) to identify whether ixmp or downstream code is responsible for poor performance. Also, when ixmp is responsible, sometimes this is because of the Java code in JDBCBackend or JPype used to interact with it, and so there we cannot make any improvements either. In short, the scope for these optimizations is quite narrow.

combined

LauWien commented 2 years ago

I have been able to test the PR, but not via the above suggested command pytest -k "transport and build_bare". Instead I ran the GLOBIOM emulator. On main running the emulator script for the global model, of which most time is taken up by populating the table land_output, takes about 7minutes. On this branch, the process time is almost identical. The main change to still be made is to adjust the logging level - currently this seems to be set to "DEBUG".

Thanks for checking Oliver! I requested you as a reviewer and removed me.

OFR-IIASA commented 2 years ago

pytest -k "transport and build_bare" on main took 8.72s while on this branch it took 7.87s. The only change before merging is to set the logging level back, so that DEBUG messages are not shown.

khaeru commented 2 years ago

@OFR-IIASA with https://github.com/iiasa/ixmp/pull/441#discussion_r840491256 addressed this should be ready for ✅.

iiasa / ixmp

Improve performance in add_par() #441

How to review

PR checklist

Codecov Report