usask-arg / sasktran2

The next generation SASKTRAN radiative transfer model
MIT License

Refactor to pre-calculate phase functions #99

Open dannyzed opened 6 months ago

github-actions[bot] commented 6 months ago

Benchmark results from GitHub Actions

Lower numbers are good, higher numbers are bad. A ratio less than 1 means a speedup and greater than 1 means a slowdown. Green lines beginning with + are slowdowns (the PR is slower than master or master is slower than the previous release). Red lines beginning with - are speedups.
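As a rough illustration of the ratio rule above, the following sketch computes `after / before` and assigns a `+` or `-` marker. The function name `classify` and the `threshold` value of 1.05 are assumptions made for this example; the threshold the bot actually uses is not stated here, and the benchmark tool presumably also checks statistical significance, which this sketch omits.

```python
from typing import Tuple


def classify(before_s: float, after_s: float, threshold: float = 1.05) -> Tuple[float, str]:
    """Return (ratio, marker) for a before/after timing pair given in seconds.

    `threshold` is a hypothetical cutoff chosen for illustration only.
    """
    ratio = after_s / before_s
    if ratio >= threshold:
        marker = "+"  # slowdown: the PR takes longer than the baseline
    elif ratio <= 1.0 / threshold:
        marker = "-"  # speedup: the PR takes less time than the baseline
    else:
        marker = ""   # within the threshold, treated as unchanged
    return ratio, marker


# Timings taken from the time_limb_single_scatter(16, False) row: 892 ms -> 640 ms
ratio, marker = classify(0.892, 0.640)
print(f"{marker} {ratio:.2f}")
```

A symmetric cutoff (`threshold` for slowdowns, `1 / threshold` for speedups) is one simple choice; it treats a change and its reversal the same way.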

Significantly changed benchmark results (PR vs main)


All benchmarks:

| Change   | Before [88c477e7]    | After [5d7738c4]    | Ratio   | Benchmark (Parameter)                                                    |
|----------|----------------------|---------------------|---------|--------------------------------------------------------------------------|
|          | 30.0±0.1ms           | 29.9±0.09ms         | 1.00    | do_large.DOLarge.time_do_large(100, False, 16, 1)                        |
|          | 241±2ms              | 243±3ms             | 1.01    | do_large.DOLarge.time_do_large(100, False, 16, 3)                        |
|          | 1.25±0.01ms          | 1.27±0.02ms         | 1.01    | do_large.DOLarge.time_do_large(100, False, 2, 1)                         |
|          | 1.87±0.03ms          | 1.85±0.01ms         | 0.99    | do_large.DOLarge.time_do_large(100, False, 2, 3)                         |
|          | 212±0.6ms            | 213±0.7ms           | 1.00    | do_large.DOLarge.time_do_large(100, False, 32, 1)                        |
|          | 2.17±0.01s           | 2.19±0s             | 1.01    | do_large.DOLarge.time_do_large(100, False, 32, 3)                        |
|          | 1.97±0.05ms          | 1.98±0.03ms         | 1.00    | do_large.DOLarge.time_do_large(100, False, 4, 1)                         |
|          | 5.58±0.02ms          | 5.58±0.02ms         | 1.00    | do_large.DOLarge.time_do_large(100, False, 4, 3)                         |
|          | 6.01±0.06ms          | 6.08±0.05ms         | 1.01    | do_large.DOLarge.time_do_large(100, False, 8, 1)                         |
|          | 34.1±0.4ms           | 33.7±0.2ms          | 0.99    | do_large.DOLarge.time_do_large(100, False, 8, 3)                         |
|          | 1.59±0.01ms          | 1.60±0ms            | 1.00    | do_large.DOLarge.time_do_large(2, False, 16, 1)                          |
|          | 5.15±0.06ms          | 5.06±0.01ms         | 0.98    | do_large.DOLarge.time_do_large(2, False, 16, 3)                          |
|          | 994±10μs             | 979±3μs             | 0.99    | do_large.DOLarge.time_do_large(2, False, 2, 1)                           |
|          | 996±3μs              | 1.00±0.01ms         | 1.00    | do_large.DOLarge.time_do_large(2, False, 2, 3)                           |
|          | 4.95±0.05ms          | 4.92±0.01ms         | 0.99    | do_large.DOLarge.time_do_large(2, False, 32, 1)                          |
|          | 33.6±0.06ms          | 33.7±0.1ms          | 1.00    | do_large.DOLarge.time_do_large(2, False, 32, 3)                          |
|          | 988±6μs              | 1.00±0.01ms         | 1.02    | do_large.DOLarge.time_do_large(2, False, 4, 1)                           |
|          | 1.11±0.01ms          | 1.11±0ms            | 0.99    | do_large.DOLarge.time_do_large(2, False, 4, 3)                           |
|          | 1.09±0ms             | 1.10±0ms            | 1.01    | do_large.DOLarge.time_do_large(2, False, 8, 1)                           |
|          | 1.63±0.01ms          | 1.63±0ms            | 1.00    | do_large.DOLarge.time_do_large(2, False, 8, 3)                           |
|          | 6.67±0.01ms          | 6.73±0.06ms         | 1.01    | do_large.DOLarge.time_do_large(20, False, 16, 1)                         |
|          | 49.3±0.4ms           | 48.8±0.4ms          | 0.99    | do_large.DOLarge.time_do_large(20, False, 16, 3)                         |
|          | 1.03±0.01ms          | 1.05±0.02ms         | 1.02    | do_large.DOLarge.time_do_large(20, False, 2, 1)                          |
|          | 1.13±0.01ms          | 1.14±0.01ms         | 1.01    | do_large.DOLarge.time_do_large(20, False, 2, 3)                          |
|          | 42.9±0.2ms           | 43.3±0.2ms          | 1.01    | do_large.DOLarge.time_do_large(20, False, 32, 1)                         |
|          | 414±4ms              | 407±3ms             | 0.98    | do_large.DOLarge.time_do_large(20, False, 32, 3)                         |
|          | 1.16±0.01ms          | 1.18±0ms            | 1.01    | do_large.DOLarge.time_do_large(20, False, 4, 1)                          |
|          | 1.97±0.01ms          | 1.97±0.01ms         | 1.00    | do_large.DOLarge.time_do_large(20, False, 4, 3)                          |
|          | 1.95±0.01ms          | 1.97±0ms            | 1.01    | do_large.DOLarge.time_do_large(20, False, 8, 1)                          |
|          | 7.35±0.02ms          | 7.36±0.02ms         | 1.00    | do_large.DOLarge.time_do_large(20, False, 8, 3)                          |
| -        | 892±8ms              | 640±30ms            | 0.72    | limb_singlescatter.LimbSingleScatter.time_limb_single_scatter(16, False) |
|          | n/a                  | n/a                 | n/a     | limb_singlescatter.LimbSingleScatter.time_limb_single_scatter(16, True)  |
|          | 654±7ms              | 641±7ms             | 0.98    | limb_singlescatter.LimbSingleScatter.time_limb_single_scatter(4, False)  |
|          | 4.26±0.1s            | 4.58±0.02s          | ~1.07   | limb_singlescatter.LimbSingleScatter.time_limb_single_scatter(4, True)   |
| +        | 237±1ms              | 250±1ms             | 1.06    | twostream.TwoStreamNadirPlaneParallel.time_two_stream_nadir(100, False)  |
|          | 5.48±0.01s           | 5.61±0.01s          | 1.02    | twostream.TwoStreamNadirPlaneParallel.time_two_stream_nadir(100, True)   |
|          | 9.07±0.02ms          | 9.07±0.04ms         | 1.00    | twostream.TwoStreamNadirPlaneParallel.time_two_stream_nadir(2, False)    |
|          | 28.2±0.1ms           | 28.5±0.07ms         | 1.01    | twostream.TwoStreamNadirPlaneParallel.time_two_stream_nadir(2, True)     |
|          | 44.6±0.1ms           | 45.7±0.2ms          | 1.02    | twostream.TwoStreamNadirPlaneParallel.time_two_stream_nadir(20, False)   |
|          | 343±1ms              | 348±2ms             | 1.02    | twostream.TwoStreamNadirPlaneParallel.time_two_stream_nadir(20, True)    |
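The Before and After cells mix units (μs, ms, s), so ratios cannot be read off the raw numbers directly. A minimal sketch of converting a cell to seconds follows; the helper name `parse_cell` and the assumed cell format `mean±spread<unit>` are inferred from the table above, not taken from the benchmark tool itself.

```python
import re
from typing import Optional

# Unit suffixes seen in the table, plus "us"/"ns" as assumed variants.
_UNIT_TO_SECONDS = {"s": 1.0, "ms": 1e-3, "μs": 1e-6, "us": 1e-6, "ns": 1e-9}
_CELL = re.compile(r"^\s*([0-9.]+)±([0-9.]+)(ns|μs|us|ms|s)\s*$")


def parse_cell(cell: str) -> Optional[float]:
    """Convert a cell such as '892±8ms' to its mean time in seconds.

    Returns None for cells that do not match the pattern, e.g. 'n/a'.
    """
    match = _CELL.match(cell)
    if match is None:
        return None
    mean, _spread, unit = match.groups()
    return float(mean) * _UNIT_TO_SECONDS[unit]


# Row time_do_large(2, False, 2, 3): the two cells use different units.
before = parse_cell("996±3μs")
after = parse_cell("1.00±0.01ms")
print(f"{after / before:.2f}")  # prints 1.00
```

Ratios recomputed this way can differ slightly in the last digit from the Ratio column, since the displayed timings are already rounded.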

Full benchmark results can be found as artifacts in GitHub Actions (click on checks at the top of the PR).