homotopy-io / homotopy-rs

A Rust/WASM implementation of homotopy.io
https://homotopy.io
BSD 3-Clause "New" or "Revised" License

Locally built code runs faster than deployed one #60

Closed zrho closed 3 years ago

zrho commented 3 years ago

For some reason the locally built version consistently runs about twice as fast as the deployed one on beta.homotopy.io. I have checked that it is indeed running the same version.

zrho commented 3 years ago

I locally run a nightly build of the rust compiler while the CI uses the stable toolchain, so that might explain the difference.

jamievicary commented 3 years ago

Interesting that there is so much difference! Nathan, maybe you can tell us about that?


zrho commented 3 years ago

@NickHu Could you change the CI to the nightly build? It seems to be stable enough (and if it isn't, we can still change back), and it would be interesting to see whether this translates to the kind of speedup I am seeing locally.

jamievicary commented 3 years ago

@doctorn would be great to hear what you think about this.

doctorn commented 3 years ago

Running stable/nightly doesn't seem to make any real performance difference on my machine, from what I can tell, but I don't know whether the operations I'm doing are complicated enough to trigger a dip. I don't know what's happening in the back end of the compiler at the moment, but a 2x speedup from nightly feels a little improbable.

doctorn commented 3 years ago

I keep noticing this in the build logs though:

WARNING in asset size limit: The following asset(s) exceed the recommended size limit (244 KiB).
This can impact web performance.
Assets:
  index.js (890 KiB)

Is this anything you guys have looked into? I don't know anything about browser performance but I wouldn't be surprised if there were some limits on remote resources that aren't placed on local ones.

doctorn commented 3 years ago

@zrho maybe try rustup override set stable in your build environment and see if you observe the same performance gap?

jamievicary commented 3 years ago

This workspace shows an expensive operation (go to the Project sidebar and click "import").

To trigger the expensive operation, click the upper vertex and drag it down.

doctorn commented 3 years ago

Wait which workspace?

jamievicary commented 3 years ago

Sorry, here it is: expensive_operation.zip

doctorn commented 3 years ago

Thanks!

jamievicary commented 3 years ago

On my machine this takes 3169ms in Firefox and 5421ms in Chrome. If you want an even slower operation, I can make one for you; let me know.

jamievicary commented 3 years ago

It's an 8-dimensional topological contraction.

NickHu commented 3 years ago

I ran the benchmarks on the three Rust channels: stable, beta, and nightly.

From stable → beta, I observed a 3% to 7% performance improvement across the board:

stable → beta benchmark comparison
contract scalar/left    time:   [7.8350 us 7.8421 us 7.8483 us]                                  
                        change: [-3.8405% -3.7385% -3.6495%] (p = 0.00 < 0.05)
                        Performance has improved.
contract scalar/right   time:   [7.6269 us 7.6801 us 7.7380 us]                                   
                        change: [-5.3581% -4.8766% -4.4768%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 30 outliers among 100 measurements (30.00%)
  19 (19.00%) low severe
  3 (3.00%) low mild
  6 (6.00%) high mild
  2 (2.00%) high severe
contract beads/contract time:   [59.584 us 59.611 us 59.640 us]                                    
                        change: [-3.0039% -2.8751% -2.7497%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 4 outliers among 100 measurements (4.00%)
  1 (1.00%) low mild
  2 (2.00%) high mild
  1 (1.00%) high severe
contract beads/typecheck                                                                            
                        time:   [66.577 us 66.746 us 66.874 us]
                        change: [-6.2175% -5.7576% -5.3025%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 22 outliers among 100 measurements (22.00%)
  20 (20.00%) low severe
  1 (1.00%) low mild
  1 (1.00%) high mild
contract stacks/contract                                                                            
                        time:   [61.454 us 61.486 us 61.527 us]
                        change: [-1.1610% -0.9192% -0.6157%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 4 outliers among 100 measurements (4.00%)
  1 (1.00%) high mild
  3 (3.00%) high severe
contract stacks/typecheck                                                                            
                        time:   [72.679 us 72.709 us 72.740 us]
                        change: [-2.8353% -2.5470% -2.2046%] (p = 0.00 < 0.05)
                        Performance has improved.
contract high dimensions/2                                                                            
                        time:   [55.103 us 55.179 us 55.254 us]
                        change: [-0.3885% +0.1175% +0.6652%] (p = 0.67 > 0.05)
                        No change in performance detected.
Found 4 outliers among 100 measurements (4.00%)
  4 (4.00%) high mild
contract high dimensions/3                                                                            
                        time:   [356.83 us 357.01 us 357.22 us]
                        change: [-1.6945% -1.6278% -1.5616%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high mild
Benchmarking contract high dimensions/4: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 8.0s, enable flat sampling, or reduce sample count to 50.
contract high dimensions/4                                                                             
                        time:   [1.5262 ms 1.5391 ms 1.5512 ms]
                        change: [-5.2099% -4.5065% -3.7668%] (p = 0.00 < 0.05)
                        Performance has improved.
contract high dimensions/5                                                                            
                        time:   [6.2495 ms 6.2840 ms 6.3171 ms]
                        change: [-1.3141% -0.6778% -0.1200%] (p = 0.03 < 0.05)
                        Change within noise threshold.
contract high dimensions/6                                                                            
                        time:   [26.602 ms 26.614 ms 26.628 ms]
                        change: [-8.3995% -7.9735% -7.5144%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 3 outliers among 100 measurements (3.00%)
  3 (3.00%) high mild
Benchmarking contract high dimensions/7: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 14.9s, or reduce sample count to 30.
contract high dimensions/7                                                                            
                        time:   [146.02 ms 146.77 ms 147.52 ms]
                        change: [-4.7908% -4.0469% -3.3177%] (p = 0.00 < 0.05)
                        Performance has improved.
Benchmarking contract high dimensions/8: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 93.4s, or reduce sample count to 10.
contract high dimensions/8                                                                            
                        time:   [956.48 ms 961.44 ms 966.43 ms]
                        change: [-7.0419% -6.3985% -5.7390%] (p = 0.00 < 0.05)
                        Performance has improved.
Benchmarking contract high dimensions/9: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 747.3s, or reduce sample count to 10.
contract high dimensions/9                                                                            
                        time:   [7.3343 s 7.4149 s 7.5198 s]
                        change: [-5.0406% -3.9315% -2.5071%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 4 outliers among 100 measurements (4.00%)
  4 (4.00%) high severe

expand matchsticks/expand                                                                             
                        time:   [1.7139 us 1.7178 us 1.7228 us]
                        change: [-8.2036% -7.6896% -7.1342%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 7 outliers among 100 measurements (7.00%)
  4 (4.00%) high mild
  3 (3.00%) high severe
expand matchsticks/typecheck                                                                             
                        time:   [16.231 us 16.363 us 16.489 us]
                        change: [-5.2159% -4.6276% -3.9924%] (p = 0.00 < 0.05)
                        Performance has improved.

From beta → nightly, however, I think the performance "increases" (and "decreases") are within the error margin. In any case, none of this is as drastic as what Lukas is describing; possibly none of my benchmarks capture the relevant phenomenon, if it exists.

doctorn commented 3 years ago

Trying Jamie's example, it took about 8000ms on both stable and nightly on my machine. Obviously that was just one iteration, but I think it suggests that something else is going on here too. The numbers Nick is quoting are much closer to what I'd expect.

NickHu commented 3 years ago

I keep noticing this in the build logs though:

WARNING in asset size limit: The following asset(s) exceed the recommended size limit (244 KiB).
This can impact web performance.
Assets:
  index.js (890 KiB)

Is this anything you guys have looked into? I don't know anything about browser performance but I wouldn't be surprised if there were some limits on remote resources that aren't placed on local ones.

This warning comes from webpack; it's basically complaining that the binary that gets spit out is a bit big, which could hurt page load times on slow connections. This is expected, since webpack usually deals with a bunch of small, distinct JavaScript assets rather than one big WASM binary.

jamievicary commented 3 years ago

Hi Nick, thanks for this benchmarking! What are you using for the benchmarks?

NickHu commented 3 years ago

I used criterion.rs for benchmarking. The benchmarks are defined here. The only one which takes any significant time to run is the contract high dimensions benchmark, which does what we discussed the other day. I'll describe it again here for posterity:

  1. Construct n beads by attachment in non-generic position; that is, the diagram has singular height n;
  2. Take the identity on this diagram;
  3. Contract from left to right until the diagram has singular height 1;
  4. Project out one dimension: the projection has singular height n-1;
  5. Go to step 2 and repeat until no more contractions are possible.

It takes about 7 seconds for this to finish on my machine when n = 9, but of course the benchmark which determines this collects 100 samples and so takes ~700 seconds overall.

If you want to run these benchmarks, you can run cargo bench from a repository checkout. Once the CI completes a run successfully, it should post a plot of benchmark results against commits at https://beta.homotopy.io/dev/bench, and we can use this to track performance gains and regressions.
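
For a sense of the overall shape, here is a minimal sketch of how such a Criterion benchmark group can be structured; build_and_contract is a hypothetical stand-in for the real diagram construction and contraction loop, which goes through homotopy_core:

use criterion::{criterion_group, criterion_main, BenchmarkId, Criterion};

// Hypothetical stand-in for the real work: build the n-bead diagram and run
// the contract-then-project loop described in the steps above.
fn build_and_contract(_n: usize) {
    // ...
}

fn contract_high_dimensions(c: &mut Criterion) {
    let mut group = c.benchmark_group("contract high dimensions");
    for n in 2..=9 {
        group.bench_with_input(BenchmarkId::from_parameter(n), &n, |b, &n| {
            b.iter(|| build_and_contract(n));
        });
    }
    group.finish();
}

criterion_group!(benches, contract_high_dimensions);
criterion_main!(benches);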

jamievicary commented 3 years ago

Yes, that's exactly what we discussed the other day; it's great that you're implementing it, I just wanted to check. This is the same as my expensive_operation workspace above. I guess the next thing to check would be this other allocator.
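
For reference, swapping the global allocator in a Rust/WASM crate is a small, self-contained change. A sketch assuming the allocator in question is wee_alloc (the comment above doesn't name it, and wee_alloc in particular trades speed for code size, so only measuring will tell):

// Sketch only: installs wee_alloc as the global allocator on wasm targets.
// Which allocator is actually meant above is an assumption; adjust accordingly.
#[cfg(target_arch = "wasm32")]
#[global_allocator]
static ALLOC: wee_alloc::WeeAlloc = wee_alloc::WeeAlloc::INIT;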

jamievicary commented 3 years ago

Visiting "https://beta.homotopy.io/dev/bench" gives a 404 error at the moment.

NickHu commented 3 years ago

No longer a 404, but currently there's only a single data point, so it can't draw a graph yet. A word of warning: the statistical methods employed by criterion can't fully account for the CI environment, which is inherently unstable; the performance of the virtual machine will vary wildly depending on various external factors (how much CPU/memory pressure that node is experiencing, how many other jobs are running, etc.). So it's best to take the numbers with a grain of salt; benchmarks are expected to be more accurate when run on a local machine, where random external load is not expected.

That being said, even locally on my own machine, two distinct runs of the benchmarking suite on the same commit will vary by 5% in either direction for some of the benchmarks, and criterion will report this as a statistically significant performance increase/decrease (probably because of the tiny scale of some of the benchmarks, and because I did zero tuning of the statistical parameters).

It would be nice if we could use something like iai, but it's not ready yet.
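
As for the untuned statistical parameters: Criterion does let you adjust them per run. A rough sketch of the kind of configuration that might stop ~5% run-to-run noise being flagged as a change (the values here are illustrative, not recommendations):

use criterion::Criterion;
use std::time::Duration;

// Illustrative tuning only: widen the noise threshold and collect more samples
// so that small run-to-run variation is less likely to be reported as a change.
fn configured_criterion() -> Criterion {
    Criterion::default()
        .noise_threshold(0.05)      // ignore changes below 5%
        .significance_level(0.01)   // require stronger evidence of a change
        .sample_size(200)
        .measurement_time(Duration::from_secs(10))
}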

NickHu commented 3 years ago

I'm going to close this because I can't reproduce it, either anecdotally or empirically.

jamievicary commented 3 years ago

It would be good to check with @zrho whether he still has the issue. If he does, I think it should still be investigated.

zrho commented 3 years ago

I can't reproduce it anymore either.