Open Mark-Simulacrum opened 1 week ago
@bors try @rust-timer queue
Awaiting bors try build completion.
@rustbot label: +S-waiting-on-perf
:hourglass: Trying commit 206251b04f5b0f8623209af6caab2a515c71eb9f with merge 15ee67784efd5d46670c56f52fd618a4a00771e5...
The job mingw-check-tidy failed! Check out the build log: (web) (plain)
:sunny: Try build successful - checks-actions
Build commit: 15ee67784efd5d46670c56f52fd618a4a00771e5 (15ee67784efd5d46670c56f52fd618a4a00771e5)
Queued 15ee67784efd5d46670c56f52fd618a4a00771e5 with parent 9c9b568792ef20d8459c745345dd3e79b7c7fa8c, future comparison URL. There are currently 0 preceding artifacts in the queue. It will probably take at least ~1.2 hours until the benchmark run finishes.
Finished benchmarking commit (15ee67784efd5d46670c56f52fd618a4a00771e5): comparison URL.
Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf.
Next Steps: If you can justify the regressions found in this try perf run, please indicate this with @rustbot label: +perf-regression-triaged along with sufficient written justification. If you cannot justify the regressions please fix the regressions and do another perf run. If the next run shows neutral or positive results, the label will be automatically removed.
@bors rollup=never @rustbot label: -S-waiting-on-perf +perf-regression
This is a highly reliable metric that was used to determine the overall result at the top of this comment.
| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | 0.8% | [0.2%, 2.9%] | 169 |
| Regressions ❌ (secondary) | 0.9% | [0.2%, 2.3%] | 70 |
| Improvements ✅ (primary) | - | - | 0 |
| Improvements ✅ (secondary) | -1.8% | [-2.3%, -0.4%] | 5 |
| All ❌✅ (primary) | 0.8% | [0.2%, 2.9%] | 169 |
This benchmark run did not return any relevant results for this metric.
Bootstrap: 676.55s -> 678.643s (0.31%)
Artifact size: 315.72 MiB -> 316.01 MiB (0.09%)
This replaces the single Vec allocation with a series of progressively larger buckets. With the cfg for parallel enabled but with -Zthreads=1, this looks like a slight regression (<0.1%) in instruction count and cycle count.
With the parallel frontend at -Zthreads=4, this is an improvement over our current Lock-based approach (-5% wall-time, from 5.788 to 5.4688 seconds on libcore), likely due to less bouncing of the cache line holding the lock. At -Zthreads=32 it's a huge improvement (-46%: 8.829 -> 4.7319 seconds).
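For readers unfamiliar with the bucket idea, here is a minimal sketch of how a flat index can map onto progressively larger buckets so that existing entries never move and reads take no lock. This is not the code in this PR: `BucketVec`, `locate`, `FIRST_BUCKET_BITS`, and the use of `std::sync::OnceLock` are all assumptions made for illustration, and the real change may size and initialize its buckets differently.

```rust
use std::sync::OnceLock;

// Hypothetical sizing: bucket 0 holds 2^4 = 16 entries, bucket 1 holds 16,
// and every later bucket is twice as large as the one before it.
const FIRST_BUCKET_BITS: u32 = 4;
// Enough buckets to cover every u32 index.
const BUCKETS: usize = (32 - FIRST_BUCKET_BITS + 1) as usize;

/// Maps a flat index to (bucket number, offset inside the bucket, bucket capacity).
fn locate(index: u32) -> (usize, usize, usize) {
    let first = 1usize << FIRST_BUCKET_BITS;
    let i = index as usize;
    if i < first {
        return (0, i, first);
    }
    // Position of the highest set bit, i.e. floor(log2(i)).
    let high_bit = (usize::BITS - 1 - i.leading_zeros()) as usize;
    let bucket = high_bit - FIRST_BUCKET_BITS as usize + 1;
    let cap = 1usize << high_bit;
    (bucket, i - cap, cap)
}

/// Illustrative write-once table. Buckets are allocated on first use and are
/// never reallocated, so `&T` references handed out by `get` stay valid.
pub struct BucketVec<T> {
    buckets: [OnceLock<Box<[OnceLock<T>]>>; BUCKETS],
}

impl<T> BucketVec<T> {
    pub fn new() -> Self {
        BucketVec { buckets: std::array::from_fn(|_| OnceLock::new()) }
    }

    /// Stores `value` at `index`; returns it back as `Err` if the slot is already set.
    pub fn insert(&self, index: u32, value: T) -> Result<(), T> {
        let (bucket, offset, cap) = locate(index);
        // Allocate the whole bucket the first time any of its slots is written.
        let slots = self.buckets[bucket]
            .get_or_init(|| (0..cap).map(|_| OnceLock::new()).collect());
        slots[offset].set(value)
    }

    /// Reads a slot without taking a lock; `None` if nothing was stored there.
    pub fn get(&self, index: u32) -> Option<&T> {
        let (bucket, offset, _) = locate(index);
        self.buckets[bucket].get()?.get(offset)?.get()
    }
}
```

In this sketch, readers only perform atomic loads, so there is no single lock cache line for threads to contend on; writers may still briefly wait on each other while a bucket is being allocated for the first time. The trade-off is an extra index computation per access, which is one plausible source of a small single-threaded cost.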
FIXME: Extract the internals to rustc_data_structures, safety comments, etc.
r? @Mark-Simulacrum -- opening for perf first.