AnneKitsune closed this pull request 3 years ago
The graphs are unreadable in GitHub dark mode - could you add a background instead of leaving it transparent?
I'm not totally sure if criterion has a config for this, but if I manage to find one, I'll add it!
I created an issue here: https://github.com/bheisler/criterion.rs/issues/469
I don't think that should stop this PR from being merged, as that can always be changed later.
I'm in favor of this change. The graphs help visualize the differences and automation is a huge plus. The biggest downside is the loss of "additional metadata" about "specialized" benchmarks. I think that sort of thing needs to be captured to make the comparisons meaningful and honest. Imo we should sort out a way to encode this information before removing the current system (which is already not good enough).
Eh I should have cloned the repo and looked at all of the graphs first. The "specialized" tests are displayed in the graphs.
I guess that's good enough for now. Imo we should include more variants, and documentation for those variants, for each test, but that's not something to block this PR on :smile:
I also agree that some benchmarks are very misleading. Things like legion's batch entity creation via world.extend allow for massive performance gains compared to other ECS libraries that can't, or choose not to, support batch methods.
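To make the batched/unbatched gap concrete, here is a minimal sketch of the two insertion shapes against legion's public API (struct names and counts are illustrative, not the exact bench code in this repo):

```rust
use legion::World;

#[derive(Clone, Copy, Default)]
struct Position(f32, f32, f32);
#[derive(Clone, Copy, Default)]
struct Velocity(f32, f32, f32);

fn main() {
    // Unbatched: one push per entity, paying per-entity archetype bookkeeping.
    let mut unbatched = World::default();
    for _ in 0..10_000 {
        unbatched.push((Position::default(), Velocity::default()));
    }

    // Batched: a single extend call inserts the whole block of entities at once,
    // which is where the large wins in the insertion benchmarks come from.
    let mut batched = World::default();
    batched.extend(vec![(Position::default(), Velocity::default()); 10_000]);
}
```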
This is a big enough change to the "result reporting" that we should get at least one more approval before merging. @TomGillen / @Ralith, can one or both of you take a look?
I'll try to have a look at this tomorrow; thanks for your work so far!
> I also agree that some benchmarks are very misleading.
For the sake of discussion, I (from an obviously biased perspective) kinda disagree with this. The README gives a clear specification of the benchmarks, and it's one where batched operations are applicable. If an ECS chooses not to optimize for that case, that's fine, but it's still an important, realistic case, at least as far as anyone's taken it so far--nobody should be getting a free pass for sacrificing it. If unbatchable workloads are underrepresented, that's a different problem.
> For the sake of discussion, I (from an obviously biased perspective) kinda disagree with this. The README gives a clear specification of the benchmarks, and it's one where batched operations are applicable. If an ECS chooses not to optimize for that case, that's fine, but it's still an important, realistic case, at least as far as anyone's taken it so far--nobody should be getting a free pass for sacrificing it. If unbatchable workloads are underrepresented, that's a different problem.
Imo the biggest problem is that I could easily implement and use "optimized" storages/algorithms for almost any benchmark, but the code could end up not looking like "idiomatic" bevy/legion/hecs/shipyard/etc code. Ex: the legion "simple insert" benchmark uses SoA insertion, but you won't find that in their readme/examples. SoA requires the user to "decouple" each entity's components across component-specific arrays, which I wouldn't recommend to anyone (and which I would prefer to not implement in Bevy at all). But now I feel "pressured" to implement it, otherwise I can't "compete".
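To illustrate the AoS/SoA point, here is a rough sketch (plain Rust, no ECS calls, types made up for the example) of the two data shapes the caller has to build before handing entities to the ECS:

```rust
#[derive(Clone, Copy, Default)]
struct Position(f32, f32, f32);
#[derive(Clone, Copy, Default)]
struct Velocity(f32, f32, f32);

fn main() {
    // AoS: one Vec of per-entity tuples. This matches how most application
    // code naturally describes "an entity and its components".
    let aos: Vec<(Position, Velocity)> = (0..10_000)
        .map(|_| (Position::default(), Velocity::default()))
        .collect();

    // SoA: the caller decouples each entity's components into separate,
    // index-aligned arrays, one per component type. Faster to bulk-insert,
    // but not how typical game code is written.
    let positions: Vec<Position> = vec![Position::default(); 10_000];
    let velocities: Vec<Velocity> = vec![Velocity::default(); 10_000];

    assert_eq!(aos.len(), positions.len());
    assert_eq!(positions.len(), velocities.len());
}
```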
If Bevy starts using "sparse set" components in its add/remove benchmark (which isn't what I would recommend by default for Bevy users), then the only way legion can compete is to implement sparse set storage (which they might not want to do for design reasons).
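For reference, later Bevy releases did expose per-component storage selection; opting a component into sparse-set storage looks roughly like this (a sketch against the bevy_ecs derive; the attribute syntax may vary between versions):

```rust
use bevy_ecs::component::Component;

// Default storage: dense "table" (archetype) storage, fastest to iterate.
#[derive(Component)]
struct Health(f32);

// Opt-in sparse-set storage, which trades iteration speed
// for cheaper add/remove of the component.
#[derive(Component)]
#[component(storage = "SparseSet")]
struct Stunned;
```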
I consider those two cases to be almost identical. The behavior they encourage for ECS authors and the picture the results paint for users feel "wrong" to me.
I don't think ecs_bench_suite should encourage ECS authors to "play the benchmark game" with hyper-optimized per-benchmark results. I think it should encourage "real world" comparisons of ECS frameworks with "idiomatic" api usage. Anything outside of those bounds (or anything that might cause users to draw the wrong conclusions, such as comparing batched/unbatched benchmarks) should be clearly indicated.
Ex: I think we should have separate batched/unbatched insertion benchmarks. That clearly indicates which frameworks support batched insertion. Ideally we also have AoS and SoA labels. I also think Bevy shouldn't be allowed to silently use sparse set storage for add/removes. Instead we should provide both results (and probably indicate that Table storage is the default / recommended storage type).
Oh, I didn't realize legion was getting columnar data from the user in that case; that's more egregious than batching row-oriented operations, I agree. Hecs has such an API as well, but I don't consider it appropriate for that benchmark.
I absolutely agree that we should focus on benchmarking realistic patterns rather than gamesmanship, and it looks like we've already slipped into that more than I realized.
@Ralith the two missing plots have been added.
Oh, it doesn't generate violin graphs since only legion implemented those benchmarks. Whoops! I guess it doesn't mean much to have benchmarks for only one ECS. What should we do?
I went ahead and merged the hecs support for that benchmark; merge master and it should be good to go.
Completed :+1:
Great, thanks! This is a big improvement.
This pull request adds violin plot images to the README, so that the results table no longer needs to be edited manually. Further, it adds planck_ecs to the benchmarked ECS frameworks.
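For context on how those images are produced: criterion renders one violin plot per benchmark group (typically written to target/criterion/&lt;group&gt;/report/violin.svg when HTML reports are enabled), so each README image corresponds to one group with one bench_function per ECS. A minimal sketch of that shape, with illustrative names rather than the suite's actual bench bodies:

```rust
use criterion::{criterion_group, criterion_main, Criterion};

fn simple_insert(c: &mut Criterion) {
    // One group per benchmark; one function per ECS under test.
    // Criterion plots all functions in a group into a single violin chart.
    let mut group = c.benchmark_group("simple_insert");
    group.bench_function("legion", |b| b.iter(|| { /* insert 10k entities */ }));
    group.bench_function("bevy", |b| b.iter(|| { /* insert 10k entities */ }));
    group.bench_function("planck_ecs", |b| b.iter(|| { /* insert 10k entities */ }));
    group.finish();
}

criterion_group!(benches, simple_insert);
criterion_main!(benches);
```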