JoJoDeveloping opened this pull request 1 month ago
Hm, I'm wary of this heuristic because I tried something like it to solve the inverse problem and didn't get good results. Do you have any demonstrations of workloads where this implementation makes a huge difference?
My main immediate concern is that this can lead to significant slowdown if there are large trees that do not get compacted. So I feel like if we have such a heuristic, it should also take into account "did the big tree actually shrink on the last GC run" or something like that, and increase the GC period if the answer is "no".
What does this PR do on `MIRIFLAGS=-Zmiri-tree-borrows ./miri bench`? Please show numbers from before and after this change.
(Unfortunately we don't have a way to automatically compare two bench reports and print "things got X% slower/faster"... see https://github.com/rust-lang/miri/issues/3999 )
> My main immediate concern is that this can lead to significant slowdown if there are large trees that do not get compacted. So I feel like if we have such a heuristic, it should also take into account "did the big tree actually shrink on the last GC run" or something like that, and increase the GC period if the answer is "no".
The way this works is by initially having some exponential backoff for how often the GC runs (whenever the tree size doubles since the last GC run). Eventually that goes to linear (whenever a tree grew by X nodes since the last GC run). The reason is that I reckon most programs do not keep many different provenances around (which would require, for example, a large array storing pointers that differ only in their provenance). For the tree to not get compacted by the GC, a program has to execute lots of retags and then keep all of these pointers alive; on such workloads, the GC already does nothing. Even such a program would have to create and store a new reference every 7 basic blocks (with the current settings) for this patch to slow it down. But note that for programs with a lot of memory use, the GC probably already needs to be tuned down for things to work.
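To make that concrete, here is a minimal sketch of the trigger logic as described above; the type, field, and constant names are made up for illustration and are not the actual code in this PR.

```rust
// Hypothetical sketch of the growth-based GC trigger; names are invented.
struct GcTrigger {
    /// Tree size observed at the end of the last GC run.
    size_at_last_gc: usize,
    /// Once trees are this large, switch from doubling to a fixed increment.
    linear_threshold: usize,
    /// Fixed node-count increment used in the linear regime.
    linear_step: usize,
}

impl GcTrigger {
    /// Returns true if the tree has grown enough since the last GC run
    /// that running the GC again is likely worthwhile.
    fn should_run(&self, current_size: usize) -> bool {
        let limit = if self.size_at_last_gc < self.linear_threshold {
            // Exponential backoff: wait until the tree has doubled in size.
            self.size_at_last_gc * 2
        } else {
            // Linear regime: wait until the tree grew by a fixed number of nodes.
            self.size_at_last_gc + self.linear_step
        };
        current_size >= limit
    }

    /// Record the (possibly compacted) tree size after a GC run.
    fn after_gc(&mut self, new_size: usize) {
        self.size_at_last_gc = new_size;
    }
}
```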
In the end it is all trade-offs, and this one seems to be worth it. I am not sure the current performance test suite contains programs where this shows up (one way or another), but I can add some where it does.
After some musing with @saethlin, we figured that a better answer than having a GC might be to use (some form of) ref-counting, at least in theory. Unfortunately, that would require a large refactoring (and past attempts at doing it failed?), so for now we seem to be stuck with the GC, for better or worse.
> The way this works is by initially having some exponential backoff for how often the GC runs (whenever the tree size doubles since the last GC run).
Right, that makes sense, it just wasn't clear from the PR description. :)
> Eventually that goes to linear (whenever a tree grew by X nodes since the last GC run).
That's a bit more surprising. What is the rationale for that? Did you compare the PR with and without this part?
Many programs not having a ton of provenances would already be captured by the first part, no? (Also, I am not sure that's even true: each live reference has its own provenance, and there can easily be a ton of those if someone has a large array of things that involve references.)
> I am not sure the current performance test suite contains programs where this shows up (one way or another), but I can add some where it does.
Yes that would be good -- preferably one benchmark demonstrating a clear benefit. (We don't want to grow our number of benchmarks too much, either...)
> We don't want to grow our number of benchmarks too much, either...
Do we not? Changes like the one in this PR would be much easier to find/develop/test if the suite of benchmarks were more diverse and larger in general. In its current form, it is useful only to ensure there are no really major performance regressions. But that's a separate issue.
More benchmarks means they take forever to run, and at least until we figure out https://github.com/rust-lang/miri/issues/3999 also forever to analyze.
:umbrella: The latest upstream changes (presumably #4009) made this pull request unmergeable. Please resolve the merge conflicts.
Miri's GC runs every 10000 basic blocks, by default. It turns out that sometimes, it is better if the GC runs more often. This is especially true for some Tree Borrows programs, since in Tree Borrows, memory accesses take time linear in the size of a certain tree; this tree can grow quite large, but is usually compacted significantly at every GC run. For some programs, it can be advantageous to run the GC every 200 blocks, or even more often.
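(For context, this kind of tuning can already be done by hand via the existing `-Zmiri-provenance-gc` flag; the invocation below is just an example.)

```console
# Run a program under Tree Borrows with the provenance GC every 200 basic
# blocks instead of the default 10000 (manual tuning, no heuristic involved).
MIRIFLAGS="-Zmiri-tree-borrows -Zmiri-provenance-gc=200" cargo miri run
```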
Unfortunately, such a GC frequency would slow down other programs. To achieve a balance, this PR proposes a heuristic that allows running the GC more often based on how large the Tree Borrows tree grows. It is quite likely that other parts of Miri will eventually want to plug into this mechanism as well. However, such concrete other use cases are left as future work; for now, the system is merely designed to be general enough to accommodate them. This PR only contributes the integration with Tree Borrows. It is also possible that the actual heuristics need more tweaks, but for now, the performance improvements look promising.
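As a rough illustration of what "plugging into this mechanism" could look like, here is a hypothetical sketch; the trait and function names are invented for this example and do not match the PR's actual types.

```rust
// Hypothetical hook for components that want to influence GC frequency;
// an illustrative sketch, not the API introduced by this PR.
trait GcFrequencyHint {
    /// Ask the component whether, given its current internal state (e.g. the
    /// size of a Tree Borrows tree), an earlier-than-scheduled GC run would
    /// likely pay off.
    fn wants_early_gc(&self) -> bool;
}

/// The interpreter loop could then consult all registered hints in addition
/// to the fixed basic-block interval.
fn should_run_gc(
    blocks_since_last_gc: u32,
    interval: u32,
    hints: &[&dyn GcFrequencyHint],
) -> bool {
    blocks_since_last_gc >= interval || hints.iter().any(|h| h.wants_early_gc())
}
```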
It is further likely that the overall design needs some changes, since it currently relies on `Cell`s. Also, I was not too sure where to put the new `enum`s. So this PR is WIP for now, although it is functionally complete.