matt-kempster / m2c

A MIPS and PowerPC decompiler.
GNU General Public License v3.0
386 stars 46 forks source link

Refactor phi handling #234

Closed simonlindholm closed 2 years ago

simonlindholm commented 2 years ago

The current system for phis has long been a thorn in my side. It results in suboptimal output in several ways:

This PR replaces it by a system where phis use variables set by EvalOnceExpr temps, building on #233 which makes sure each value source is an EOE. Well, replaces is a strong word, and "combines" may be closer to the truth. The original vision was to do ahead-of-time analysis on the use-def graph to determine which instructions should serve as phi sources -- this ahead-of-time analysis is needed in order to get prevent_later_uses to deal properly with temps being able to share vars. However, this idea runs into difficulty on two fronts:

Working around the issue for now, this PR takes the approach of keeping the previous phi system around, but marking temps as phi sources for the next translation pass during assign_phis. The old phis are renamed "naive phis" in the code, and marked phi sources are "planned vars". --reg-vars is changed to use the same infrastructure. While hacky, the result is from what I can see just as good as with the proper approach.

Possible point of contention: this renames variables from phi_x to var_x. I think it's an improvement, phi_x was always kind of weird naming.

Project diffs: https://gist.github.com/simonlindholm/3cf6a93365b4b9b3207c23ee63c2ea89 (mm), https://gist.github.com/simonlindholm/376b089c1fe3fab430b679f9dc3bdb49 (papermario). I've looked through these a fair bit and nothing stands out to me as worse than before.

This gives a reduction in output size of 3% on mm, 4% on oot, and 5.5% on papermario.

Closes #82.

simonlindholm commented 2 years ago

This should be ready to go now! I updated the PR description with an overview.

zbanks commented 2 years ago

Two stray thoughts about this PR:

Possible point of contention: this renames variables from phi_x to var_x. I think it's an improvement, phi_x was always kind of weird naming.

If we wanted to keep the phi_ naming prefix, planned phis could be named phi_ (instead of var_) and naive phis could be named nphi_ or rawphi_ (instead of phi_)?

I don't feel super strongly either way. I do like that "phi" is a distinctive name, even if it's a bit weird (contrasted with "var" which just sounds like a synonym to "temp"). On the other hand, now that the phi system is pretty robust, it's more OK to hide it from the end user?


In EvalOnceExpr, could the 3 booleans emit_exactly_once, trivial, and transparent be turned into a single 4-value enum? This would describe the "strength" of the EOE: exactly_once, normal, transparent, trivial.

Although there isn't much explicit coupling between emit_exactly_once & trivial/transparent, I don't think it makes sense to have an EOE that is both emit_exactly_once and transparent? And as far as I can tell, there's no existing expression that would set both.

simonlindholm commented 2 years ago

If we wanted to keep the phi naming prefix, planned phis could be named phi (instead of var) and naive phis could be named nphi or rawphi (instead of phi)?

Yeah, this is an option. I forgot to note this in the PR description but naive phis only show up in the output of 2 function across the whole of mm so I'm not too worried about how we name them -- the only time they are going to realistically show up is if someone passes --passes=1.

In EvalOnceExpr, could the 3 booleans emit_exactly_once, trivial, and transparent be turned into a single 4-value enum? This would describe the "strength" of the EOE: exactly_once, normal, transparent, trivial.

Interesting idea. You're right that transparent and emit_exactly_once are mutually exclusive. I've considered making trivial EOE's into a separate expression kind but thought it wouldn't be worthwhile; this sounds like a light-weight version of that. emit_exactly_once, transparent and trivial could all become boolean member functions.