Open · xclerc opened 3 months ago
Technically, `to_cmm` pushes the additions down rather than pushing the calls up, but that's a detail ^^
That being said, from my point of view, the problem is deciding when to push things down and when not to. From what I can tell right now, the answer seems highly dependent on the context: the number of live values, whether pushing float operations down actually makes some cmm-level optimizations possible, etc. All of these criteria are complex and not very local, so I don't think `to_cmm` is the right place to make the decision. We could instead envision a scheme where flambda2 makes the decision and `to_cmm` carries it out (if that's easier to do there than in simplify).
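To make the proposed split concrete, here is a toy OCaml model in which an upstream pass picks where the float additions go and a downstream emitter simply follows that choice. Everything in it is hypothetical: `placement`, `op`, `emit`, `max_live_across_calls` and `decide` are illustrative names, not flambda2 or `to_cmm` APIs, and the single threshold in `decide` is a deliberately naive stand-in for the context-dependent criteria listed above. The slot count assumes that every float live across a call needs its own stack slot, i.e. that there are no callee-saved registers.

```ocaml
(* Toy model, purely illustrative: none of these names exist in flambda2
   or to_cmm. *)

type placement =
  | Interleave   (* add each call result into acc right after the call *)
  | Calls_first  (* perform every call first, then do all the additions *)

(* [Call i] produces result r_i; [Add i] performs acc <- acc +. r_i. *)
type op =
  | Call of int
  | Add of int

(* Downstream step: emit the operations in the order chosen upstream. *)
let emit placement n =
  let ids = List.init n (fun i -> i) in
  match placement with
  | Interleave -> List.concat_map (fun i -> [ Call i; Add i ]) ids
  | Calls_first ->
      List.map (fun i -> Call i) ids @ List.map (fun i -> Add i) ids

(* Largest number of floats live across a single call: acc plus every
   result produced earlier but not yet added. Assuming no callee-saved
   registers, each of them needs a stack slot. *)
let max_live_across_calls ops =
  let rec go pending best = function
    | [] -> best
    | Call _ :: rest -> go (pending + 1) (max best (1 + pending)) rest
    | Add _ :: rest -> go (pending - 1) best rest
  in
  go 0 0 ops

(* Upstream step: a naive heuristic standing in for the real,
   context-dependent criteria (live values, enabled optimizations, ...). *)
let decide ~max_slots n =
  if max_live_across_calls (emit Calls_first n) <= max_slots then Calls_first
  else Interleave
```

With four calls, `emit Calls_first 4` keeps up to four floats live across a call (the accumulator plus three pending results), while `emit Interleave 4` keeps only the accumulator, so `decide ~max_slots:3 4` picks `Interleave`.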
(I am not sure whether this issue is a duplicate of #1783.)
The following code:
results in the following assembly:
It consumes many stack slots because the calls are pushed up, which makes the variables live from the calls to the end of the computation, as shown by the CMM expression:
The pattern is similar to the one discussed in #1783, even though it does not end up with an allocation.
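The two orders can be illustrated at the source level. In this sketch `f` is a hypothetical non-inlined float function standing in for the calls in the example, and `calls_first` / `interleaved` are illustrative names: `calls_first` mimics the shape described above, with every call result kept live until the final sum, while `interleaved` folds each result into the accumulator right away. Since the issue is about the order produced by `to_cmm`, writing the source one way does not by itself guarantee the shape of the generated code.

```ocaml
(* Hypothetical stand-in for a non-inlined float-returning function. *)
let[@inline never] f (x : float) = x *. 0.5 +. 1.0

(* Calls pushed up: acc, x, r1 and r2 are all live across the third call. *)
let calls_first acc x =
  let r1 = f x in
  let r2 = f (x +. 1.0) in
  let r3 = f (x +. 2.0) in
  acc +. r1 +. r2 +. r3

(* Interleaved: only acc and x are live across each call. *)
let interleaved acc x =
  let acc = acc +. f x in
  let acc = acc +. f (x +. 1.0) in
  acc +. f (x +. 2.0)
```

Both functions perform the additions in the same left-to-right order, so they return identical results (`8.5` for `acc = 1.0` and `x = 2.0`); only the set of values live across each call differs.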
Callee-saved registers or a tweaked allocation strategy for small leaf functions may help. Layout polymorphism could also help, by making `acc` a reference (this can be done "manually" by defining a record with a mutable field of the specific layout). As noted in #1783, it might be possible to tweak `to_cmm` to avoid pushing the calls up.
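A minimal sketch of the "manual" workaround, assuming a plain `float` field rather than the specific unboxed layout mentioned above. OCaml stores a record whose fields are all floats flat, so `v` holds an unboxed double inside the record block. The names `acc`, `f` and `sum_into` are hypothetical. Because each store into `a.v` is a side effect, the next call presumably cannot be moved above it, so pending call results no longer pile up across calls.

```ocaml
(* Hypothetical names; a plain [float] field instead of a layout-specific one.
   A record with only float fields is stored flat, so [v] is unboxed. *)
type acc = { mutable v : float }

(* Hypothetical stand-in for a non-inlined float-returning function. *)
let[@inline never] f (x : float) = x *. 0.5 +. 1.0

(* Each result is stored into [a.v] immediately; across each call only the
   pointer [a] and [x] are live, not a growing set of float results. *)
let sum_into (a : acc) x =
  a.v <- a.v +. f x;
  a.v <- a.v +. f (x +. 1.0);
  a.v <- a.v +. f (x +. 2.0)
```

Usage: `let a = { v = 1.0 } in sum_into a 2.0` leaves `8.5` in `a.v`, the same value as summing the three call results into the accumulator directly.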