Open nmatschke opened 3 weeks ago
@Gbury could you please look at this? We at first thought this might be the usual lack of something like PDCE, but Leo thought there might be a bug.
This is regular CSE, i.e. not inserting a boxed allocation when one is already there. It has the unfortunate side effect of keeping the original allocation alive on all paths, but that's expected without PDCE. The upstream compiler would probably have removed the allocation, but we don't use the same algorithm at all for unboxing so I don't consider it a bug if we're not strictly better.
I suspect that classic mode might actually be able to remove the allocation if old_f
and new_f
are annotated with [@inline]
.
That makes sense. One thing I find particularly sad about this though is that we can write an entire function in terms of unboxed floats and still end up with an allocation. I updated the code to make this more obvious - semantically, the only thing I've asked the compiler to do is to pull a float out of an array and immediately unbox it, but somehow the function to which I pass an unboxed float prevents that allocation from being eliminated.
Also, this does actually insert movq
and addsd
instructions for no good reason.
One thing that would help is if obj_dup
on its own (in the error branches) were sufficient to avoid the boxing, since again semantically that's just requesting a new allocation...
Just as @lthls said, this is an unfortunate side-effect of CSE, and it seems not clear that there is a way to know when not to do CSE to avoid cases such as this.
I think the answer is a form PDCE, together with indications that some branches are cold (or a heuristic that detects some exception raising as cold branches) so that we can try and sink down allocations into cold branches (potentially duplicating these allocations) if the hot/non-cold branches do not need the allocation.
Okay, seems like this isn't worth trying to fix right now, thanks for taking a look. Is it worth merging this test with a comment about the current state for posterity?
One probably simple suggestion might be to have the invalid_arg
branch marked as unlikely and cut all potentially problematic CSE equations on that branch
@nmatschke If you add a reason = "..."; skip;
to the test, we should be able to merge this.
It's common for code to pull an element out of a
float array
and pass it to some function. It's typically desirable that flambda not box these floats until it needs to (for example, to format an exception).However, if there are multiple branches that need a boxed float, flambda will box the float passed to the function, and the only known way to prevent this is to do some arithmetic identity operation e.g. add -0.
Surprisingly,
obj_dup
on its own is insufficient to prevent the undesirable boxing. Even more surprisingly, if the function explicitly callsunbox_float
on its argument, and then uses separatebox_float
calls when printing the exception, this is still insufficient.