ocaml-flambda / flambda-backend

The Flambda backend project for OCaml
93 stars 67 forks source link

Add repro case for flambda2 boxing a float when there are multiple branches that need it, despite explicit calls to box and unbox #2703

Open nmatschke opened 3 weeks ago

nmatschke commented 3 weeks ago

It's common for code to pull an element out of a float array and pass it to some function. It's typically desirable that flambda not box these floats until it needs to (for example, to format an exception).

However, if there are multiple branches that need a boxed float, flambda will box the float passed to the function, and the only known way to prevent this is to do some arithmetic identity operation e.g. add -0.

Surprisingly, obj_dup on its own is insufficient to prevent the undesirable boxing. Even more surprisingly, if the function explicitly calls unbox_float on its argument, and then uses separate box_float calls when printing the exception, this is still insufficient.

mshinwell commented 2 weeks ago

@Gbury could you please look at this? We at first thought this might be the usual lack of something like PDCE, but Leo thought there might be a bug.

lthls commented 2 weeks ago

This is regular CSE, i.e. not inserting a boxed allocation when one is already there. It has the unfortunate side effect of keeping the original allocation alive on all paths, but that's expected without PDCE. The upstream compiler would probably have removed the allocation, but we don't use the same algorithm at all for unboxing so I don't consider it a bug if we're not strictly better.

lthls commented 2 weeks ago

I suspect that classic mode might actually be able to remove the allocation if old_f and new_f are annotated with [@inline].

nmatschke commented 2 weeks ago

That makes sense. One thing I find particularly sad about this though is that we can write an entire function in terms of unboxed floats and still end up with an allocation. I updated the code to make this more obvious - semantically, the only thing I've asked the compiler to do is to pull a float out of an array and immediately unbox it, but somehow the function to which I pass an unboxed float prevents that allocation from being eliminated.

Also, this does actually insert movq and addsd instructions for no good reason.

One thing that would help is if obj_dup on its own (in the error branches) were sufficient to avoid the boxing, since again semantically that's just requesting a new allocation...

Gbury commented 2 weeks ago

Just as @lthls said, this is an unfortunate side-effect of CSE, and it seems not clear that there is a way to know when not to do CSE to avoid cases such as this.

I think the answer is a form PDCE, together with indications that some branches are cold (or a heuristic that detects some exception raising as cold branches) so that we can try and sink down allocations into cold branches (potentially duplicating these allocations) if the hot/non-cold branches do not need the allocation.

nmatschke commented 2 weeks ago

Okay, seems like this isn't worth trying to fix right now, thanks for taking a look. Is it worth merging this test with a comment about the current state for posterity?

chambart commented 2 weeks ago

One probably simple suggestion might be to have the invalid_arg branch marked as unlikely and cut all potentially problematic CSE equations on that branch

mshinwell commented 13 hours ago

@nmatschke If you add a reason = "..."; skip; to the test, we should be able to merge this.