ocaml-flambda / flambda-backend

The Flambda backend project for OCaml
93 stars 67 forks source link

Remove unnecessary regions for %int_as_pointer #2677

Open mshinwell opened 4 weeks ago

mshinwell commented 4 weeks ago

This PR aims to correct bad code generation when %int_as_pointer is used at local mode, in a context where the corresponding region is not required for any actual stack allocation. For example in the following (which can be compiled with the boot compiler using -nostdlib -nopervasives):

[@@@ocaml.flambda_o3]

external int_as_pointer : int -> local_ int = "%int_as_pointer"

let iop t = local_ (int_as_pointer t) [@inline]

let u t (local_ f) = f (local_ (iop t)) [@nontail] [@inline]

external unsafe_set: local_ 'a array -> int -> 'a -> unit = "%array_unsafe_set"
external magic : (_[@local_opt]) -> (_[@local_opt]) = "%obj_magic"

let set t i v =
  u t (fun[@inline] t -> unsafe_set ((magic t) : int array) i v)

let normal_set (t : int array) i v =
  unsafe_set t i v

we would like it to be the case that the code for set is the same as for normal_set except for one subtraction (corresponding to %int_as_pointer). This is the case after this PR. At the moment it is not the case, with the generated code loading and storing the local stack pointer.

The %int_as_pointer primitive should be (and is) treated in the main part of flambda2 (everything before To_cmm) a bit like a zero-sized stack allocation. The aim is to ensure that the primitive is not pushed past the end-region, just like any other local allocation. (At present I think this is really enforced by the fact that End_region is a hard barrier in To_cmm, but with a more advanced effect analysis in that pass in the future, the region variable named by the Int_as_pointer primitive will ensure this.)

Not pushing %int_of_pointer past an end-region is actually a proxy for the real property we are trying to ensure, which is that the GC can never see the result of this primitive (as it is untagged and could cause misbehaviour, especially with a no-naked-pointers compiler or runtime5).

The idea of this patch, then, is to delete "unused" regions during To_cmm. The definition of "unused" here is "not needed to do any stack allocation", so in particular, a region only held onto via Int_as_pointer is to be deleted. Now it seems like there might a risk of code motion transformations (now or in the future) in the backend needing the region in order to ensure soundness. Indeed, there is arguably a pre-existing problem here, because Int_as_pointer is expanded into Cmm code that does not name any region variable during Cmm_helpers. This patch aims to adopt a straightforward approach which I hope is going to work when tested on a large scale: make that Cmm code have return machtype Addr. This in theory would allow the backend to move the primitive past an End_region, although I don't think there is currently any code in the compiler which would do this, but it does ensure (at compile time) the above property relating to the GC. This is the critical part.

The patch itself is fairly easy. There are probably three PRs here to be split out:

  1. Make Cmm_helpers.int_as_pointer return something of machtype Addr.
  2. Make To_cmm traverse the flambda2 terms in dominator order, so something straightforward can be done in point 3 below. I think this is probably a good change anyway. Some more thinking needs doing about the handling of loops (especially in conjunction with point 3. below), I just haven't had time yet.
  3. Track in To_cmm_result which regions have been seen to be used, and don't emit Cmm code for End_region primitives if the corresponding region is unused. It's a bit unfortunate that this seems to involve passing the Simple.t for the argument to the unary primitive simplification function, but this does seem to be much better than trying to do this based on the Cmm variables (as Ident.t). I tried that latter way and it got complicated very quickly.

I think the %int_of_pointer test at local mode in the testsuite may not really be testing the right thing, but I will discuss with @riaqn .

lpw25 commented 4 weeks ago

I'm not sure it causes any issues currently, but I want to clarify what the region is for:

Not pushing %int_of_pointer past an end-region is actually a proxy for the real property we are trying to ensure, which is that the GC can never see the result of this primitive

The aim is to ensure that the GC can never see the result of this primitive after the end of the region. It is safe for the GC to see the result before then, and indeed I would expect it to do so in a lot of cases where we pass the value into some other function that takes a local argument. It is also safe for the GC not to look at the result at all -- so having it be Addr should be safe -- but it is only unsafe for it to look after the region.

Note that this means you can get code like:

type t = { res : int }

let id x = x
[@@inline never]

let foo x =
  let y = int_as_pointer x in
  let z = id y in
  z.res

let bar arg =
  let a = malloc arg in
  let res = foo a in
  free a;
  res

where it is important that the GC does not look at z after the call to free. This probably means that the Addr is not sufficient to prevent problematic optimizations (which I also think don't currently exist). Perhaps something that instead marks the Begin_region and End_region constructs as "ghost" would be safer?

mshinwell commented 4 weeks ago

Ah yes, indeed these cases with free won't be caught by the Addr thing (not that this invalidates the remainder of this patch I don't think).

I have wondered about keeping the regions right until the end of Cfg, at which point they could be deleted. However this requires changing the Mach/Cfg operations to hold onto the region registers somehow (and also to add an int_as_pointer Cmm primitive etc). It seems like that could maybe affect code generation, though I would have to discuss with Greta and Xavier and think about this further.

lthls commented 4 weeks ago

@chambart suggested to use other variables with kind Region (I think it's also what Leo suggested above). Every function would have two region parameters: one for stack allocations, and one for scoping only. When translating from lambda, we would insert Begin_region and End_region twice for each lambda region, with a flag on one of them marking it as virtual. Allocations would point to the current non-virtual region, Int_as_pointer to the current virtual region, function applications to both. We would then be able to track both regions independently, and the virtual region instructions would be compiled away somewhere in the backend.

mshinwell commented 4 weeks ago

See #2682