ocaml-multicore / eio

Effects-based direct-style IO for multicore OCaml
Other
559 stars 72 forks source link

Eio.Resource and Type-Safety #774

Open mbarbin opened 3 weeks ago

mbarbin commented 3 weeks ago

Hello Eio team,

I've been re-using the pattern from Eio.Resource in the provider project. Recently, I've spent some time trying to make the use of Obj.magic more robust in provider. You can see some of the discussions and efforts here.

During this process, I've encountered cases of invalid uses that lead to unfortunate results. I'm sharing one such case that is also applicable to Resource:

type ('a, 'impl, 'tag) Eio.Resource.pi += Trait : (unit, 'a, [ `A ]) Eio.Resource.pi

let a = (Trait : (unit, string, [ `A ]) Eio.Resource.pi)
let h = Eio.Resource.handler [ H (a, "hello") ]
let b = (Trait : (unit, int, [ `A ]) Eio.Resource.pi)

let%expect_test "crash" =
  let (i : int) = Eio.Resource.get h b in
  print_s [%sexp { is_int = (Obj.is_int (Obj.repr i) : bool) }];
  [%expect {| ((is_int false)) |}];
  ()
;;

I'm not sure if it is possible to address this issue. I'd be interested in discussions with the Eio team on this.

Thank you for your time and for Eio!

Arogbonlo commented 3 weeks ago

Hey @mbarbin

I think to address the type-safety issues with Eio.Resource and Obj.magic here, start by minimizing or eliminating Obj.magic whenever possible. This function bypasses OCaml’s strict type system, which can lead to unexpected runtime errors. Also, redefine the types explicitly to ensure they match the expected type constraints.

patricoferris commented 3 weeks ago

Thanks @Arogbonlo for trying to help here, I think the use of the object module in this part of Eio is v. different to other places and necessary. The implementation could perhaps use a few comments because reading the code is pretty tricky (the API docs are good though).

mbarbin commented 3 weeks ago

Thanks @Arogbonlo that is sound advice. In this instance I've struggled in my attempts to remove the conceptually equivalent use of Obj.magic in my project, without success.

@patricoferris I also don't think it's possible to remove it without (perhaps invasive) changes to the exposed API. It would be sad if too many ergonomic aspects are lost in the process.

It's probably low priority given that cases like the one I shared only arise with invalid usage of the library. At any rate, I'll let you know if I find anything interesting. Thank you for looking into the issue.

talex5 commented 3 weeks ago

Thanks @mbarbin, that's a problem, indeed!

Logically, a handler is a function like this:

  type ('t, 'tags) handler = { fn : 'impl. ('t, 'impl, 'tags) pi -> 'impl }

Then the magic shouldn't be needed. The array was intended to make it more efficient and ergonomic, but looks like it needs some revising!

Note that OCaml's object types solve this problem easily, so if you can use them without annoying people then that would be a better option.

(to keep the variance annotations you might need to do some magic on tags, but that should be safe since it's only a phantom parameter)

mbarbin commented 3 weeks ago

I can offer to have a look if you'd like since I'm already working on similar ideas for provider. No worries if you have other plans though, whatever makes most sense to you and the project. Thanks @talex5!

talex5 commented 3 weeks ago

Please go ahead!

mbarbin commented 3 weeks ago

I was inspired by that famous TV show and decided to phone a friend on that one. This is joint work with @v-gb.

I propose to use this PR[^1] as a preview. Please let me know what you think. I'd be happy to create a PR for Eio.Resource based on a similar approach. For Eio, it would seem we'd only need the equivalent of Provider.Trait.Create. We looked for ways to prevent segfaults by design, with or without Obj.magic in the code. This version was the only one we found that doesn't need the magic and keeps the ergonomics in check.

[^1]: We encountered a scary dragon along the way, but luckily were able to return safe and sound. The underlying issue has been fixed by the compiler team. It is also not in the critical path for the changes.

mbarbin commented 2 weeks ago

I started a draft following a similar approach.

I counted 22 Resource.pi extended constructors in the project.

Details below:

Simple form

The 17/22 are of the simple form:

K : ('t, (module S with type t = 't), [> ty]) Resource.pi

They are compatible with the approach, and can be rewritten as:

module K : sig
  val pi : ('t, (module S with type t = 't), [> ty]) Resource.pi
end = Resource.Pi.Create (struct type 't iface = (module S with type t = 't) end)

Parametrized tags

The 5 remaining ones use another pattern. Consider lib_eio/net.ml/Network as an example:

Network : ('t, (module NETWORK with type t = 't and type tag = 'tag), [> 'tag ty]) Resource.pi

In this pattern, the signature has 2 types. Its second type is bound as a parameter of the last argument of pi (tag ty here).

I hadn't noticed this pattern before. I find it more tricky because the tags part of pi become more than a simple phantom type : if you do not tag your handler correctly, you are essentially writing unsafe code:

module type Trait = sig
  type t
  type a

  val get_a : t -> a
end

type ('a, 'impl, 'tag) Eio.Resource.pi +=
  | Trait :
      ( 't
      , (module Trait with type t = 't and type a = 'a)
      , [ `A of 'a ] )
        Eio.Resource.pi

let get_a (type tag) (Eio.Resource.T (t, ops) : [> `A of tag ] Eio.Resource.t) =
  let (module T) = Eio.Resource.get ops Trait in
  T.get_a t
;;

module Int_trait = struct
  type t = int
  type a = string

  let get_a = string_of_int
end

let h : (int, [> `A of int ]) Eio.Resource.handler =
  Eio.Resource.handler [ H (Trait, (module Int_trait)) ]
;;

let%expect_test "crash" =
  let (a : int) = get_a (T (42, h)) in
  print_s [%sexp { is_int = (Obj.is_int (Obj.repr a) : bool) }];
  [%expect {| ((is_int false)) |}]
;;

It looks like a potentially more concerning problem than the previous counter-example. All the more reasons for revising?

lib_eio/time.ml

  type (_, _, _) Resource.pi +=
    | Clock : ('t, (module CLOCK with type t = 't and type time = 'time), [> 'time clock_ty]) Resource.pi

The type time ends up with 2 disjoint instances throughout the project, float and Mtime.t, so it is possible to functorize over it. I experimented with this idea here.

net & process

They are parametrized by [> Generic ] (which sometimes has [ Unix ]), so the parameter of the tag itself uses subtyping.

I don't know yet how to handle that part, and currently left it for future work. I am hoping this will not undermine the whole approach, but I am not confident.

talex5 commented 2 weeks ago

Hmm, interesting.

I'm wondering about another possibility: what if we require users to provide a function (as above, but returning an option), and convert it to an array in Resource.handler by calling it once for each pi value.

I haven't tried it, but I think that would ensure that the values are sufficiently polymorphic.

It would mean that when defining a new API you'd have to list all the pi values, but would have the advantage of actually testing that they're all provided at handler definition time.

mbarbin commented 1 week ago

I'm wondering about another possibility: what if we require users to provide a function (as above, but returning an option), and convert it to an array in Resource.handler by calling it once for each pi value.

You may be ahead of me on this. I'm under the impression that the user provided pattern matching would only get you as far as the Create functor.

I am curious if you envision that this would make the compiler infer the type of the tags during handler creation. It's hard to know without trying, but my impression is that as long as the function to build a handler does not constrain the tags on its own, the API will still be subject to the sort of the counter-example number 2 (constructing a segfault by improperly casting the tags).

Such as in the current api:

val handler : 't binding list -> ('t, _) handler

I wonder if there is a construct in the language that could allow a unification of the tags while the bindings are joined (such as in a list, or pattern matching). Would using a object type instead of polymorphic variant in the tag phantom parameter give more power/flexibility?

The 2 remaining cases are essentially the moral equivalent of a parametrized OCaml class.

If the constrain for the parameter cannot be guaranteed via the tags, maybe a possible direction is to add more parameters to what's currently exposed:

val get : ('t, 'param, 'tags) handler -> ('t, 'param, 'impl, 'tags) pi -> 'impl

By the way, could you point me to an example in Eio for net and process that builds a handler that is not 'Unix. I couldn't find one (is it for compiling to javascript?). For example, lib_eio_posix.Process.Process_impl.tag = [ 'Generic | 'Unix ]. I'd like to better understand how the code makes use of this subtype for this tag parameter, for added context.