ocaml-multicore / eio

Effects-based direct-style IO for multicore OCaml
Other
540 stars 66 forks source link

api questions and some migration feedback #642

Closed mbarbin closed 9 months ago

mbarbin commented 9 months ago

Hi!

I've recently gone through the exercise of migrating some code that used Printf & Stdio.{In,Out}_channel & Fpath to Eio, and I had a mix of questions, feedback, and data point of things that gave me pause. This is all very minor, things worked great, thank you for Eio!

Reading with Buf_read

This is just pointing out what I found confusing, which I believe is just some minor naming inconsistency: sometimes the reader checks that the next thing is as given, and consumes it (such as Buf_read.string), sometimes it is actually only reading but not checking Buf_read.line. And sometime a take_ prefix is added Buf_read.take_all but sometimes not Buf_read.line.

I wish there was a common prefix for things that check things (e.g. check ?) and some common naming for things that take things (perhaps take , or no prefix if the naming of things that check do get their own prefix (e.g. Buf_read.string would be free if the current one becomes Buf_read.check_string).

Writing to a sink

I felt like missing a function to refactor code that would make use of format string, such as Printf.printf or Format.printf equivalent. So far this looks like Eio.Flow.copy_string (Printf.sprintf "..." ...) sink. It would be great to have a more direct alternate, that probably would take the sink as first argument and format second.

Filename part of Eio.Path

I've transitioned some code that made use of Fpath, and in particular sections of code that would need to do some matching on the complete filename, using Fpath.segments. In the new Eio code, I've been using (path |> snd |> String.split ~on:'/'), which mostly do what I need. But I felt it's brittle, since it is relying on information I have looked in the implementation (that the second value of the tuple is manipulated with the Filename module). I don't mean to use native paths here, because I would like to be able to match on the path with an interface equivalent to Fpath that would be common regardless of the os in which this will be executed. I don't have a concrete proposal here. If you are interested perhaps I could give this more thoughts. What about for example:

val Eio.Path.filename : _ Eio.Path.t -> string
val Eio.Path.segments : _ Eio.Path.t -> string list

Type that encapsulate some capabilities during create

I've a type that offers functionality that interacts with the path and may spawn processes from time to time. I've taken so far the approach to require in create the part of the env that will be used by the other function of the interface, rather than adding each capability at the granularity of each function that operates on t that might need it.

- type t
+ type 'a t
val create
  : ...
- -> root_path:Fpath.t
- -> t
+  -> process_mgr:_ Eio.Process.mgr
+  -> output:_ Eio.Flow.sink
+  -> root_path:'a Eio.Path.t
+  -> 'a t

In the implementation I so far used GADTs to store the sink and progess_mgr to deal with their existential type parameter.

type process_mgr = Process_mgr : 'a Eio.Process.mgr -> process_mgr
type flow_sink = Flow_sink : 'a Eio.Flow.sink -> flow_sink

type 'a t =
  { root_path : 'a Eio.Path.t
  ; process_mgr : process_mgr
  ; output : flow_sink
  }

let create ~process_mgr ~output ~root_path =
  { root_path
  ; process_mgr = Process_mgr process_mgr
  ; output = Flow_sink output
  }
;;

I kept the path parameter since it worked well at call site for example:

val touch : 'a t -> path: 'a Eio.Path.t -> unit

But I didn't find the need to expose all of the other capabilities parameters in the api. And this seemed to be a handful to expose them all as type parameters (also, this would probably be harder to maintain in case some new ones are added later).

I wonder what you'd think about it. Is there some alternate designs that would work? Is there some super type for these capabilities that would have no parameter that I could use instead? Thanks!

talex5 commented 9 months ago

For Buf_read, the names were chosen to be compatible with Angstrom's names; e.g. https://github.com/inhabitedtype/angstrom/blob/5536d1da71469b49f37076beb5f75d34f448da5e/lib/angstrom.mli#L89.

I think Buf_write.printf would be a great addition (PRs welcome).

The use of Filename.concat in path.ml is a bug - it will use the wrong separator on Windows. There is a Path.split that might be useful for you. Calling it in a loop will get all the components. That might be a useful convenience function to add indeed (along with basename and dirname).

As an alternative to using a GADT, you can also cast to a concrete type: see https://github.com/ocaml-multicore/eio#casting

mbarbin commented 9 months ago

Hi @talex5.

I did some initial exploration based on your answer here. If you think some bits are going the right direction (printf, components, basename and dirname) I can try and create some PRs out of it.

Re: casting.

As an alternative to using a GADT, you can also cast to a concrete type: see https://github.com/ocaml-multicore/eio#casting

Sorry I missed that section of the doc. This is spot on and I ended up preferring this style. Thanks! While writing the cast I wondered whether it would make sense to offer some canonical type alias in eio to simplify usage. I explored it a bit here, please let me know your thoughts.

Re: api compatibility. Thanks for pointing out the Angstrom constraint. It occurred to me that with what's already exposed by Buf_read and Buf_write, many things can be done in the user land depending on people's need, so I'm happy to do some exploration as third party libraries for now (example here). Thank you!

talex5 commented 9 months ago

Using primes for the generic types looks interesting. I didn't add them originally as there are so many types already.

I think basename etc should just be e.g. snd (split t). Why is extra logic needed?

mbarbin commented 9 months ago

Re: basename

I expect the domain for basename and dirname to be any valid Eio.Path.t', thus my inclination was for their types to be:

  val basename : _ Eio.Path.t -> string
  val dirname : _ Eio.Path.t -> string

If you limit their implementation to the domain where split is defined, you'd end up with a string option in the returned type (thus the extra logic in the None case).

For the behavior, I based it on what I witness was the behavior of unix utils basename and dirname when typed in a terminal. For example:

$ dirname ''
.

You can have a look at the test cases I looked at here

I plan on creating separate PRs for the different things we talked about here, that way we'll be able to further discuss options there. Thank you!

mbarbin commented 9 months ago

Re: basename in PR #648

mbarbin commented 9 months ago

Re: type-aliases: I couldn't think of a technic that offers the same potential benefits without adding complexity and more types. I'll continue experimenting with it.

I believe my feedback is entirely covered at this point (thank you @SGrondin for #652 !), I'm happy to close this issue.

Thank you.

talex5 commented 7 months ago

(Buf_write.printf was added in #655)