Provide let operators in the standard library

lpw25 commented 5 years ago

This PR provides let operators (see #1947) for List, Option, Result and Seq. These operators are placed in a sub-module Syntax in each of these modules, which also includes a definition of return.

This enables code like:

let find_and_sum k1 k2 t =
  Option.Syntax.(
    let* x1 = Hashtbl.find_opt k1 t in
    let* x2 = Hashtbl.find_opt k2 t in
      return (x + y)
  )

I'm open to other suggestions for the name Syntax. I came up with: Computation, Comp, Expression, Expr and Syntax, from which I liked Syntax the best. I would say that the name should not be based around the word "let" because I suspect that in the future we will want to put other operators in there (e.g. match* from #1955).

Some of the required functions for these operators did not previously exist. None of the modules had their monoidal product and List was missing its bind. Of these I've exposed the product for List and Option, and the bind for List. I didn't expose the product for Result or Seq because I didn't feel people were likely to want to use these functions outside of maybe using them as and operators.

The documentation of the newly exposed functions is very minimal so improvements there are welcome.

Decision queued on:

nojb commented 5 years ago

Why not put them directly in the corresponding module, instead of inside a submodule?

As I see it, a submodule is useful when the module in question is typically opened to avoid bringing these operators into the global namespace, but my impression is that one won't typically open these modules.

lpw25 commented 5 years ago

As I see it, a submodule is useful when the module in question is typically opened to avoid bringing these operators into the global namespace, but my impression is that one won't typically open these modules.

Submodules are also useful for the reverse situation -- where the submodule is always opened but you don't want to bring in everything in the parent. That is the case here. I literally never want to open List but I need to use open to use these operators.

nojb commented 5 years ago

Good point, fair enough.

Drup commented 5 years ago

@lpw25 In that case, you could also include the Syntax module in the main one.

Module that contains operators are called Infix in batteries/containers, but it seems we're going to need a new convention...

dbuenzli commented 5 years ago

I'm open to other suggestions for the name Syntax.

I think it would be nice to have a name that can be used in general for bringing domain specific operators like specialized comparisons, arithmetic ops, monadic operators and other infix operators into the toplevel scope. I have the impression that you don't want to distinguish between these usages in practice (it would become a bit too bureaucratic) so having a single good name for all of these would be nice.

I'm not saying Syntax is necessarily a bad name but OTOH no new syntax is being defined. I would lean towards Ops which is short for the M.() notation and may prevent a few opens. Other than that with tongue in cheek I'd propose Open -- the module you are allowed to open.

lpw25 commented 5 years ago

I think it would be nice to have a name that can be used in general for bringing domain specific operators like specialized comparisons, arithmetic ops, monadic operators and other infix operators into the toplevel scope.

I'm not so sure about this. The let operators and any similar operators like match are of a very specific form. By opening one of these modules you are essentially choosing the kind of underlying computation for this piece of code. I think I prefer to have a name that means providing this specific kind of operation rather than a general provision of operators.

To put this another way, if someone wants to use some let operators for something other than monadic/applicative operations -- nothing immediately springs to mind but I'm sure there are plenty of things people might try -- then I would strongly prefer them not to put them in a Syntax module, since in their case let open Foo.Syntax in has a different underlying meaning.

I'm not sure its completely related, but I also like the parallel between this Bar.Syntax ( expr ) form and the computation expression form async { expr } used in F#. It feels like if you just put any old operators in Syntax then you are losing that parallel in some way.

All this is not to say that I'm against having a standardised name for modules containing all the operators associated with a type (e.g. Ops or Infix), I would just like to separately have a module containing only the "syntactic operators".

dbuenzli commented 5 years ago

To put this another way, if someone wants to use some let operators for something other than monadic/applicative operations -- nothing immediately springs to mind but I'm sure there are plenty of things people might try -- then I would strongly prefer them not to put them in a Syntax module, since in their case let open Foo.Syntax in has a different underlying meaning.

I can see your point. However with the current proposal you are attributing a (monadic) meaning to the name Syntax which is a bit odd.

lpw25 commented 5 years ago

However with the current proposal you are attributing a (monadic) meaning to the name Syntax which is a bit odd.

True. Maybe a name based on "computation" is better as that is the most commonly used word to cover both monadic and applicative.

dbuenzli commented 5 years ago

True. Maybe a name based on "computation" is better as that is the most commonly used word to cover both monadic and applicative.

In order to nurture a bit of confusion I propose either Seq or Haskell.

pqwy commented 5 years ago

I like Syntax as a convention. It's the module that rebinds the rebindable syntax, after all.

An alternative is to have sub-modules named after the meaning of the exported syntax, like List.Monad and Cmdliner.Applicative. But this strikes me as mouthful and unnecessary. Plus, we have an entire family of rebindable keywords now, and people will use them in various ways, so we end up with M1.Applicative.(let*) and M2.Monad.(and!).

To me, Syntax says "I'm changing the meaning of your syntax, go read the docs". And I like that.

dbuenzli commented 5 years ago

To me, Syntax says "I'm changing the meaning of your syntax, go read the docs". And I like that.

But then why not put other operators in such a module ? Something @lpw25 finds undesirable (not sure I'm totally convinced by his point though).

alainfrisch commented 5 years ago

Even if this adds some more names for existing functions, I think it would be good to expose return and bind (and perhaps product) in the parent modules. This is to support writing in explicitly monadic style, but without forcing to use the new let-syntax or at least the need to use open (e.g. let (let*) = List.bind in let* x = ... in ... .. List.return y).
Only exploring the design space: assuming we have directly List.bind and List.return, would it make sense to tell users to open Monad(List) now that this is possible? (Monad would just be a functor renaming some identifiers). Performance would not be great without a good inliner, though.
I don't feel fully comfortable with providing features that more or less force people to use open. The next PR that will add any identifier to one of those *.Syntax modules could easily break code because of shadowing. For instance, imagine this PR only proposed the let* form and not let+; people could start using let+ locally for their own purpose, including in contexts that also use the let* from List, and this would break if let+ were added later to List. This would suggest creating smaller sub-modules, which should really not be extended later.

trefis commented 5 years ago

For instance, imagine this PR only proposed the let form and not let+; people could start using let+ locally for their own purpose, including in contexts that also use the let from List, and this would break if let+ were added later to List.

Seems like these people would just be asking for trouble. Also, it would be very easy for them to fix their code in such a case.

hcarty commented 5 years ago

Even if this adds some more names for existing functions, I think it would be good to expose return and bind (and perhaps product) in the parent modules. This is to support writing in explicitly monadic style, but without forcing to use the new let-syntax or at least the need to use open (e.g. let (let) = List.bind in let x = ... in ... .. List.return y).

I strongly agree with this. Being able to quickly and explicitly bring one or two of these let/match/etc operators into scope would be very helpful for small scale readability. I've found that approach somewhat more readable for new and experienced OCamlers users with the existing monadic syntax ppxs.

alainfrisch commented 5 years ago

Seems like these people would just be asking for trouble.

I don't see why. Are you saying that the code will never mix different custom let binders? What about other identifiers? There is already return; could more identifiers be added later? If so, I suppose they will also have rather generic names, likely to clash with user identifiers.

Also, it would be very easy for them to fix their code in such a case.

I find this is a rather weak argument. Things are changing a bit, but we still consider maintaining backward compatibility of the core distribution an important feature, especially when breaking changes cannot be announced in advance with deprecation markers. Even if fixing the code base is easy enough, this requires intervention from the package author and from OPAM package maintainers (patching the code, creating a new release, making sure the previous one is marked as incompatible with the new OCaml, etc). If that libraries happened to be used by other OPAM packages, we can end up with breaking a good part of the eco-system for some time simply because we add more identifiers.

trefis commented 5 years ago

Are you saying that the code will never mix different custom let binders?

No, I'm just saying that I think people should explicitly bind their binders when they do that. But that's like, my opinion.

As for the "weak argument", it doesn't seem like much of an argument indeed. It just reflects the fact that I don't mind breaking code that shouldn't have been written in the first place.

But thinking more about it, I'm not sure that discussed style is as bad as I originally thought, and clearly: I have no idea what style people will adopt. So please, disregard my intervention in this discussion :)

lpw25 commented 5 years ago

Could I suggest not getting too bogged down in discussions on what we would do in the hypothetical situation that we wish to extend the signature of these modules. Yes we should probably be more careful in that situation than we would with other modules, but no its not a disaster if we absolutely had to add more operators since in practice it would probably break approximately no code and would give a nice warning about shadowing for cases where it did. The only cases where I could see us adding more values to these modules is for things like a future (match*) operator, in which case we obviously don't shadow anything in existing code.

Would people prefer it if I moved the return functions out of the Syntax modules? That way it would only be operators inside of them, and we would only be encouraging people to use open where it was strictly necessary.

I think it would be good to expose return and bind (and perhaps product) in the parent modules.

Do you mean e.g. having Seq.bind even though Seq.flat_map exists? I'm happy to do so, but I'd like there to be some consensus (or an executive decision) on how we want these things handled before I make the change.

Only exploring the design space: assuming we have directly List.bind and List.return, would it make sense to tell users to open Monad(List) now that this is possible?

I'm not completely against that approach, but there are some issues:

We would need to check that the resulting operators could still be inlined in non-flambda mode
You cannot do Monad.Syntax(List).( expr ) as that syntax is not currently supported (and may be ambiguous). You would have to use let open Monad.Syntax(List) in ... instead, which is not as nice.

It would also probably require having both of:

Monad.Syntax:
  functor (X : sig type 'a t val bind : ... val product : ... val map : .... end) ->
    sig val ( let* ) : ... val ( and* ) : ... val ( let+ ) : ... val ( and+ ) : ... end
Applicative.Syntax:
  functor (X : sig type 'a t val product : ... val map : .... end) ->
    sig val ( let+ ) : ... val ( and+ ) : ... end

dbuenzli commented 5 years ago

Would people prefer it if I moved the return functions out of the Syntax modules?

I think it wouldn't do any harm (it can be added later if it is perceived as missing). I think I would certainly write Ok v rather than return v if I'd use the Result monad. It also give a better local hint which let* are being used assuming someone opened the syntax at the toplevel.

gasche commented 5 years ago

The submodule discussion is very fun but I think there are other things to discuss as well, among which:

If a module has several distinct let-operator structures of interest¹, what's in the Syntax module? All of them? This has already been asked in https://github.com/ocaml/ocaml/pull/2170#issuecomment-442451338 and https://github.com/ocaml/ocaml/pull/2170#issuecomment-442453798.
Is there an implicit decision that * is for monadic operators, while + (or something else?) should be used for Applicative structure? Or is the the idea that + be used with map operators, so for the Functor structure? Is there consensus for that choice?

(To me having + for map and * for bind makes some sense, as bind is in some sense a "stronger" version of mapping with a value-dependent result structure/shape/effect.)

¹: I don't know offhand of modules with several distinct Monad structures of interest (I'm sure they exist, and probably examples can be built by lifting the case of numbers with several distinct Monoid structure), but all sequential collections (List, Seq, etc.) have a zipWith-style applicative structure that is distinct from their cartesian-product-esque monad.

My personal feeling is that we are moving a little too fast in solidifying conventions for rebindable-syntax lets. For extensions and attributes, @alainfrisch did a lot of work on case studies, presenting and discussing several use-cases for the feature, and evaluating each of the proposed syntaxes/designs on them. Where are those examples with the proposed operators here? In the #1947 discussion there were some experience reports from ppx_let, but the syntax used there is not exactly the same, and some of the questions need to be handled differently.

This is not directly related to #2170 itself, but personally I would like to see idiomatic real-world code that uses those new binding operators (we should also agree on a name for the feature, I would suggest "binding operator" rather than "let operator" given that and is also involved) with, let's say, list and option as a monad, and zipList and cmdLiner (or a parser, etc.) as applicatives.

In particular, I have the vague intuition that something is wrong with having (and* ) and (and+) defined to be the same thing in all cases. Shouldn't (and+) be more like List.combine than a cartesian product? I can't tell if this is a real issue or just sleep-deprivation speaking; if we had robust examples exercising these features, and not just abstract code with rewriting, we could easily tell for sure.

lpw25 commented 5 years ago

If a module has several distinct let-operator structures of interest¹, what's in the Syntax module? All of them?

I would expect individual Syntax modules for individual notions of computation. So for example, I would expect Seq.Syntax to be the normal monadic operations for Seq.t, with a separate Seq.Zip.Syntax for the zipping applicative form if people wanted to expose that.

Is there an implicit decision that * is for monadic operators, while + (or something else?) should be used for Applicative structure? Or is the the idea that + be used with map operators, so for the Functor structure? Is there consensus for that choice?

I've definitely been playing a bit fast and loose with the subtle difference between these two approaches -- refering to let* as monadic and let+ as applicative -- but really what I've implemented here is distinguishing between * for binding and + for mapping. So it would indeed be reasonable for a functor that was not an applicative to implement let+ but not and+.

This is also why I've put in both and+ and and* one is for mapping over a product and the other is for binding to a product. Of course, how you make a product doesn't depend on whether you are going to map over it or bind to it, so they are necessarily the same operation. I also think it looks slightly nicer to always use the same operator symbol for the let and the and.

If we instead said that * was for monadic and + was for applicative then I would not expect to have an and* operator. I would also not expect to see a functor implement let+ but not and+.

we should also agree on a name for the feature, I would suggest "binding operator" rather than "let operator" given that and is also involved

Sounds good.

In particular, I have the vague intuition that something is wrong with having (and* ) and (and+) defined to be the same thing in all cases. Shouldn't (and+) be more like List.combine than a cartesian product?

No. List.combine is the product for a ZipList style applicative. That applicative has no associated monad and also requires having a return that creates an infinite list instead of a singleton. We could expose it as List.Zip.Syntax, but it is definitely a different thing. (I also personally wouldn't expose it because I think the use of infinite lists in OCaml should not be encouraged -- Seq.t is a much more natural type to use for this kind of applicative).

My personal feeling is that we are moving a little too fast in solidifying conventions for rebindable-syntax lets. For extensions and attributes, @alainfrisch did a lot of work on case studies, presenting and discussing several use-cases for the feature, and evaluating each of the proposed syntaxes/designs on them. Where are those examples with the proposed operators here? In the #1947 discussion there were some experience reports from ppx_let, but the syntax used there is not exactly the same, and some of the questions need to be handled differently.

What exactly are you hoping to learn from these examples that you wouldn't learn by looking at existing code using ppx_let? Doing ad-hoc case studies isn't very helpful if you don't have some particular aim in mind or some hypothesis to test. I'm reluctant to spend my spare time manually converting other people's code to use this syntax without some clear objective in mind.

I'm unsure about moving more slowly on conventions. On the one hand, if we wait then perhaps some useful conventions will arise organically from the community and we can adopt them in the stdlib. On the other hand, we might wait and then have every different library using different conventions and find it is too late to try and get some uniformity via the stdlib.

Note that I already put some of the expected conventions into the manual, so if we do intend to wait and see then we might need to make some changes there as well.

gasche commented 5 years ago

I've definitely been playing a bit fast and loose with the subtle difference between these two approaches -- refering to let as monadic and let+ as applicative -- but really what I've implemented here is distinguishing between for binding and + for mapping. So it would indeed be reasonable for a functor that was not an applicative to implement let+ but not and+.

You are currently in the process of shaping the look and feel of monadic code (become more and more common) in OCaml for the years to come. I would very much like this look and feel to be explained precisely with appropriate examples and design guidelines. People new to monads should have some chance of following but, maybe even more importantly, people used to monadic or applicative code in their own extensions, or Haskell, or Coq/Agda/Idris, should be able to read whatever documentation and conventions we have and infer how to format their own knowledge/practices within those conventions.

In that respect, I think that being unclear about Functor vs. Applicative is not good, and that the various design choices you are pushing into the standard library should be clearly presented, discussed, and documented.

This is also why I've put in both and+ and and* one is for mapping over a product and the other is for binding to a product.

This discussion style is too abstract. Could you provide natural examples that look right with the current definition of and+? Show them with and* instead? Are there examples that would look nicer if and+ was the zipping one, instead of an alias for and*? (How do I express the zipping product if and+ is cartesian?)

(I also personally wouldn't expose it because I think the use of infinite lists in OCaml should not be encouraged -- Seq.t is a much more natural type to use for this kind of applicative).

Zipping is useful as a modular way to implement mapN operators, and as such it seems fairly common in lists. I've been working recently in parts of the codebase where List.map2 foo (List.combine bar baz) is a common sight. (We can build a 1-cyclic list, but I think we could also consider defining zipping so that zipping a one-element list with another length results in implicit extension rather than a failure.)

What exactly are you hoping to learn from these examples that you wouldn't learn by looking at existing code using ppx_let? Doing ad-hoc case studies isn't very helpful if you don't have some particular aim in mind or some hypothesis to test. I'm reluctant to spend my spare time manually converting other people's code to use this syntax without some clear objective in mind.

For one, I haven't seen any code using ppx_let. The only natural example I remember seeing is one of cmdliner usage with a custom let%map extension, provided by @diml. Providing URLs to idiomatic ppx_let would be a useful first step.

As a user reading documentation on this feature, here are some questions that I would like to ask:

What does this syntactic style look like for monads or idioms that I am already familiar with? The exception monad, the state monad, parser combinators, zipping as an applicative, a web-forms or command-line-parameters applicative library, a concurrency monad. Abstract examples against the Monad or Applicative or Functor interface would be nice, but concrete examples are also necessary for understanding.
When let* and let+ are both available, why should I use one instead of the other? Can I see a let*-using example that can be profitably rewritten with let+ or vice-versa?
When should I use and* (and/or and+) rather than just a sequence of lets? Is it always useful, or only for some monads? Are the two always equivalent?
What do the monad/applicative laws look like with binding operators?
As a library author, which of those binding operators should I expose? What is the design rationale behind this choice?

lpw25 commented 5 years ago

Are there examples that would look nicer if and+ was the zipping one, instead of an alias for and*? (How do I express the zipping product if and+ is cartesian?)

The zipping operations are not part of the usual list monad. The relationship between a monad and its applicative is often given as (<*>) = ap in Haskell land. This is equivalent to:

let+ f = f
and+ x = x in
  f x

being equal to:

let* f = f in
let* x = x in
  return (f x)

and clearly does not hold if you make and+ be the zipping operations.

Zipping is useful as a modular way to implement mapN operators, and as such it seems fairly common in lists.

Sure, but that doesn't make zipping a structural part of the list monad. Most useful operations on lists are not the core monadic operations.

(We can build a 1-cyclic list, but I think we could also consider defining zipping so that zipping a one-element list with another length results in implicit extension rather than a failure.)

(That seems pretty dubious to me, why would a list of length 1 be considered infinite but a list of length 2 be considered finite. But either way we're getting off topic here.)

Providing URLs to idiomatic ppx_let would be a useful first step.

a directory of examples using Incremental
an example program using Async
More generally the results of these two searches are useful: let%bind and let%map
An example of using ppx_let with Cmdliner from dune

As a user reading documentation on this feature, here are some questions that I would like to ask: [...] I'll come back to these when I have more time

aantron commented 5 years ago

For the submodule name, I suggest Option.Keywords. It's the most immediately self-explanatory and precise name I've come up with so far.

aantron commented 5 years ago

My preference is to undo the let* and let+ syntax, and to prefer specific decorators on each let or other keyword instead, like proposed in https://github.com/facebook/reason/pull/2140. This would do away with the issue of opening modules to get operators in scope. So, some variation of:

let.Option a = b in
...

etc. We can probably make this prettier somehow.

We could have Option select implementations from module Option, or implicitly from a submodule Option.Keywords, where the .Keywords path component doesn't have to be written by the user.

IMO this is far more "clear" than let*, especially if the implementation of let in Option.Keywords has to be called Option.Keywords.let_. By contrast, the connection between let* and opens requires non-trivial knowledge, as well as more analysis of the code being written.

aantron commented 5 years ago

@keleshev made a similar suggestion in https://github.com/ocaml/ocaml/pull/1947#issuecomment-409332752, from which we came up with decorating by a module (whether a good idea or not).

There are several additional reasons not to rely on opens, most of which are out of my cache. One is that code with open is more fragile to refactoring and copying/pasting.

gasche commented 5 years ago

Hi @aantron,

I hear you, but I think that everyone is wary of rolling back on the sort of consensus on the binding operators that we have managed to gather. Having some sort of first-class support for monadic notations has been discussed for at least 8 years now, and there was never enough support for one of these proposals to make it. This one passed barely, and as you can see in this discussion @lpw25 is still having a rough time. You should realize that rolling back on this one now will not make let.M available at the same time (it doesn't have enough support for this), and is most likely to result in not having any syntactic support in the next release.
Personally I don't strongly believe that let.M is an improvement over let-op, so I don't feel that it is worth the risk; I would prefer to move forward with what we have than have nothing at all.

Why doesn't let.M have more support? A few ideas:

As far as I am aware, the Reason folks never tried to communicate with us (the OCaml upstream) about their language design choices. If they want to coordinate language evolution (or compiler hacking, for that matter), they have to try a little harder than having someone randomly linking a PR once in a while. (I've said exactly that at the last ML workshop and maybe some communication will happen.) In the current state of affairs "Reason is considering doing X" doesn't give X much additional leverage.
let.M has no clear connection between the syntax and the operator that is used under the hood. (In contrast, @keleshev's let.foo proposal does have such a connection; they are quite distinct proposals and should be discussed separately.) This connection explains why some people have a marked preference for let-op and and-op, which is an extension of, say, .%() and .%()<- as user-defined operators.

One nice future-looking aspect of identifier-centric syntaxes (let* and let.foo) is that we can see how they would remain relevant in a world where OCaml has type-classes or a similar type-directed overloading mechanism for identifiers.

yallop commented 5 years ago

I hear you, but I think that everyone is wary of rolling back on the sort of consensus on the binding operators that we have managed to gather. Having some sort of first-class support for monadic notations has been discussed for at least 8 years now, and there was never enough support for one of these proposals to make it.

I would prefer to move forward with what we have than have nothing at all.

I find this line of argument unconvincing. Just because something's been wanted for a while doesn't mean that technical concerns should be pushed aside.

We've seen this situation before, but the case that really sticks in my mind is the proposal for non-recursive type declarations of the form:

type nonrec t = t

As soon as the nonrec proposal was made, @alainfrisch pointed out that it didn't solve the general case:

it would also be useful to refer to a type defined above (especially in an outer module) in the module even if is has been shadowed [...] I guess it's worth thinking about a more general solution.

A little later @garrigue pointed out that the underlying issue was really about local names, not directly about recursion:

This said, I still have a strange feeling about this "nonrec" discussion, because this all boils down to questions of internal vs. external names, and they would be the only way to give a real solution to this problem.

I proposed an alternative syntax:

How about introducing a different binding symbol for simple non-recursive aliases?
 type t
 module M = struct
   type t := t

However, even though nonrec was evidently not the "right thing", it was pushed in anyway. Now, as of last month, we have a much better solution that is harmonious with the rest of the language, solves the general case @alainfrisch described, uses my proposed syntax, and is based on @garrigue's distinction between internal and external names. Unfortunately, we're still stuck with nonrec as well, perhaps for ever.

So if @lpw25's proposal is the right design then it should be defended on technical grounds, not because the alternative is having to wait a little longer.

[Edit: this originally said that nonrec was "rushed-in". I meant that it would have been better, in my view, to wait for a more complete solution to the problem it solved, but "rushed" is a bit misleading, since there was almost two years between the time nonrec was proposed and the time it was merged.]

gasche commented 5 years ago

@yallop: fair points, but I would make a distinction between technical design discussions, such as the ones you are reporting in your post, and some syntax choices that are much more subjective/unprincipled (it's a beautiful and rare thing when concrete syntax has good justifications, my favorite example being match .. with exception.) I'm questioning the design of and* and and+ in this thread, but I haven't seen much objective discussion of various concrete syntax proposals, it's more of a matter of taste -- also important!

aantron commented 5 years ago

@gasche,

I'm not making any of these arguments:

That the syntax proposed by Reason is the right one.
That we should give any weight to the Reason choice. I linked the Reason discussion for the content of the discussion, not for any of its inherent weight merely as a choice suggested in another syntax.

I just think the let* syntax is not a good choice, due to reasons including:

It requires non-local, perhaps non-trivial, analysis of the code by the reader to know which let* is resolved to which open.
It is fragile to refactoring, due to this non-local entanglement between keywords and opens, written elsewhere.
This entanglement is practically necessary for using let*.
It is cryptic and not self-explanatory, compared to some hypothetical better solution.

I don't think these points are subjective, as they are directly related to the (predictable) structure of the thinking processes I (and others) have to undergo as readers and editors of code, also as learners of the language.

I know the module solution has its own drawbacks. However, I would strongly prefer not to have the let* solution, most importantly due to those of its drawbacks that I listed above. So, I agree with @yallop that this (ideally) requires some more design work and insight before it can be considered ready.

Somewhat less "meta," the module annotation is indeed less clear than @keleshev's proposal. However, it is arguably more clear than let*:

It immediately suggests which module the reader should look into. This can become very easy to reason about, if that module contains values like let_ and and_.
It doesn't involve any non-local investigation of the code to find the module.

I see the concern of how this would benefit from modular implicits or other overloading schemes. I agree that the module solution is weak here. However, this is by design. I pretty strongly believe that we should allow explicit syntax even in an OCaml that has overload resolution. The let* solution forces all monadic code using syntactic sugar to rely on opens or future overload resolution.

Perhaps this means that we need to prepare both an explicit and an implicit syntax. In current OCaml, however, it is (IMO) an explicit syntax which is more valuable. An implicit syntax only becomes highly valuable with overload resolution, and right now we are using open effectively as a crutch in the absence of that.

The module proposal (and @keleshev's proposal) are in the explicit direction. I am not claiming that either of these is the right point in the design space.

lpw25 commented 5 years ago

The difficulty around and can be explained as follows. For any binding construct there are essentially two forms: a bind form and a map form. The first allows the body of the construct to contain effects that depend on the bound value, whist the second does not. This extends beyond let to any other binding constructs that we might add operators for in the future (match, if, for, etc.). Each of these has two useful forms, and the syntax needs to allow you to choose the one you want. In the case of functors/applicatives only the map form is available, but even for monads the map form can have much better performance than using the bind form with return.

So in this proposal I'm using * to indicate the bind form and + to indicate the map form. However, for and there is no such distinction. It is not actually a binding construct: it is a combinator that is applied before the value is bound. This means the same combinator is used with both map and bind, so we do not need to mark and with + or * to distinguish the two cases. However, we do need to name the and operator something. Which basically leaves us with three choices:

Pick one of + and * to use with and
Use some other third character for and
Provide both + and * names for and so that users can just use the and that matches their binding construct without thinking about it.

In this PR I've implemented the third option.

Note that this issue is not unique to and. If we add more of these operators for other constructs in the language we are likely to need some other combinators as well. For example, I would expect to need an exception combinator if we wish to allow using exception patterns with the match* operators proposed in #1955.

As to suggestions of other syntax. Most of the options are inferior to the current implementation, as discussed in the issue that implemented it. Fundamentally, we need to write letfoo for some foo and then relate that to some function bar that implements the operation. Some desirable properties are:

We need to be able to indicate via foo whether we want the binding or mapping function
The relationship between foo and bar should be derivable from the syntax
We should allow for the future possibility that different constructs e.g. letfoo and matchfoo might require different bars.
The syntax should naturally extend to the case where we have modular implicits and want to use a single letfoo for all different computation types.

These properties heavily favour having bar being equal to letfoo and using the ordinary scoping rules for obtaining the bar when given letfoo. At that point we are simply left with the choice of what to allow for foo. This is essentially a matter of taster. The previous PR went with operator symbols. Another reasonable option would be allowing /ident.

I prefer the operators because the point of monadic syntax is allow users to treat a particular monad as the ambient monad of their computation. The distinction between binding, mapping and non-monadic lets should be clear but it shouldn't be over-emphasized. I suspect that much of some people's preference for using longer identifiers is an instance of Stroustrup's rule. I also use refer to the experience of computation expressions in F#, where they use operator symbols and where people seem pretty happy with the readability of resulting code.

lpw25 commented 5 years ago

@aantron Why do you think that it harder given:

Foo.Syntax.(
  let* x = a in
  b
)

to relate the let* to Foo.Syntax.( let* ) than given

let.Foo x = a in
b

to relate the let.Foo to Foo.bind (or is it Foo.map)?

In the first we rely only on the standard lexical rules of the language, and the convention that we keep syntactic operators in modules called Syntax. Whereas in the second you rely on some ad-hoc rules about which operator maps to which function.

aantron commented 5 years ago

Mainly because the Foo.Syntax can be quite far-removed from the let* that it affects, in particular when in some "realistic" code a is actually a complex expression that spans many lines. Arguably, it might be better to avoid such a replacement of a, but I would prefer not to weakly impose that on users due to what could be called an insufficiency of the surrounding syntax in the opinion of some :)

So it's not a matter of just the convention of which keyword maps to which function. This comparison between the approaches is not on one axis:

(axis 1) the module is less clear because it maps keyword let to function let_, maybe .Syntax.let_.
(axis 2) the open is less clear because it is far away from the let*.

To maintain clarity about my position, I don't believe the module is the ultimate solution here. I do believe it's important to address (2) in some way if reasonably possible.

Drup commented 5 years ago

The verbosity constraints for that feature is pretty tight. If the let operator is too verbose (and the module qualification is very verbose), it immediately becomes less interesting than just using the bind operator manually. When we migrated the lwt syntax extension from camlp4 (lwt x = ..) to ppx (let%lwt x = ...), several "advanced" users complained about the increase in verbosity and switched back to using >>=.

While I also dislike the slightly obtuse aspect of using operators, let operators from different monads are very rarely mixed in the same context, and a given let operator is used repeatedly in the same scope. That's why most do-like notations make the user pick a monad for a whole scope, not for an individual lets. In that context, having to specify the complete module name for each let binding in a scope is fairly punishing.

dbuenzli commented 5 years ago

Mainly because the Foo.Syntax can be quite far-removed from the let* that it affects, in particular when in some "realistic" code a is actually a complex expression that spans many lines.

That is certainly true note however that the actual expressions bound by the keywords and the expression body will certainly provide strong hints as to which syntax is being used (which is also the reason why I would prefer if return was removed from the proposal).

gasche commented 5 years ago

I would also add that as code-comprehension tools become more common, our editors are now able to give us either the type or the definition of a binding operator, which resolves any potential ambiguity.

Is the M.let* x = foo in bar syntax available? (For indexing operators we have foo.M.%(bar).). If it were, we could easily propose an ocamlformat mode that adds explicit quantification to each binding or indexing operator for code-reading purposes.

alainfrisch commented 5 years ago

to relate the let.Foo to Foo.bind (or is it Foo.map)?

Sorry, I did not follow all discussions around this proposal, but has it been discussed to relate let.Foo to "identifier" (let) in module Foo (assuming the parser interprets (let) as a valid lident, this would be written Foo.(let)).

gasche commented 5 years ago

@alainfrisch how do we apply (let) then, let. x = t in u?

I've been thinking more about and+ vs. and*. Remark that the following code:

let+ x = a in
let+ y = b in
t

has type 'a t t:

# let _let_map li f = List.map f li;;
val _let_map : 'a list -> ('a -> 'b) -> 'b list = <fun>
# _let_map [1; 2] @@ fun x ->
  _let_map [3; 4] @@ fun y ->
  (x, y);;
- : (int * int) list list = [[(1, 3); (1, 4)]; [(2, 3); (2, 4)]]

If we view the use of an and-form instead of the second let+ as collapsing this to an 'a t, then this is exactly a monadic (not functorial or applicative) operation. In particular, I don't expect an equivalence between nested-let+ and and+ to hold in general: for some applicative functors that can define let+ and and+ but are not monads, it will not hold.

Personally I think that and* is a more natural name for such a layer-collapsing cartesian-product operation (it is a "product"). This would free the usage of and+ for a parallel-iteration operation, in the spirit of zipping ("+" for parallelism is also a natural symbolic choice).

I think that the two operators (cartesian vs. parallel product) only differ for "multi-shot effects" like List, not for linear or affine effects like Option or State or Lwt.

hcarty commented 5 years ago

how do we apply (let) then, let. x = t in u?

@gasche Not sure if this was a rhetorical question, but let._ x = t in u could work in that case, picking the in-scope (let).

gasche commented 5 years ago

Still thinking about and* and and+, I think there are three design choices.

Not have a user-definable syntax for and-operators to remove that difference. As Leo points out, this is syntactically awkward (it's doable if let* is a record, but this proposal was rejected as not-the-standard-interface.)
Keep them both the same thing when let+ and let* refer to different operators (map and bind) over the same monad.
Use and* for a cartesian product, and and+ for a parallel product.

For single-shot or at-most-one-shot effects (Option, Lwt, CmdLiner...), I think that there is no difference between (2) and (3). The main difference comes from uses of let+ ... and+ to describe comprehension on sequences -- List, Seq, Array.

Personally I would find (3) more natural, and it is also more expressive. It has the following drawbacks:

The user has to think about which product they want, when dealing with a sequence / multishot effect.
Code could in theory mix the two product operators, and this can get confusing quickly. In particular, it breaks the possibility to reorder bindings in a let .. and block.
I'm not sure how to give a general definition to the notion of "parallel product", and how often it makes sense. (Does it make sense for quickcheck / afl-fuzz, for example?)

To illustrate the second problem, consider

let+ x = inputs
and+ y = outputs
and* c = checks
in c x y

which maps over inputs and outputs in parallel, performing each check for each parallel (i, o) pair, and

let+ x = inputs
and* f = transforms
and+ y = outputs
in (f x = y)

Remark: @lpw25, if I understand correctly the let+ .. and+ syntax (even with parallel-and instead of cartesian-and) doesn't require defining a pure operation: what is typically written pure f <*> a <*> b becomes let+ x = a and y = b in f x y, so the pure is gone, and most use-cases of zipping never actually require to worry about infinite lists.

gasche commented 5 years ago

(cc @yallop and @stedolan)

gasche commented 5 years ago

Note: I tried to understand and+ better from the code examples given by @lpw25, but I think they are not enough for this. let%map .. and .. is rarely used, and I have only found uses for single-shot effects like CmdLiner or Incremental.

Note: the reason why and+ raises questions and and* does not is that product is the central operation for applicatives, not for monads where it is derived in "the canonical way that corresponds to nested let*".

alainfrisch commented 5 years ago

how do we apply (let) then, let. x = t in u?

One could write of course (let) t (fun x -> u) (and then discuss about providing a shorter syntax for simple function literals..). I'm not sure it's worth providing a way to use the "binding syntax" for the current scope. The way to define a custom binder locally, if really needed, would be through modules:

let module M = struct let (let) e f = ... end in
let.M x = ... in ...

or if one wants to use locally in the current module and also export:

module M = struct let (let) ... end
include M

(I suspect that modular implicits would require a similar "approach".)

lpw25 commented 5 years ago

let.M does maintain a one-to-one relationship between the "letfoo" and the "bar" from my discussion above. However, it will require you to write let.Option.Bind and let.Option.Map -- which is extremely verbose. If instead you use let.Bind and let.Map then you are just back to using open and you'd be better off with operators or let/bind-style identifiers.

bluddy commented 5 years ago

Sorry for opening a new can of worms here, but I want to raise this question: does it make sense to make OCaml extra hospitable for monads, when we're looking forward to typed effects, which are mostly isomorphic to monads but cleaner?

gasche commented 5 years ago

Here is a property (maybe even a specification) of the parallel product, in term of commutation with the builtin product of the language:

parallel-prod
  (map m (fun x -> t1))
  (map m (fun x -> t2))
=
map m (fun x -> (t1, t2))

If (and+) was the parallel product, this property would, in particular, guarantee the following equality between bindings

let+ x = m
and+ x = m
in t

=

let+ x = m
in t

Edit: the proposed property can be expressed as: (and+) m m = (let+ x = m in (x, x)). In contrast, we have (and*) m m = (let* x = m in (let* y = m in (x, y)))

rizo commented 5 years ago

One option that I think wasn't mentioned that avoids the need to define both (and*) and (and+) is to define only (and), which would be used by both (let*) and (let+). Unlike (or), (and) is currently not a valid operator, so I think that would work. To call the operator directly the (and) x y syntax could be used. The potentially confusing part is that x and y wouldn't be valid.

lpw25 commented 5 years ago

One option that I think wasn't mentioned that avoids the need to define both (and*) and (and+) is to define only (and)

I think this is probably my preferred option. It's slightly less obvious in:

let* x =
 ...
and y =
  ...
in
...

that the binding of y is special, but it is a better reflection of how and relates to map and bind. It is also how ppx_let looks and it seems to work pretty well there. I'd still like to leave in support for and operators other than and since I have a few weird use cases that require multiple distinct and operators, but adding support for (and) as well is easy to do.

gasche commented 5 years ago

Let me play the adversary and criticize the (and) idea (which does have merits).

It is inconsistent with other user-defined operators, which are all distinct from builtin constructs and keywords: we did not allow rebinding (.()) or (.()<-) or (let) -- rebinding (let) would be a bad idea as it would make it too hard for readers to know that something special is going on. (and) is doable because it must be marked by a non-standard let-class operator, but inconsistent with this global trend.

In a world where different and<op> operators could be used in the same binding, we would have a use for vanilla and, which is to signify a simple binding. Haskell's do-notation has monadic bindings of the form x <- e and standard bindings of the form let x = e; in set comprehensions, it is not uncommon to wish to alternate between x in foo forms and x = bar forms.
That said, I don't see how to extend the existing desugaring for let/and to accomodate vanilla and x = foo bindings that should result in a simple let. (If we know of a return operator we can desugar into and* x = return foo, but our syntax precisely does not assume a return exists.)

lpw25 commented 5 years ago

@gasche Perhaps it would help if I wrote out the various laws that the operators should obey. Apologies for the rubbish markdown formatting (edit from trefis: reformated).

`let+`

		is equal to
Identity	let+ x = v in x	v
Composition	let+ y = let+ x = v in g x in f y	let+ x = v in f (g x)

`and`

		is equal to
Naturality	let+ x = u and y = v in f x, g y	let+ a = let+ x = u in f x and b = let+ y = v in g x in a, b
Left identity	let+ x = return a and y = v in y	v
Right identity	let+ x = u and y = return b in x	u
Associativity	let+ x = u and (y, z) = let+ y = v and z = w in y, z in f x y z	let+ x = u and y = v and z = w in f x y z

`let*`

		is equal to
Left identity	let* x = return a in f x	f a
Right identity	let* x = m in return x	m
Associativity	let* y = let* x = u in f x in g y	let* x = u in let* y = f x in g y
Compatibility with `and`	let+ x = u and y = v in f x y	let* x = u in let* y = v in return (f x y)

lpw25 commented 5 years ago

(and) is doable because it must be marked by a non-standard let-class operator, but inconsistent with this global trend.

True but I think this is basically fine.

Haskell's do-notation has monadic bindings of the form x <- e and standard bindings of the form let x = e; in set comprehensions, it is not uncommon to wish to alternate between x in foo forms and x = bar forms

They do, but they don't allow you to use the different forms in a single binding -- partly because they don't have support for binding multiple things at once. Whilst it is often useful to alternative between let* and let, I don't think it is ever particularly useful to have:

let* x = monad
and y = not_monad in
f x y

because you could always use:

let y = not_monad in
let* x = monad in
f x y

or

let* x = monad
and* y = return not_monad in
f x y

instead. I also think it is less confusing to have all the defining expressions for let* or let+ expression be monadic.

ocaml / ocaml