RFC: Record syntax - Githubissues

ziman commented 4 years ago

Given that the hard work has (mostly) been done by @Russoul in #607, maybe we could bikeshed a bit about some record syntax.

Here's my proposal:

Add .{ as a new lexeme, which would parse in the same place in the grammar as .(, and would make the proposed record syntax bind tighter than function application.

Record construction

Person.{name = "Joe", age = 42} stands for record Person {name = "Joe", age = 42}.
(implied constructor) .{name = "Joe", age = 42} stands for record _ {name = "Joe", age = 42}.
(field puns) Person.{name, age} stands for Person.{name=name, age=age}.

Record patterns

getAge Person.{name=name, age=age} = age
can be shortened as getAge .{name, age} = age using the implied constructor + field puns
perhaps we could also leave out some fields? getAge .{age} = age would stand for getAge .{name=_, age} = age

Remarks:

Since dots bind tighter than application, we naturally do not have to parenthesise the record pattern.
Record patterns may become non-trivial if we need to fill in implicits. I hope it should be doable but it might turn out a bit tricky.
Given that we have dotted field access (x.field), record patterns may actually not be used frequently.

Record field access

Stays the same.

print (person.age + length person.name)

Unresolved ideas

Record updates

It would be nice to have something like the following:

person.{age = 43} stands for record {age = 43} person
person.{age $= (+1)} stands for record {age $= (+1)} person
pinocchio.{head.nose.size $= (+42)} stands for record {head.nose.size $= (+42)} pinocchio

However, the Person.{name = value} constructor/pattern syntax will clash with this, unless we require the former to start with exactly one capitalised identifier. This may not be a good idea. But perhaps it would be consistent with other language features?

If we can't resolve the clash, I would leave the record update sugar for later, since the constructor/pattern syntax would benefit from this style much more than the update syntax would, I think. Besides, point-free record updates do not have a clear syntax yet, either.

Point-free record updates

Currently, we use record {field = value} for this. I don't have a better suggestion.

Syntax sugar

{|name = "Joe", age = 42|} could stand for .{name = "Joe", age = 42} in record expressions and record patterns.
_.{...} could be an alternative to .{...} if the latter turns out to clash with forced patterns

ziman commented 4 years ago

I suppose this would make the distinction between implicit and explicit record fields irrelevant, too?

Russoul commented 4 years ago

I like it, that _.{...} though... A bit heavy for me. Introducing sugar just for this is also not ideal. Can we just prefix the braces with some other symbol ? Like .{} or ${} for this pattern ?

But this is sugar too, isn’t it

Russoul commented 4 years ago

I suppose this would make the distinction between implicit and explicit record fields irrelevant, too?

This is another topic we should discuss. Currently only explicit fields are allowed to go into the braces (and checked to be fully specified by user). We can relax this restriction: demand explicit fields to be present fully and optionally allow users to specify implicit fields.

Currently we have: record _ {expField = ...} {impField = ...} @{autoImp}.

Could have: record _ {expField = ..., impField = ...}.

But what about auto implicits then (which can be unnamed) ?

This new syntax is eventually elaborated to applications so there is no real difference, just a matter of preference

ziman commented 4 years ago

I thought you should be able to fill in any number of fields (explicit or implicit, doesn't matter), and the rest would be filled in with _. Pretty much the same as implicit fields are filled with _ if you don't specify them. In other words, record _ would behave as if all arguments were implicit: _ unless given explicitly by name.

Since all fields are named, I do not see the reason to make a distinction between implicits or explicits because that distinction is useful only for unnamed positional arguments. So it would be relevant for MkPerson, which is a function that takes positional arguments, but not for record Person {...} or Person.{...}, which has only named arguments.

I find .{} fine. I'd like to avoid yet another symbol ($), given that . is (would be) used so much around records. However, I'm not sure if the .{...} syntax would clash with dotted patterns (forced patterns); I'm not sure if they have actual surface syntax at all.

ziman commented 4 years ago

I updated the proposal with the .{...} syntax instead of _.{...}.

Russoul commented 4 years ago

interface Constraint a => Interface a where
    field : a

With the PR the above snippet more or less elaborates to

data Interface : Type -> Type where
    MachineName : {0 a : Type} -> {auto 1 _ : Constraint a} -> (field : a) -> Interface a
...

So we do have unnamed positional arguments, because unnamed auto implicits are positional

Russoul commented 4 years ago

I should have pointed out that earlier: internally both records and interfaces elaborate to data declarations and extra top level definitions (for fields/methods). So new record syntax does not apply only to records, it works with any of those (data, record, interface), if there is exactly one constructor(always true for records and interfaces) and all its explicit arguments are not machine named (always true for records and interfaces). This is not new, this typechecks on current master branch:

data Data : Type -> Type where
   MkData : (x : a) -> Data a

map : (a -> b) -> Data a -> Data b
map f = record {x $= f}

ziman commented 4 years ago

Right, you'll probably want to leave these arguments filled by proof search most of the time, I guess, so the inability to mention them and give them an explicit value is not a problem most of the time.

If you really want to, maybe you could say MachineName.{a = Int, theConstraint, field = foo}? The second field is not named, so it would fill the first yet unfilled positional argument (i.e. the constraint). Does this sound realistic?

--

More generally: currently, if you want to give an explicit value of the constraint to MachineName, you have to say MachineName @{theConstraint} theField. Also, currently, (non-auto) implicits are never positional; they can only be named. Which leaves us with the strange case where an implicit is neither named nor auto:

f : {_ : Bool} -> Bool
f = %search

I don't know how to call f — {...} does not work because the argument is not named, @{...} does not work because it's not an auto implicit. Is this corner case covered by the current syntax at all? Should we cover it by the record syntax?

--

I think that the record syntax should be convenient for the intended use case: records. If it's applicable to other things, great, but I would not compromise the ease of use of records to make non-record use cases easier. The non-intended/non-record use cases can always be done using constructors the same way as always.

Russoul commented 4 years ago

Yes, I don't know of a way to provide unnamed implicits. Would be nice to have, but it is definitely out of scope of this PR. Thats why I think its better to only have explicit fields in record _ {...}, so we don't have to deal with this. And we don't sacrifice expressiveness. We still can provide implicits and auto implicits after record _ {...}

Russoul commented 4 years ago

Both in rhs and patterns

ziman commented 4 years ago

Yes, given that it's not clear how this should be done, I think your conservative suggestion (explicits only) is the way to go:

Great step forward.
Convenient for the intended use case.
Conservative so it can be extended if/when we figure it out. Those who think it's easy to add implicits in the mix are free to prove it after this PR is merged. ;)
Conservative so this patch (and the associated debate) does not take forever.
Still provides a way to fill in implicits and auto implicits.

Russoul commented 4 years ago

Adding both auto implicits and implicits to the mix is actually trivial, as we just transform the record syntax into application, which does not expect its implicit arguments to be positional, thus we simply pass them as is.

And that corner case when an implicit is neither named nor auto, we just leave it to the unifier.

I'm not sure if record Interface {a = Int, theConstraint, field = foo} is 'better' than current syntax for the following reason: it's impossible to tell what a and foo are from this expression alone. Which of them are explicit, which are implicit. Besides seeing theConstraint in 'explicit position' is awkward.

ziman commented 4 years ago

Yes, the unnamed argument is quite awkward. Maybe we could say that the record syntax does not support unnamed arguments and if you want them to be anything else than _, you have to use the record constructor manually. This is in line with the approach of conservativity and could be extended later.

Russoul commented 4 years ago

But then we lose expressiveness: we wouldn’t be able to instantiate interfaces without user defined constructor with unnamed constraints in cases when those constraints have to be provided manually. Which is a bit sad.

Possible contrived use case:

interface Show a => ShowPretty a where
    — No user defined constructor
    ...

showPretty : (mySpecialShow : Show String) -> ShowPretty String
showPretty = record ShowPretty {...} — can’t pass mySpecialShow or instantiate manually by constructor, because the later has machine generated name

ziman commented 4 years ago

Hmm, then we have to decide whether we want to be conservative or not. We already can't do what you're proposing with the means we have now (unless you use Ohad's trick, which will be available after adding the record syntax, too), so we're not even losing anything.

On the contrary, from what you wrote before, I thought that showPretty could be defined as record ShowPretty {...} @{mySpecialShow}. Is that not possible? That's already strictly more expressive than the current situation.

Otherwise, we have to decide how to do unnamed arguments. My suggestion with {..., theConstraint, ...} may look awkward but it's pretty much exactly what Python does in this situation, syntactically. (Except that Python requires all positional arguments to appear before any named arguments.) So I'd argue that if that syntax is universally accepted and used by Python programmers, perhaps it could work for us, too?

Of course, other options are available, such as record ShowPretty {foo = bar, @{theConstraint}, moo = baz} — and someone could likely come up with a suggestion better than what I've proposed so far.

Russoul commented 4 years ago

On the contrary, from what you wrote before, I thought that showPretty could be defined as record ShowPretty {...} @{mySpecialShow}. Is that not possible?

It is up to us to decide. We can force all arguments to go inside the braces of the syntax. Or allow providing extra implicits after the syntax record _ {named_exp = ..., named_imp = ..., named_auto = ...} {extra_imp = ...} @{extra_auto}.

ziman commented 4 years ago

Since this is a RFC on the desired syntax of record expressions eventually, I'll say I'd like to have everything inside the braces. (PR #607 is a different story; you may want to be conservative there, keep its scope limited, and tbh, I'd be really happy with any of the proposed forms of #607. We can add sugar later once we know how to do it or that it's a good idea at all.)

The expression Person.{...}, as proposed, is atomic in function applications and pattern applications, and I would find it strange to write

getAge (Person.{name=x, age=y} {type=z}) foo bar

if I can write

getAge Person.{name=x, age=y, type=z} foo bar

We already have syntactic noise caused by a pair of delimiters (braces), yet you'd have to layer yet another pair of delimiters (parentheses) around them. It is not only visual noise; it also affects how you'd lay out multi-line record sub-expressions within larger expressions.
I would find it confusing that we have to write some names between the first pair of braces, while some other names between other pairs of braces. It's not clear which goes where, unless you go look up the exact definition.
I do not care which arguments are implicit or explicit when using the record syntax. I can't see any reason to make the distinction between implicits and explicits in the record syntax, just because the positional syntax makes it. Are there examples where making it would make things better, easier or clearer for people to read or write code?

If positional arguments within braces are confusing, we can simply not support them (at least until we know better). If you have a record-like type with unnamed fields like this:

data T : Type where
  MkT : Bool -> Bool -> T

then sure, it can stay being elaborated as a record under the hood, you just can't use the T.{...} syntax with it. That's fine because if you declared MkT witth unnamed fields, you probably weren't going to use it as a record, and you're probably going to use MkT explicitly. If you decide you want to use the T.{...} syntax, you have to name the fields (and/or ideally declare the whole type T as a record). Or give them a type that makes them inferrable even if left unspecified, either by unification or proof search. I think that's a fair requirement.

ohad commented 4 years ago

Another proposal that came up that is missing from this discussion is using a new syntax for records {| .... |}. This allows us to introduce two lexemes {| and |} and start a-fresh, for example:

{| ... |} for inferred records
Person.{| ... |} for explicitly disambiguating the record type constructor.

ohad commented 4 years ago

Re. unnamed implicit arguments, we could appropriate the @{...} notation, e.g.: _.{@{instance} method1 = 1; method2 = 2}

gallais commented 3 years ago

This was closed by #607

idris-lang / Idris2

RFC: Record syntax #626

Record construction

Record patterns

Record field access

Unresolved ideas

Record updates

Point-free record updates

Syntax sugar