haskell / rfcs

This repo is archived, consider using https://github.com/ghc-proposals/ghc-proposals instead

Lambda the Ultimate Reserved Word #19

Closed blamario closed 6 years ago

blamario commented 6 years ago

A simple compatibility-breaking change, to set lambda free from the variable identifiers.

goldfirere commented 6 years ago

Rendered

evincarofautumn commented 6 years ago

I think this should only be enabled when UnicodeSyntax is in effect. It’s unfortunate that λ by itself couldn’t be used as a variable anymore (e.g., wavelength, rate, eigenvalue), but I don’t know if there’s anything to be done about that, or how many people would even mind.
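For example, something along these lines is legal today (an illustrative sketch, with made-up names) and would stop parsing under the proposal:

-- λ is an ordinary lowercase letter today, so it can name a variable;
-- with λ reserved, this definition would no longer parse.
photonEnergy :: Double -> Double
photonEnergy λ = h * c / λ
  where
    h = 6.62607015e-34  -- Planck constant, J·s
    c = 2.99792458e8    -- speed of light, m/s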

blamario commented 6 years ago

I think this should only be enabled when UnicodeSyntax is in effect.

I disagree. That would mean that a syntactically valid program could stop parsing if UnicodeSyntax was turned on. I think the compatibility break is minor enough to be bearable even for people who don't intend to venture outside ASCII.

It’s unfortunate that λ by itself couldn’t be used as a variable anymore (e.g., wavelength, rate, eigenvalue), but I don’t know if there’s anything to be done about that, or how many people would even mind.

Yes, this is probably the most difficult part of the proposal, and it's mentioned as such. My impression is that more people would be offended by disallowing whitespace after lambda-the-operator.

goldfirere commented 6 years ago

I disagree. That would mean that a syntactically valid program could stop parsing if UnicodeSyntax was turned on. I think the compatibility break is minor enough to be bearable even for people who don't intend to venture outside ASCII.

Today, -XUnicodeSyntax can break existing programs. For example, ∀ can be used as a binary type-level operator without -XUnicodeSyntax, but it means forall with -XUnicodeSyntax.
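For instance, a file along these lines is accepted without the extension and stops parsing once it is turned on (a sketch, assuming a GHC of roughly this vintage):

{-# LANGUAGE TypeOperators #-}
-- Without -XUnicodeSyntax, ∀ is just a symbol character and can be
-- defined as a type operator; with the extension it is read as forall
-- and this declaration no longer parses.
type a ∀ b = Either a b

value :: Int ∀ Bool
value = Left 3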

Regardless, I'm against this proposal. It causes a backward-incompatibility and a complication in the specification for marginal gain (in my opinion).

blamario commented 6 years ago

Today, -XUnicodeSyntax can break existing programs. For example, ∀ can be used as a binary type-level operator without -XUnicodeSyntax, but it means forall with -XUnicodeSyntax.

That's interesting, since forall doesn't suffer from the same problem: it's only recognized as a keyword in the context of a type signature. I wonder why the same approach hasn't been extended to its Unicode synonym.

Anyway, the problem exists only because UnicodeSyntax is a GHC language extension. If we're serious about Unicode support in the core language, we'll probably need to remove ∀ and similar reserved symbols from the set of allowed user operators.

Regardless, I'm against this proposal. It causes a backward-incompatibility and a complication in the specification for marginal gain (in my opinion).

I'm happy with ASCII myself, with or without the proposal. I neither teach Haskell nor write Haskell blog articles, so I'm not concerned about the visual appearance of my code. I just think that the various Unicode proposals that have been put forward over the years make no sense without the λ operator, and this is the only way I see to set it free for that use.

Can anybody from the education side speak up?

cartazio commented 6 years ago

I do have to agree that it’d be a shame to prevent use of lambda where it naturally arises in physics / math calculations.

goldfirere commented 6 years ago

That's interesting, since forall doesn't suffer from the same problem: it's only recognized as a keyword in the context of a type signature. I wonder why the same approach hasn't been extended to its Unicode synonym.

∀ has the same treatment as forall; it's forbidden only in types (with -XUnicodeSyntax).

I do see the value of having \lambda be a reserved word, but I guess I really don't like that it's a reserved letter. I think requiring the space is a reasonable compromise.
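For comparison, the ASCII syntax already shows the difference between a reserved symbol and a reserved word (a minimal sketch):

-- \ is a reserved symbol, so no space is needed: \x is the lambda
-- head followed by the variable x.
incr :: Int -> Int
incr = \x -> x + 1

-- A reserved word, by contrast, has to be delimited by non-identifier
-- characters: letx is a single identifier, not the keyword let and x.
letx :: Int
letx = incr 41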

However, a question lingers: who is asking for this? You say that you're happy with ASCII, so why make this change?

blamario commented 6 years ago

∀ has the same treatment as forall; it's forbidden only in types (with -XUnicodeSyntax).

It doesn't look that way from here:

{-# LANGUAGE UnicodeSyntax #-}
(∀) :: Int -> Int -> Int
forall :: Int -> Int -> Int
a ∀ b = a + b
forall a b = a + b

results in

Forall.hs:2:1: error:
    Invalid type signature: (forall) :: ...
    Should be of form <variable> :: <type>
   |
 2 | (∀) :: Int -> Int -> Int
   | ^^^

I do see the value of having \lambda be a reserved word, but I guess I really don't like that it's a reserved letter. I think requiring the space is a reasonable compromise.

What you dislike then is the lexical layer complication? It does present a bit of a problem for implementations, but I don't think many users care about the details of the language specification. They would notice if they were forced to put a space after every lambda.

However, a question lingers: who is asking for this? You say that you're happy with ASCII, so why make this change?

Oh, I've seen it come up now and then:

https://ghc.haskell.org/trac/ghc/ticket/1102
https://stackoverflow.com/questions/10465767/default-lambda-symbol-in-emacs-haskell-mode
https://www.reddit.com/r/haskell/comments/3irnhs/why_doesnt_language_unicodesyntax_have_%CE%BB/
https://www.reddit.com/r/haskell/comments/mfqho/unicode_for_prettier_code_eg_%CE%BBx_x_1/

The existence of Emacs mode hacks like https://wiki.haskell.org/Emacs/Unicode_symbols also speaks to some demand, I think.

HJvT commented 6 years ago

The Unicode table contains several extra lambdas in the Mathematical Alphanumeric Symbols section, see http://jrgraphix.net/r/Unicode/1D400-1D7FF . One of these could be used to replace the \ in lambda expressions.

blamario commented 6 years ago

The Unicode table contains several extra lambdas in the Mathematical Alphanumeric Symbols section, see http://jrgraphix.net/r/Unicode/1D400-1D7FF. One of these could be used to replace the \ in lambda expressions.

That's a good catch! I swear I had looked at the mathematical symbols looking for an alternative lambda symbol. I can't explain how I missed these. Perhaps I was looking for a lambda, and each of these is a lamda. Oh well.

I agree these are better because they don't require compromising the Greek alphabet and don't require as much gymnastics in the lexical layer. Technically speaking, though, they would still require a backward-compatibility break, because these mathematical symbols are still letters according to Unicode. We'd have to make a special case either for one of these mathematical lowercase lamdas, for all of them, or for the whole block of mathematical symbols.

Just to make sure, I tested and GHC currently treats these as letters, with or without the UnicodeSyntax pragma. That is the correct behaviour according to the Haskell 2010 report. If we want to change anything at all for Haskell 2020, I think the simplest solution would be to consider the entire block of mathematical alphanumeric symbols (1D400–1D7FF) as symbol characters.
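A quick check along those lines, using U+1D706 (MATHEMATICAL ITALIC SMALL LAMDA); the 𝜆 binding is only there to show that it lexes as an identifier:

import Data.Char (generalCategory, isLower)

-- U+1D706 is a cased letter according to Unicode, so GHC lexes it as
-- an identifier character rather than a symbol.
𝜆 :: Int
𝜆 = 42

main :: IO ()
main = do
  print (generalCategory '𝜆')  -- LowercaseLetter
  print (isLower '𝜆')          -- True
  print 𝜆                      -- 42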

cartazio commented 6 years ago

So at that point, does this become both a bug fix in how we handle Unicode and a way of making the symbol-class lambda in Unicode a bit more useful?

blamario commented 6 years ago

SO at that point, does this become both a bug fix in how we handle unicode and making the symbol class lambda in unicode a bit more useful?

I wouldn't call it a bug fix. A programming language specification is like a mathematical theory: it can't be correct or incorrect, only more or less useful.

HJvT commented 6 years ago

Technically speaking, though, they would still require a backward compatibility break because these mathematical symbols are still letters according to Unicode. We'd have to make a special case either for one of these mathematical lowercase lamdas, for all of them, or for the whole block of mathematical symbols.

If we don't want to define these letters as special characters, we could use the Unicode character 'LEFT SEMIDIRECT PRODUCT' (U+22CB) ⋋. It is in the Unicode block Mathematical Operators.

See http://www.fileformat.info/info/unicode/char/22cb/index.htm
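Since U+22CB has the Math Symbol category, it already lexes as an ordinary user-definable operator today, for example (a sketch):

-- ⋋ is currently a symbol character, so it can be defined as an
-- operator; the suggestion above would reserve it for lambda instead.
(⋋) :: (a -> b) -> a -> b
f ⋋ x = f x

main :: IO ()
main = print ((+ 1) ⋋ 41)  -- 42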

blamario commented 6 years ago

After some thinking about the options, I've convinced myself that the best solution is to declare the entire 1D400–1D7FF block of mathematical alphanumeric symbols to be single-character identifiers in Haskell. It's a relatively clean rule, easy to explain, and it has some potential for use beyond the lambda operator.

I'm sure there are other possibilities that haven't occurred to me. The downsides are a backward compatibility break that is unlikely to affect any existing code and a new production in the lexical layer.

cartazio commented 6 years ago

So what would that Mathematical Operators range (http://www.fileformat.info/info/unicode/block/mathematical_operators/list.htm) be in Haskell lexing, today or in the future?

blamario commented 6 years ago

These are different. Their Unicode general category is Math Symbol rather than Cased Letter, and besides, they are clearly symbols by any reasonable interpretation. They should be symbols rather than identifiers in Haskell as well.

I guess your real question is whether they should be considered single-character operators, in line with the single-character identifiers I proposed above. I'm not sure myself. If I were designing a language from scratch, I probably would make them so. The tradition of multi-character lexical symbols really only makes sense in the context of ASCII's constraints. In pen-and-paper (or PDF) mathematics, symbols are practically never put together horizontally.
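For example, under today's rules consecutive symbol characters glue into a single operator token (a sketch):

-- ∘⋆ is one two-character operator here, not ∘ next to ⋆; treating the
-- Mathematical Operators block as single-character operators would
-- change how this lexes.
(∘⋆) :: (b -> c) -> (a -> b) -> a -> c
(f ∘⋆ g) x = f (g x)

main :: IO ()
main = print (((+ 1) ∘⋆ (* 2)) 20)  -- 41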

The combining characters scare me, though, so I can't really say that mathematical symbols are always a single code point wide.

cartazio commented 6 years ago

I think the Unicode definition of graphemes may be tractable enough for us to digest.

How do we handle/discuss multi code point characters / graphemes in the standard presently?

blamario commented 6 years ago

How do we handle/discuss multi code point characters / graphemes in the standard presently?

We don't. All the references to Unicode are in section 2.2 of the report, and they add up to the following:

Haskell uses the Unicode [2] character set. However, source programs are currently biased toward the ASCII character set used in earlier versions of Haskell. This syntax depends on properties of the Unicode characters as defined by the Unicode consortium. Haskell compilers are expected to make use of new versions of Unicode as they are made available.

...

uniWhite → any Unicode character defined as whitespace
uniSmall → any Unicode lowercase letter
uniLarge → any uppercase or titlecase Unicode letter
uniSymbol → any Unicode symbol or punctuation
uniDigit → any Unicode decimal digit
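In Data.Char terms those productions map roughly onto the standard classification functions, something like the following sketch (the names only mirror the report):

import Data.Char

-- A rough rendering of the report's uni* productions in terms of the
-- Unicode general categories exposed by Data.Char.
uniWhite, uniSmall, uniLarge, uniSymbol, uniDigit :: Char -> Bool
uniWhite  c = isSpace c
uniSmall  c = isLower c
uniLarge  c = generalCategory c `elem` [UppercaseLetter, TitlecaseLetter]
uniSymbol c = isSymbol c || isPunctuation c
uniDigit  c = generalCategory c == DecimalNumber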

There's no mention of combining characters nor of graphemes. If you're looking for a proper treatment of Unicode, I think the most meticulous example today is Swift. Its lexical syntax does accord special treatment to combining characters.

For my taste, Swift actually goes too far in explicitly listing all code point ranges that belong to each relevant category. I think the Haskell report is right in leaving the details of which characters are letters and which are digits up to the implementations to figure out from the Unicode standard.

Since we're opening the Unicode can of worms, though, I have to ask for opinions on what to do about non-cased alphabets. According to the current report, the first letter of every identifier must be either lowercase (for variables) or uppercase or titlecase (for constructors). That seems to make it impossible to write an identifier in an alphabet that doesn't distinguish case. Is there currently any workaround?
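For instance, a quick look at the relevant categories (the characters are arbitrary examples):

import Data.Char (generalCategory)

-- A caseless letter is neither a lowercase nor an upper/titlecase
-- letter, so under the current grammar it can start neither a variable
-- nor a constructor name.
main :: IO ()
main = mapM_ (print . generalCategory) ['x', 'X', 'א', '字']
-- LowercaseLetter, UppercaseLetter, OtherLetter, OtherLetter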

DemiMarie commented 6 years ago

What about making λ just another reserved word?

blamario commented 6 years ago

What about making λ just another reserved word?

That might be an acceptable compromise, but after HJvT's comment I think it would be better to use a proper mathematical symbol for the purpose.

It's been fun to confirm Wadler's Law yet again, but I'll have to abandon this pull request. I'm preparing a more thorough treatment of the lexical syntax, which I hope to submit in mid-January.