CodaFi commented 7 years ago

Serves me right for thinking I could do better than the Haskell report. The code, as it stands, is insufficient to detect at least these conditions

Empty data declarations

data Bottom : Type where
data Top : Type where
  | TT : Top

=>

data Bottom : Type where {};
data Top : Type where {
  | TT : Top;
};

Nested module contexts

module Foo where
  module Bar where
  module Baz where

=>

module Foo where {
  module Bar where {};
  module Baz where {};
};

CodaFi commented 7 years ago

Some of this also hinges on our language being whitespace agnostic. I talked with Harlan, and we agreed on “whitespace equivalence” being an OK thing to shoot for in theory. In practice, this means we will warn when “equivalent” whitespace is used at the beginnings of lines instead of equal whitespace. I’ve come up with these rules

Whitespace Sequences

For a token <t>, let the leading trivia before <t>, excluding line comments, and up to the nearest newline character be defined as its (leading) whitespace sequence [ws]. [ws] is an array of spaces and tab characters in the order in which the user wrote them in the source file.

Whitespace Equivalence

Two tokens with whitespace sequences [ws1]<t1> and [ws2]<t2> shall be considered to have equivalent (leading) whitespace if [ws1] and [ws2] contain the same number of tabs and spaces in any order.

Whitespace Equality

Two tokens with whitespace sequences [ws1]<t1> and [ws2]<t2> shall be considered to have equal (leading) whitespace if [ws1] and [ws2] contain the same number of tabs and spaces in the same order.

Whitespace Ordering

For two tokens with whitespace sequences [ws1]<t1> and [ws2]<t2> we say “<t2> is at least as indented as <t1>” if their respective leading whitespace is equivalent OR an initial prefix the length of [ws1] in [ws2] is equivalent to [ws1]. This ordering is undefined if the equivalent initial sequence cannot be identified.

harlanhaskins commented 7 years ago

Would be nice to throw this in a doc in Lithosphere/

CodaFi commented 7 years ago

We can't accept all of Haskell's rules. I've modified them accordingly:

Notation

Let [ws]<t> notate a generic token with whitespace sequence [ws]. We write <t> or literals like 'a' when this sequence is not needed.

Suppose the whitespace sequence [ws] contains a newline. We notate this [ws]*.

The operator Length(*) acts on a whitespace sequence [ws] and returns the sum of the number of tabs and spaces in the sequence.

The shining procedure Shine(*, *) is defined recursively on a stream of tokens, ts, and a layout stack ls. The layout stack maintains the invariant that, for (ls : l2 : l1), l1 will be strictly more indented than l2.

The Rules of Layout (The Shining)

Shine [] [] = []
Shine [] (ls : l) = '}'  :  Shine [] ls
Shine (<t> : []) [] = [t]

Shine ([ws1]*<t> : ts) (ls : [ws2]) = ';' : <t>  :  (Shine ts (ls : [ws1]*))   | if [ws1] = [ws2]
                                    = '}'  :  (Shine (<t> : ts) ls) | if [ws1] < [ws2]
                                    = <t> : Shine ts (ls : [ws1]*) | otherwise
  
Shine ('where' : '{' : [ws1]<t> : ts) (ls : [ws2]) = 'where' : '{'  :  <t> : Shine ts (ls : [ws])      | if [ws1] > [ws2]
                                                   = 'where' : '{' : '}' : Shine (t : ts) (ls : [ws2]) | otherwise                             
Shine ('where' : [ws]<t> : ts) (ls : l) = 'where' : '{' : <t> : Shine ts (ls : [ws]) | if [ws1] > [ws2]

Shine ('where' : '{' : [ws]<t> : ts) ls = 'where' : '{'  :  <t> : Shine ts (ls : [ws])
Shine ('where' : [ws]<t> : ts) ls = 'where' : '{' : <t> : Shine ts (ls : [ws])  

Shine ('}' : ts) (ls : [ws]) = '}' : Shine ts ls
 
Shine (<t> : ts) ls = <t> : Shine ts ls

CodaFi commented 7 years ago

Note that, by these rules, we break what would usually be whitespace ambiguities like

-- [t] = tab
-- [s] = space

module Foo
[s][s]module Bar
[s][s][t]module Baz
[s][t][s]module Bat
[s][s][s][t]module Quux

as

-- [t] = tab
-- [s] = space

module Foo {
[s][s]module Bar {
[s][s][t]module Baz {}
[s][t][s]module Bat {}
[s][s]}
[s][s][s][t]module Quux {}
}

CodaFi commented 7 years ago

Note that this may be counterintuitive, so when the rule that causes the last line to wind up in the top-level layout block instead of in Bat's layout block fires, we will diagnose this with a warning.

harlanhaskins commented 7 years ago

Still a bit broken.

module DataDecl where

data Foo : Set where
| Foo : nat -> nat -> Foo
| Bar : nat -> Foo

The shined output is:

module DataDecl where {

data Foo : Set where { };
| Foo : nat -> nat -> Foo;
| Bar : nat -> Foo;
};

harlanhaskins commented 7 years ago

Unless those are supposed to be indented, at which point never mind

harlanhaskins commented 7 years ago

Actually, the diagnostic for this suuuuucks so bad.

invalid-data-decl.silt:4:0: error: expected declaration
| Foo : nat -> nat -> Foo;
^

CodaFi commented 7 years ago

Anything under a where clause should be strictly more indented than the enclosing context unless you are the top-level module.

harlanhaskins commented 7 years ago

Yes, and I think we could recognize a large set of mistakes here with diagnostics.

CodaFi commented 7 years ago

Absolutely. Whitespace ambiguities and mismatches should be diagnosed in Shine. All the infrastructure is there to do it too...

silt-lang / silt

Shine is Utterly Broken #25

Whitespace Sequences

Whitespace Equivalence

Whitespace Equality

Whitespace Ordering

Notation

The Rules of Layout (The Shining)