Open CodaFi opened 7 years ago
Some of this also hinges on our language being whitespace agnostic. I talked with Harlan, and we agreed on “whitespace equivalence” being an OK thing to shoot for in theory. In practice, this means we will warn when “equivalent” whitespace is used at the beginnings of lines instead of equal whitespace. I’ve come up with these rules
For a token <t>
, let the leading trivia before <t>
, excluding line comments, and up to the nearest newline character be defined as its (leading) whitespace sequence [ws]
. [ws]
is an array of spaces and tab characters in the order in which the user wrote them in the source file.
Two tokens with whitespace sequences [ws1]<t1>
and [ws2]<t2>
shall be considered to have equivalent (leading) whitespace if [ws1] and [ws2] contain the same number of tabs and spaces in any order.
Two tokens with whitespace sequences [ws1]<t1>
and [ws2]<t2>
shall be considered to have equal (leading) whitespace if [ws1] and [ws2] contain the same number of tabs and spaces in the same order.
For two tokens with whitespace sequences [ws1]<t1>
and [ws2]<t2>
we say “<t2>
is at least as indented as <t1>
” if their respective leading whitespace is equivalent OR an initial prefix the length of [ws1]
in [ws2]
is equivalent to [ws1]
. This ordering is undefined if the equivalent initial sequence cannot be identified.
Would be nice to throw this in a doc in Lithosphere/
We can't accept all of Haskell's rules. I've modified them accordingly:
Let [ws]<t>
notate a generic token with whitespace sequence [ws]
. We write <t>
or literals like 'a'
when this sequence is not needed.
Suppose the whitespace sequence [ws]
contains a newline. We notate this [ws]*
.
The operator Length(*)
acts on a whitespace sequence [ws]
and returns the
sum of the number of tabs and spaces in the sequence.
The shining procedure Shine(*, *)
is defined recursively on a stream of tokens,
ts
, and a layout stack ls
. The layout stack maintains the invariant that, for
(ls : l2 : l1)
, l1
will be strictly more indented than l2
.
Shine [] [] = []
Shine [] (ls : l) = '}' : Shine [] ls
Shine (<t> : []) [] = [t]
Shine ([ws1]*<t> : ts) (ls : [ws2]) = ';' : <t> : (Shine ts (ls : [ws1]*)) | if [ws1] = [ws2]
= '}' : (Shine (<t> : ts) ls) | if [ws1] < [ws2]
= <t> : Shine ts (ls : [ws1]*) | otherwise
Shine ('where' : '{' : [ws1]<t> : ts) (ls : [ws2]) = 'where' : '{' : <t> : Shine ts (ls : [ws]) | if [ws1] > [ws2]
= 'where' : '{' : '}' : Shine (t : ts) (ls : [ws2]) | otherwise
Shine ('where' : [ws]<t> : ts) (ls : l) = 'where' : '{' : <t> : Shine ts (ls : [ws]) | if [ws1] > [ws2]
Shine ('where' : '{' : [ws]<t> : ts) ls = 'where' : '{' : <t> : Shine ts (ls : [ws])
Shine ('where' : [ws]<t> : ts) ls = 'where' : '{' : <t> : Shine ts (ls : [ws])
Shine ('}' : ts) (ls : [ws]) = '}' : Shine ts ls
Shine (<t> : ts) ls = <t> : Shine ts ls
Note that, by these rules, we break what would usually be whitespace ambiguities like
-- [t] = tab
-- [s] = space
module Foo
[s][s]module Bar
[s][s][t]module Baz
[s][t][s]module Bat
[s][s][s][t]module Quux
as
-- [t] = tab
-- [s] = space
module Foo {
[s][s]module Bar {
[s][s][t]module Baz {}
[s][t][s]module Bat {}
[s][s]}
[s][s][s][t]module Quux {}
}
Note that this may be counterintuitive, so when the rule that causes the last line to wind up in the top-level layout block instead of in Bat's layout block fires, we will diagnose this with a warning.
Still a bit broken.
module DataDecl where
data Foo : Set where
| Foo : nat -> nat -> Foo
| Bar : nat -> Foo
The shined output is:
module DataDecl where {
data Foo : Set where { };
| Foo : nat -> nat -> Foo;
| Bar : nat -> Foo;
};
Unless those are supposed to be indented, at which point never mind
Actually, the diagnostic for this suuuuucks so bad.
invalid-data-decl.silt:4:0: error: expected declaration
| Foo : nat -> nat -> Foo;
^
Anything under a where clause should be strictly more indented than the enclosing context unless you are the top-level module.
Yes, and I think we could recognize a large set of mistakes here with diagnostics.
Absolutely. Whitespace ambiguities and mismatches should be diagnosed in Shine. All the infrastructure is there to do it too...
Serves me right for thinking I could do better than the Haskell report. The code, as it stands, is insufficient to detect at least these conditions
=>
=>