Closed natefaubion closed 5 years ago
@int-index any thoughts on how to fix this?
The manual is explicit on this:
A drawback of the current implementation is that it does not support type signatures for the parameterized productions, that depend on the types of the parameters. We plan to implement that in the future---the current workaround is to omit the type signatures for such rules.
https://www.haskell.org/happy/doc/html/sec-grammar.html#sec-param-prods
So, "wontfix" I guess, unless someone wants to implement type signatures for parametrized productions?
It is a coincidence that wildcards in this position used to work before my fix. Try on an older version of happy without --coerce
, it should not work either. And --coerce
is supposed to be purely an optimization.
As a proof that happy-1.19.9
treatment of wildcards is unsound, consider this example:
a :: { Bool }
: n { $1 }
n :: { _ }
: { (0::Int) }
Full code:
{
module Main where
import System.IO
}
%name calc
%tokentype { Token }
%token tok { Token }
%monad { IO } { (>>=) } { return }
%%
a :: { Bool }
: n { $1 }
n :: { _ }
: { (0::Int) }
{
main = calc [] >>= print
data Token = Token
lexer :: String -> [Token]
lexer _ = []
happyError tokens = ioError (userError "parse error")
}
The line of interest is n :: { _ }
.
If you try without --coerce
, it is rightfully rejected:
happy-test$ happy Parser.y
unused terminals: 1
happy-test$ runghc Parser.hs
Parser.hs:16:25: error:
Wildcard ‘_’ not allowed
in the definition of data constructor ‘HappyAbsSyn5’
|
16 | | HappyAbsSyn5 (_)
|
If you try with --coerce
, it coerces an 0::Int
to False::Bool
:
happy-test$ happy Parser.y -gc
unused terminals: 1
happy-test$ runghc Parser.hs
False
Conclusion: wildcards were never a supported feature, with parametrized productions or without them. They don't work without --coerce
and they are unsound with --coerce
.
the current workaround is to omit the type signatures for such rules.
Maybe it's just me, but I take recommendations like that with a grain of salt. Happy is an old project, and I don't know if that was just written before Haskell supported wildcards. The only reason I'm using wildcards is because it's a workaround I've seen elsewhere to get compile times to be manageable. It's a little unfortunate, because it means that parameterized productions aren't very useful to a production parser. Compile times are frankly impractical without type signatures.
What would it take to implement signatures for parameterized productions? Is there any previous design work? The most immediately obvious way to me would be to accept the type as an argument and allow it to be templated into the signature.
FWIW, both stack and cabal call happy with --coerce
as the default, so for all intents and purposes, it's the recommended way to use it.
I don't know if that was just written before Haskell supported wildcards
It was written in 2008, partial type signatures were implemented in 2015. So, yeah, I understand why one might be skeptical of this advice. It's quite unfortunate that omitting a signature is still the only viable workaround, and that it leads to unbearable compile times.
FWIW, both stack and cabal call happy with
--coerce
as the default, so for all intents and purposes, it's the recommended way to use it.
Right, it is recommended because it makes code run faster. It's a good optimization. However, for a long time it was unsound, a mistake in a grammar could let one coerce any type to any other type (#137). So the only way to know whether the grammar was correct was to compile it without --coerce
, check for compile time errors, and then compile with --coerce
to get faster code. No one did that, of course, because (as far as I know) no one was aware of --coerce
unsoundness until I discovered and fixed it in #134.
I didn't do it just to break other people's code, I did it because --coerce
didn't work with higher rank productions, which the manual claims happy
supports, and I needed this feature.
So, the way to think about --coerce
is that it's an optimization. When you turn this flag on or off, the only observable difference should be the performance of your parser, not whether your grammar is accepted.
That's why I consider this ticket a feature request, not a bug report: happy
never supported partial type signatures, and the fact that they were accepted with --coerce
was a fluke. And they could be used to write code that casts Int
to Bool
, or segfaults.
What would it take to implement signatures for parameterized productions? Is there any previous design work? The most immediately obvious way to me would be to accept the type as an argument and allow it to be templated into the signature.
This plan sounds reasonable to me, but it's better to ask @yav about it who implemented parametrized productions in the first place.
In the current version of Happy you can write type signatures on parameterized productions---we should just document it! I am not sure who implemented it, but I discovered this some time ago when I started implementing them, and found that they were already there :-)
@yav Care to give me the gist? 😄 I'm aware that you can add a type signature but I didn't see an obvious way to instantiate it for different return types.
Here are some examples from a parser I wrote:
ListOf(thing) :: { [thing] }
: { [] }
| ListOf1(thing) { $1 }
ListOf1(thing) :: { [thing] }
: ListOf1_rev(thing) { reverse $1 }
ListOf1_rev(thing) :: { [thing] }
: thing { [$1] }
| ListOf1_rev(thing) thing { $2 : $1 }
SepBy1(sep,thing) :: { [thing] }
: SepBy1_rev(sep,thing) { reverse $1 }
SepBy1_rev(sep,thing) :: { [thing] }
: thing { [$1] }
| SepBy1_rev(sep,thing) sep thing { $3 : $1 }
@yav Ah! So does it copy the declared type for whatever production you pass it?
I would also be interested to know if this is actually implemented, or just the --coerce
bug again. That is, there's a type variable thing
that is unsafely coerced away, similar to the partial type signature.
When it expands the parameterized rules it substitutes the type of the production. So if I write ListOf(A)
, where A
has type signature T
, you'd get rules that are something like this:
ListOf_A :: { [T] }
: { [] }
| ListOf1_rev_A { $1 }
Alright, I'll give it a shot.
We've updated the PureScript parser to use typed parameterized productions, so this is no longer an issue for us.
The PureScript parser uses parameterized productions, and to aid type-checking, we use type wildcards in their signatures (eg
manySep(a, b) :: { [_] }
). We don't actually have any polymorphic signatures, but the changes introduced by #134 break our parser by introducing syntactically invalid newtype declarations. Unfortunately, removing the signatures from the parameterized productions altogether makes compilation times skyrocket. It used to only take a few seconds to compile. I'm actually typing this up as it's trying to compile without signatures, and it's been nearly 10 minutes and still running. I think the only viable workaround (if we want to use something other than 1.19.9) will be to manually expand all of the productions.