haskell / happy

The Happy parser generator for Haskell
Other
276 stars 84 forks source link

`HappyWrap`s break parameterized productions with wildcard signatures #141

Closed natefaubion closed 5 years ago

natefaubion commented 5 years ago

The PureScript parser uses parameterized productions, and to aid type-checking, we use type wildcards in their signatures (eg manySep(a, b) :: { [_] }). We don't actually have any polymorphic signatures, but the changes introduced by #134 break our parser by introducing syntactically invalid newtype declarations. Unfortunately, removing the signatures from the parameterized productions altogether makes compilation times skyrocket. It used to only take a few seconds to compile. I'm actually typing this up as it's trying to compile without signatures, and it's been nearly 10 minutes and still running. I think the only viable workaround (if we want to use something other than 1.19.9) will be to manually expand all of the productions.

simonmar commented 5 years ago

@int-index any thoughts on how to fix this?

int-index commented 5 years ago

The manual is explicit on this:

A drawback of the current implementation is that it does not support type signatures for the parameterized productions, that depend on the types of the parameters. We plan to implement that in the future---the current workaround is to omit the type signatures for such rules.

https://www.haskell.org/happy/doc/html/sec-grammar.html#sec-param-prods

So, "wontfix" I guess, unless someone wants to implement type signatures for parametrized productions?

int-index commented 5 years ago

It is a coincidence that wildcards in this position used to work before my fix. Try on an older version of happy without --coerce, it should not work either. And --coerce is supposed to be purely an optimization.

int-index commented 5 years ago

As a proof that happy-1.19.9 treatment of wildcards is unsound, consider this example:

a :: { Bool }
   : n { $1 }

n :: { _ }
  : { (0::Int) }

Full code:

{
module Main where

import System.IO
}

%name calc
%tokentype { Token }

%token tok { Token }

%monad { IO } { (>>=) } { return }

%%

a :: { Bool }
   : n { $1 }

n :: { _ }
  : { (0::Int) }

{
main = calc [] >>= print

data Token = Token

lexer :: String -> [Token]
lexer _ = []

happyError tokens = ioError (userError "parse error")
}

The line of interest is n :: { _ }.

  1. If you try without --coerce, it is rightfully rejected:

    happy-test$ happy Parser.y    
    unused terminals: 1
    happy-test$ runghc Parser.hs 
    
    Parser.hs:16:25: error:
        Wildcard ‘_’ not allowed
          in the definition of data constructor ‘HappyAbsSyn5’
       |
    16 |         | HappyAbsSyn5 (_)
       |    
  2. If you try with --coerce, it coerces an 0::Int to False::Bool:

    happy-test$ happy Parser.y -gc
    unused terminals: 1
    happy-test$ runghc Parser.hs  
    False

Conclusion: wildcards were never a supported feature, with parametrized productions or without them. They don't work without --coerce and they are unsound with --coerce.

natefaubion commented 5 years ago

the current workaround is to omit the type signatures for such rules.

Maybe it's just me, but I take recommendations like that with a grain of salt. Happy is an old project, and I don't know if that was just written before Haskell supported wildcards. The only reason I'm using wildcards is because it's a workaround I've seen elsewhere to get compile times to be manageable. It's a little unfortunate, because it means that parameterized productions aren't very useful to a production parser. Compile times are frankly impractical without type signatures.

What would it take to implement signatures for parameterized productions? Is there any previous design work? The most immediately obvious way to me would be to accept the type as an argument and allow it to be templated into the signature.

FWIW, both stack and cabal call happy with --coerce as the default, so for all intents and purposes, it's the recommended way to use it.

int-index commented 5 years ago

I don't know if that was just written before Haskell supported wildcards

It was written in 2008, partial type signatures were implemented in 2015. So, yeah, I understand why one might be skeptical of this advice. It's quite unfortunate that omitting a signature is still the only viable workaround, and that it leads to unbearable compile times.

FWIW, both stack and cabal call happy with --coerce as the default, so for all intents and purposes, it's the recommended way to use it.

Right, it is recommended because it makes code run faster. It's a good optimization. However, for a long time it was unsound, a mistake in a grammar could let one coerce any type to any other type (#137). So the only way to know whether the grammar was correct was to compile it without --coerce, check for compile time errors, and then compile with --coerce to get faster code. No one did that, of course, because (as far as I know) no one was aware of --coerce unsoundness until I discovered and fixed it in #134.

I didn't do it just to break other people's code, I did it because --coerce didn't work with higher rank productions, which the manual claims happy supports, and I needed this feature.

So, the way to think about --coerce is that it's an optimization. When you turn this flag on or off, the only observable difference should be the performance of your parser, not whether your grammar is accepted.

That's why I consider this ticket a feature request, not a bug report: happy never supported partial type signatures, and the fact that they were accepted with --coerce was a fluke. And they could be used to write code that casts Int to Bool, or segfaults.

What would it take to implement signatures for parameterized productions? Is there any previous design work? The most immediately obvious way to me would be to accept the type as an argument and allow it to be templated into the signature.

This plan sounds reasonable to me, but it's better to ask @yav about it who implemented parametrized productions in the first place.

yav commented 5 years ago

In the current version of Happy you can write type signatures on parameterized productions---we should just document it! I am not sure who implemented it, but I discovered this some time ago when I started implementing them, and found that they were already there :-)

natefaubion commented 5 years ago

@yav Care to give me the gist? 😄 I'm aware that you can add a type signature but I didn't see an obvious way to instantiate it for different return types.

yav commented 5 years ago

Here are some examples from a parser I wrote:

ListOf(thing) :: { [thing] }
  :                 { [] }
  | ListOf1(thing)  { $1 }

ListOf1(thing) :: { [thing] }
  : ListOf1_rev(thing) { reverse $1 }

ListOf1_rev(thing) :: { [thing] }
  : thing                           { [$1] }
  | ListOf1_rev(thing) thing        { $2 : $1 }

SepBy1(sep,thing) :: { [thing] }
  : SepBy1_rev(sep,thing) { reverse $1 }

SepBy1_rev(sep,thing) :: { [thing] }
  : thing                           { [$1] }
  | SepBy1_rev(sep,thing) sep thing { $3 : $1 }
natefaubion commented 5 years ago

@yav Ah! So does it copy the declared type for whatever production you pass it?

natefaubion commented 5 years ago

I would also be interested to know if this is actually implemented, or just the --coerce bug again. That is, there's a type variable thing that is unsafely coerced away, similar to the partial type signature.

yav commented 5 years ago

When it expands the parameterized rules it substitutes the type of the production. So if I write ListOf(A), where A has type signature T, you'd get rules that are something like this:

ListOf_A :: { [T] }
  :                { [] }
  | ListOf1_rev_A  { $1 }
natefaubion commented 5 years ago

Alright, I'll give it a shot.

natefaubion commented 5 years ago

We've updated the PureScript parser to use typed parameterized productions, so this is no longer an issue for us.