haskell / happy

The Happy parser generator for Haskell
Other
273 stars 85 forks source link

Happy-generated code fails with `Internal Happy error` with `-XStrictData` #273

Open BjoernLoetters opened 4 months ago

BjoernLoetters commented 4 months ago

I am new to Happy and want to use it for an (old school) introductory course on compiler engineering. Because I repetitively got the very uninformative Internal Happy error error, I tested my whole setup with a minimal example. Surprisingly, it still fails with Internal Happy error.

Here is my setup:

module Main where

import Token
import Parser

main :: IO ()
main = do
  result <- parse [ VAR ]  
  print result -- should print 0
  return ()
module Token where

data Token = VAR deriving (Show, Eq)
{
module Parser where 

import System.IO
import Token
}

%name parse
%tokentype { Token }
%error { parseError }
%monad { IO } { >>= } { return }

%token
  VAR     { VAR }

%%

program : VAR { 0 }

{
parseError :: [Token] -> IO a
parseError _ = fail "something went wrong" -- I never see this error message ...
}

After diving a bit into the code I saw that parse invokes happyParse start_state = happyNewToken start_state notHappyAtAll notHappyAtAll with start_state = 0#. It seems like that somewhere deep inside the machinery the second notHappyAtAll is actually evaluated by Haskell, since it is accessed as a HappyStk here:

happyFail explist 0# tk old_st _ stk@(x `HappyStk` _) =
     let i = (case Happy_GHC_Exts.unsafeCoerce# x of { (Happy_GHC_Exts.I# (i)) -> i }) in
--      trace "failing" $ 
        happyError_ explist i tk

As far as I understand, this means that Happy fails with an empty stack - probably right before even reading the first token.

Here is some information about my environment:

My build.cabal:

cabal-version:      3.0
build-type:         Simple

...

executable HopefullyHappySoon:
  hs-source-dirs:     src
  main-is:            Main.hs
  other-modules:      Token,
                      Parser
  build-depends:      base ^>= 4.17.2.1, 
                      array >= 0.5.4.0
  build-tool-depends: happy:happy >= 1.20.1.1 -- Just to be sure ...
  default-language:   GHC2021

I appreciate any kind of help in this situation. This issue is also related to #241.

BjoernLoetters commented 4 months ago

Just a thought that came to my mind right now: May this be an issue with some low level code failing? I run Happy on a Mac with Apple Silicon ...

andreasabel commented 4 months ago

I am assuming you are using the latest release of happy, 1.20.1.1.

$ happy --version
Happy Version 1.20.1.1 

On my old-school Mac, your example prints 0 just like expected, no internal error.

My package.yaml for this project:

executable:
  source-dirs: .
  main: Main

dependencies:
- base
- array

build-tools:
- happy

Also just checking: what is your GHC version? See: https://www.haskell.org/ghc/blog/20210309-apple-m1-story.html

BjoernLoetters commented 4 months ago

Thanks for your quick response and pardon the missing informations:

My build.cabal:

cabal-version:      3.0
build-type:         Simple

...

executable HopefullyHappySoon:
  hs-source-dirs:     src
  main-is:            Main.hs
  other-modules:      Token,
                      Parser
  build-depends:      base ^>= 4.17.2.1, 
                      array >= 0.5.4.0
  build-tool-depends: happy:happy >= 1.20.1.1 -- Just to be sure ...
  default-language:   GHC2021

Please let me know if there is still information missing.

BjoernLoetters commented 4 months ago

As the example works for you, I suppose it is somehow related to my environment. So, I tried the following steps in addition:

andreasabel commented 4 months ago

Thanks for the extra information. This looks good in general, so I do not see an obvious point of attack.

Unfortunately, since I cannot produce it locally, I cannot go hunting for the problem myself. I suggest you shrink the MWE by inlining the happy generated Haskell and then reducing it is further and further, making sure the same crash remains. Once it is down to a few lines of Haskell, maybe an issue could be reported to GHC, or it turns out that Happy is generating some code that breaks on the new M3 architecture. Anyway, if you get such a minimal example, please post it here as well to keep me in the loop.

BjoernLoetters commented 4 months ago

I followed your advice and tracked down the error by reducing the code in a call-by-name fashion. This was quite an adventure and the outcome is somewhat frustrating. TL;DR: I should have copied my whole .cabal-file into this issue (lesson learned!), instead of just writing down the "seemingly relevant" parts.

So, the story is as follows: I had the StrictData extension enabled somewhere in the .cabal-file and instead of creating a new .cabal-file for the minimal example, I just copied this flag all along and did not notice it is still there, only sharing the "relevant" (as I thought) excerpt of the file with you. The problem became apparent to me as soon as I rewrote data HappyStk a = HappyStk a (HappyStk a) to data HappyStk a = HappyStk a ~(HappyStk a). Without this lazy annotation the notHappyAtAll error is (of course 😮‍💨) evaluated when the code hits the pattern HappyStk _ _ in my example. Disabling the StrictData extension fixes this issue as well and the parser runs just fine.

Since Happy is a parser generator whose output must be compiled by the user, I think it would be great to mention this somewhere in the documentation (I could at least not find anything related to StrictData in the HTML documentation with a full-text search). Another approach would be to make Happy strict-safe. At the moment, it seems like it's only the notHappyAtAll error which should not be evaluated strictly. Since this error is not very informative anyway, it may be an option to handle the empty stack situations in a different way, not relying on Haskell's default laziness.

Anyway, thank you for your help and time!

andreasabel commented 4 months ago

Oh, sorry to hear you had a long wild goose chase...

It would be nice if we could rewrite happy to make streams explicitly lazy:

data HappyStk a = HappyStk a ~(HappyStk a)

This is currently not possible without turning on StrictData, see this recent GHC issue:

Currently, we would have to turn on StrictData in the generated parser, which could have unforeseen consequences for the user code woven into the parser.

Of course, we could use ML-style explicit laziness:

data HappyStk a = HappyStk a (() -> HappyStk a)

which is a slap in the face of any Haskeller, haha.
One should investigate whether this would have some performance penalties, because we are replacing call-by-need with call-by-name.

andreasabel commented 4 months ago

What we could do now is to add {-# LANGUAGE NoStrictData #-} to happy-generated code.

andreasabel commented 4 months ago

The happy template defines two stream types that never make sense with StrictData: https://github.com/haskell/happy/blob/535ce96533fb693a8a4441b0ac17cdb78bddfeec/packages/backend-lalr/data/HappyTemplate.hs#L39 https://github.com/haskell/happy/blob/535ce96533fb693a8a4441b0ac17cdb78bddfeec/packages/backend-lalr/data/HappyTemplate.hs#L84 So I propose to add {-# LANGUAGE NoStrictData #-} to the template. @int-index @Ericson2314 What do you think?

donovancrichton commented 1 month ago

Just confirming that I had this same issue and solved it by removing the StrictData extension from my cabal file. Thanks for writing this up!