haskell / alex

A lexical analyser generator for Haskell
https://hackage.haskell.org/package/alex
BSD 3-Clause "New" or "Revised" License
298 stars 82 forks source link

Prelude.head empty list exception for zero or more matches. #71

Closed iand675 closed 4 years ago

iand675 commented 9 years ago

It looks like the zero-or-more matching functionality in Alex 3.1.4 has an error. In the following minimal example, alex emits the following error: alex: Prelude.head: empty list.

In the event that I change line 7 from @whitespaces = $whitespace* to @whitespaces = $whitespace+, the error goes away.

{
module Data.CSS.Lex where
import Data.ByteString.Lazy (ByteString)
}

%wrapper "posn-bytestring"

$whitespace = [\ \n\t]

@whitespaces = $whitespace*

state:-

<0> @whitespaces { mkL Whitespace }

{
mkL :: CSSToken -> AlexPosn -> ByteString -> CSSLexeme
mkL x _ _ = x

data CSSToken = Whitespace

type CSSLexeme = CSSToken
}
erikd commented 7 years ago

Is this still an issue with the latest release?

andreasabel commented 5 years ago

Yes.

$ alex Issue71.x 
alex: Prelude.head: empty list
$ alex --version
Alex version 3.2.4, (c) 2003 Chris Dornan and Simon Marlow

(2003 should be either dropped or updated.)

andreasabel commented 5 years ago

The following call to head throws the error: https://github.com/simonmar/alex/blob/f2c343b9b20ff9508f482e87528ef486c55b9f86/src/DFAMin.hs#L58

andreasabel commented 4 years ago

The remaining problem is that if the lexer definition contains nullable tokens and the Alex gets stuck (on EOF or no just no possible DFA-transition), it outputs an infinite sequence of the nullable tokens. This happens because alex_scan_tkn gladly returns accepts of length 0. The problem can be fixed by checking for the scanned length here: https://github.com/simonmar/alex/blob/0198f7387a8f31fd8a7ff3efe453edbb00cba4fb/templates/GenericTemplate.hs#L155 and if the length is still 0, output AlexNone instead of accept or skip.

Why should a lexer not return tokens of length 0? Well, because the number of these token sequences is not determined, it could output any number of such tokens at any point. Thus, outputting 0 of such tokens only is a sensible (and the only sensible) choice.

andreasabel commented 4 years ago

I'd say this issue is fixed now and can be closed.