haskell / alex

A lexical analyser generator for Haskell
https://hackage.haskell.org/package/alex
BSD 3-Clause "New" or "Revised" License
297 stars 82 forks source link

Explore unboxed parser representations #193

Open Boarders opened 3 years ago

Boarders commented 3 years ago

There is a nice library by Ed Kmett called parsnip: https://hackage.haskell.org/package/parsnip-0/

This represents a parser as follows:

newtype Option a = Option# (# a | (##) #)
type Result s a = (# Option a, Addr#, State# s #)

newtype Parser s a = Parser
 { runParser :: Addr# -> State# s -> Result s a
 }

Here is how the Alex parser for monad-bytestring looks:

 newtype Alex a = Alex { unAlex :: AlexState -> Either String (AlexState, a) }

 -- monad-bytestring wrapper
 data AlexState = AlexState {
        alex_pos :: !AlexPosn,  -- position at current input location
        alex_bpos:: !Int64,     -- bytes consumed so far
        alex_inp :: ByteString.ByteString,      -- the current input
        alex_chr :: !Char,      -- the character before the input
        alex_scd :: !Int        -- the current startcode
      , alex_ust :: AlexUserState -- AlexUserState will be defined in the user program
    }

It seems this sort of unboxing experiment is not so far out of reach for a wrapper that could produce potentially very fast code. On top of that, as it is user-generated code it seems reasonable that it might be quite horrid to actually read. How feasible would it be to add a wrapper inspired by parsnip?