Open ethercrow opened 7 years ago
I would love to look into this, but I don't know how Yi's highlighting system works. Can you give a brief explanation of how it works (or can you give hints to where in the code I could start learning about it)? I'm about to dive into the code and find out myself.
@noughtmare I don't really know myself and consider it the most mysterious part of the codebase. Papers linked at the end of the readme file may be helpful.
Ok I think I'm onto something:
Highlighting in Yi is done with Stroke
s consisting of: a begin point, an end point and the style of that region (defined in yi-language/src/Yi/Syntax.hs
).
The specific highlighter implementation needs to have a getStrokes
function that can take an AST and return a list of strokes (defined in yi-mode-haskell/src/Yi/Syntax/Strokes/Haskell.hs
for the haskell mode).
I tried making a megaparsec based brainfuck highlighter and came up with an issue: lexers for Yi highlighters need to be incremental which basically rules out all unmodified parsec-based lexers.
I'm now going to try attoparsec which does have support for incremental parsing.
We can also first give only one character as input and then if *parsec gives an error about an unexpected end of input we can give it more input and run the parser again.
I found out more:
Yi modes need a lexer and a highlighter which it calls Highlighter
and getStrokes
respectively.
The Highlighter
is for some reason hidden behind an ExtHL type and can be found here.
getStrokes
is just a function:
getStrokes :: syntax -> Point -> Point -> Point -> [Stroke]
The first Point
is the cursor position, the second Point
is the first visible point and the third Point
is the last visible point (these last two points will increment when scrolling down and decrement when scrolling down).
Yes, ^ is a big problem and why we don't have non-alex stuff to begin with: you can write them but it's probably not convenient.
FWIW I have some old code which had highlighting but didn't use alex-based anything. The way it worked was through using commonLexer
that ignored buffer input and instead used an external process to load in file, retrieve identifier types and locations and apply the overlay. This worked for me because the language needed compiler support to determine identifier types (i.e. not something parser could do). Not an ideal solution but it worked. See https://github.com/Fuuzetsu/yi-agda/blob/master/src/Yi/Mode/Agda.hs#L220 for example. I don't recommend this approach, just pointing out the possibility.
I got an example with megaparsec (I'll probably switch to attoparsec) working. Preview:
Ok, I guess it's pretty easy when you don't cache intermediate results:
{-# LANGUAGE OverloadedStrings #-}
module Yi.Mode.Brainfuck where
import Yi.Mode.Common (anyExtension)
import Lens.Micro.Platform ((.~), (&))
import Yi.Types (Mode (..))
import Yi.Buffer.Misc
import Yi.Editor
import Yi.Style
import Yi.Syntax
-- Lexing
data Stmt = Next | Prev | Inc | Dec | Out | In | Loop | End | Comment
deriving (Show)
parse :: Scanner Point Char -> [Span Stmt]
parse sc = map (uncurry parseOne) $ scanRun sc $ Point 0
where
parseOne :: Point -> Char -> Span Stmt
parseOne p '>' = toSpan p Next
parseOne p '<' = toSpan p Prev
parseOne p '+' = toSpan p Inc
parseOne p '-' = toSpan p Dec
parseOne p '.' = toSpan p Out
parseOne p ',' = toSpan p In
parseOne p '[' = toSpan p Loop
parseOne p ']' = toSpan p End
parseOne p _ = toSpan p Comment
toSpan p x = Span p x (succ p)
brainfuckMode :: Mode [Span Stmt]
brainfuckMode = emptyMode
& modeAppliesA .~ anyExtension ["bf", "brainfuck"]
& modeNameA .~ "brainfuck"
& modeGetStrokesA .~ getStrokes
& modeHLA .~ ExtHL bfHighlighter
where
bfHighlighter :: Highlighter [Span Stmt] [Span Stmt]
bfHighlighter = SynHL [] run (\cache _ -> cache) (\_ cache -> cache)
run :: Scanner Point Char -> Point -> [Span Stmt] -> [Span Stmt]
run sc _ _ = parse sc
-- Highlighting
stmtToColor :: Stmt -> Color
stmtToColor Next = yellow
stmtToColor Prev = yellow
stmtToColor Inc = green
stmtToColor Dec = red
stmtToColor Out = cyan
stmtToColor In = cyan
stmtToColor Loop = blue
stmtToColor End = blue
stmtToColor Comment = magenta
colorToFg = const . withFg
getStrokes :: [Span Stmt] -> Point -> Point -> Point -> [Stroke]
getStrokes l _ _ _ = map (fmap (colorToFg . stmtToColor)) l
Now the caching version:
{-# LANGUAGE OverloadedStrings #-}
module Yi.Mode.Brainfuck where
import Yi.Mode.Common (anyExtension)
import Lens.Micro.Platform ((.~), (&))
import Yi.Types (Mode (..))
import Yi.Buffer.Misc
import Yi.Editor
import Yi.Style
import Yi.Syntax
import Data.Maybe (fromMaybe)
-- Lexing
data Stmt = Next | Prev | Inc | Dec | Out | In | Loop | End | Comment
deriving (Show, Eq, Ord, Enum)
type State = (Point, [Span Stmt])
scanner :: Scanner Point Char -> Scanner State [Span Stmt]
scanner sc = Scanner
{ scanInit = (scanInit sc, [])
, scanLooked = scanLooked sc . fst
, scanEmpty = []
, scanRun = run
}
where
run :: State -> [(State, [Span Stmt])]
run (start,curState) = go curState (scanRun sc start)
go :: [Span Stmt] -> [(Point, Char)] -> [(State, [Span Stmt])]
go _ [] = []
go (curState) toks@((st,token):rest) = ((st, curState), result) : go nextState rest
where
nextState :: [Span Stmt]
nextState = curState ++ [parse st token]
result :: [Span Stmt]
result = curState ++ map (uncurry parse) toks
parse p c = Span p (fromMaybe Comment $ lookup c table) (succ p)
table = zip "><+-.,[]" [Next .. End]
brainfuckMode :: Mode [Span Stmt]
brainfuckMode = emptyMode
& modeAppliesA .~ anyExtension ["bf", "brainfuck"]
& modeNameA .~ "brainfuck"
& modeGetStrokesA .~ getStrokes
& modeHLA .~ ExtHL (mkHighlighter scanner)
-- Highlighting
stmtToColor :: Stmt -> Color
stmtToColor Next = yellow
stmtToColor Prev = yellow
stmtToColor Inc = green
stmtToColor Dec = red
stmtToColor Out = cyan
stmtToColor In = cyan
stmtToColor Loop = blue
stmtToColor End = blue
stmtToColor Comment = magenta
colorToFg = const . withFg
getStrokes :: [Span Stmt] -> Point -> Point -> Point -> [Stroke]
getStrokes l _ _ _ = map (fmap (colorToFg . stmtToColor)) l
Feel free to add your findings to the documentation.
There are a few issues I encountered:
I have discovered that uu-parsinglib
is amazing. I have created a highlighter that works with them:
{-# LANGUAGE OverloadedStrings #-}
{-# LANGUAGE BangPatterns #-}
{-# LANGUAGE MultiParamTypeClasses #-}
module Yi.Mode.Brainfuck where
import Text.ParserCombinators.UU
import Text.ParserCombinators.UU.Utils hiding (runParser)
import Text.ParserCombinators.UU.BasicInstances hiding (Parser)
import Data.Functor (($>))
import Yi.Mode.Common (anyExtension)
import Lens.Micro.Platform ((.~), (&))
import Yi.Types (Mode)
import Yi.Buffer.Misc (modeAppliesA, modeNameA, modeGetStrokesA, modeHLA, emptyMode)
import Yi.Style -- Colors and Styles
import Yi.Syntax -- Scanners, Spans, Strokes and mkHighlighter
import Data.Maybe (fromMaybe)
type Parser a = P (Str Char String Point) a
instance IsLocationUpdatedBy Point Char where
advance p _c = succ p
brainfuckMode :: Mode [Span Op]
brainfuckMode = emptyMode
& modeAppliesA .~ anyExtension ["bf", "brainfuck"]
& modeNameA .~ "brainfuck"
& modeGetStrokesA .~ getStrokes
& modeHLA .~ ExtHL (mkHighlighter scanner)
-- Lexing
data Op = Next | Prev | Inc | Dec | Out | In | Loop | Ret | Comment
deriving (Show, Eq, Ord, Enum)
type State = (Point, Process)
newtype Process = Process (String -> [Span Op])
instance Show Process where
show (Process f) = show $ f ""
pInc, pDec, pNext, pPrev, pLoop, pInp, pOut :: Parser (Span Op)
pInc = Span <$> pPos <*> (pSome (pSym '+') $> Inc ) <*> pPos
pDec = Span <$> pPos <*> (pSome (pSym '-') $> Dec ) <*> pPos
pNext = Span <$> pPos <*> (pSome (pSym '>') $> Next) <*> pPos
pPrev = Span <$> pPos <*> (pSome (pSym '<') $> Prev) <*> pPos
pLoop = Span <$> pPos <*> (pSome (pSym '[') $> Loop) <*> pPos
pRet = Span <$> pPos <*> (pSome (pSym ']') $> Ret ) <*> pPos
pInp = Span <$> pPos <*> (pSome (pSym ',') $> In ) <*> pPos
pOut = Span <$> pPos <*> (pSome (pSym '.') $> Out ) <*> pPos
pComment = Span <$> pPos <*> (pSome (pSatisfy (`notElem` ("+-><[],." :: String)) (Insertion "Comment" 'c' 5) :: Parser Char) $> Comment) <*> pPos
brainfuck :: Parser [Span Op]
brainfuck = pList $ (<<|> pComment) $ foldl1 (<|>) [pInc, pDec, pNext, pPrev, pLoop, pRet, pInp, pOut]
runParser :: Parser a -> Point -> String -> a
runParser parser start str = parse (parser <* pEnd) (createStr start str)
scanner :: Scanner Point Char -> Scanner State [Span Op]
scanner sc = Scanner
{ scanInit = (scanInit sc, Process $ runParser brainfuck (scanInit sc))
, scanLooked = scanLooked sc . fst
, scanEmpty = runParser brainfuck (Point 0) ""
, scanRun = (\(st,f) -> go f $ scanRun sc st)
}
where
go :: Process -> [(Point, Char)] -> [(State, [Span Op])]
go _ [] = []
go (Process f) toks@((st, char):rest) =
((st, Process f), f $ map snd toks) : go (Process $ f . (char :)) rest
-- Highlighting
opToColor :: Op -> Color
opToColor Next = yellow
opToColor Prev = yellow
opToColor Inc = green
opToColor Dec = red
opToColor Out = cyan
opToColor In = cyan
opToColor Loop = blue
opToColor Ret = blue
opToColor Comment = magenta
colorToFg :: Color -> StyleName
colorToFg = const . withFg
getStrokes :: [Span Op] -> Point -> Point -> Point -> [Stroke]
getStrokes l _ _ _ = map (fmap (colorToFg . opToColor)) l
Nice! uu-parsinglib
uses breadth first exploration instead of depth first backtracking. This means that the parser can be fully incremental and doesn't need to keep the entire input around. Of course the implementation might need to be tuned to take advantage of that.
I'm still having problems with high memory usage in the scanner.go function:
Wouldn't Process $ f . (char :)
technically store unnecessary copies of the buffer as thunks?
@siddhanathan Yes, I now understand that's the entire reason Parser.Incremental exists. I've now made a version that uses Parser.Incremental:
{-# LANGUAGE OverloadedStrings #-}
module Yi.Mode.Brainfuck where
import Yi.IncrementalParse hiding (Parser)
import qualified Yi.IncrementalParse as P
import Data.Functor (($>))
import Yi.Mode.Common (anyExtension)
import Lens.Micro.Platform ((.~), (&))
import Yi.Types (Mode)
import Yi.Buffer.Misc (modeAppliesA, modeNameA, modeGetStrokesA, modeHLA, emptyMode)
import Yi.Style -- Colors and Styles
import Yi.Syntax -- Scanners, Spans, Strokes and mkHighlighter
import Data.Maybe (fromMaybe)
import Data.List (inits)
import Data.Bifunctor (first, second)
import Control.Applicative
type Parser a = P.Parser Char a
brainfuckMode :: Mode [Op]
brainfuckMode = emptyMode
& modeAppliesA .~ anyExtension ["bf", "brainfuck"]
& modeNameA .~ "brainfuck"
& modeGetStrokesA .~ getStrokes 0
& modeHLA .~ ExtHL (mkHighlighter $ skipScanner 500 . scanner brainfuck)
-- Lexing
data Op = Next | Prev | Inc | Dec | Out | In | Loop | Ret | Comment
deriving (Show, Eq, Ord, Enum)
pSym c = symbol (== c)
pSatisfy = symbol
pInc, pDec, pNext, pPrev, pLoop, pInp, pOut :: Parser Op
pInc = pSym '+' $> Inc
pDec = pSym '-' $> Dec
pNext = pSym '>' $> Next
pPrev = pSym '<' $> Prev
pLoop = pSym '[' $> Loop
pRet = pSym ']' $> Ret
pInp = pSym ',' $> In
pOut = pSym '.' $> Out
pComment = pNotElem "+-><[],." $> Comment
pNotElem :: [Char] -> Parser Char
pNotElem l = pSatisfy (`notElem` l)
brainfuck :: Parser [Op]
brainfuck = (many $ (pInc <|> pDec <|> pNext <|> pPrev <|> pLoop <|> pRet <|> pInp <|> pOut <|> pComment)) <* eof
-- Highlighting
opToColor :: Op -> Color
opToColor Next = yellow
opToColor Prev = yellow
opToColor Inc = green
opToColor Dec = red
opToColor Out = cyan
opToColor In = cyan
opToColor Loop = blue
opToColor Ret = blue
opToColor Comment = magenta
colorToFg :: Color -> StyleName
colorToFg = const . withFg
getStrokes :: Int -> [Op] -> Point -> Point -> Point -> [Stroke]
getStrokes _ [] _ _ _ = []
getStrokes curPos (op:rest) _ _ _ = Span (Point curPos) (colorToFg (opToColor op)) (Point (curPos + 1)) : getStrokes (curPos + 1) rest undefined undefined undefined
The problem is that this has even worse memory usage (it takes so long I had to exit it before it was done).
I have some good news and some bad news. The bad news is that we probably can't use any other parser than a highly customized parser for Yi because we need online, incremental and error correcting parsers. The good news is that I think I would be able to understand the current implementation of the Incremental Parser. Either that or we could use a modified version of uu-parsinglib
such that it can parse incrementally which shouldn't be too hard to do.
I think the reason why my previous code snippet used so much memory is that many
combined with a lot of <|>
's is not a good way of parsing. A better way would be to just pSatisfy (`elem` "+-><[],.")
and then later map a function over the resulting string. Again I want to say that brainfuck is not a really good example, because every string satisfies the brainfuck grammar (brainfuck just ignores non "+-><[],." characters).
I will be working on a basic lisp parser to really show how to parse grammars with the incremental parser.
For anybody who is also interested in more advanced parser combinators I strongly recommend reading "Combinator Parsing: A Short Tutorial" by Doaitse Swierstra. It really helped me understand the more advanced aspects of combinator parsing.
I also have a parser combinator library which might work well with Yi's syntax highlighting system. It has a few similarities with uu-parsinglib
, but originally started out as a cross between Text.ParserCombinators.ReadP
and conduit
. It does incremental parsing, and can optionally produce a stream of values during parsing (incremental values don't have to be the same type as the final result)
@quickdudley Your parser library looks very interesting. I have a question which I couldn't answer by just quickly looking at the code: are they online?
e.g. if you wrote a parser that parses a list of 'a', for example, and you gave it ('a':undefined)
as input and then only asked for the first element of the resulting list (something like head (parse (many (char 'a')) ('a':undefined))
) would it give the character 'a' as output?
We use this online behaviour to only parse and highlight the currently visible text.
head (parse (many (char 'a'))) ('a':undefined)
would be undefined because of how the parse
function works, but if you were manually using beforeStep
, step
, and extract
you'd be able to avoid it. beforeStep
isn't strictly necessary but it can tell you when the parser has stopped accepting input and also cleans out any stale "unexpected eof" type error messages.
I'm also working on moving alex-specific stuff from core such that an API for highlighting becomes something like "given a buffer and a region that is currently visible, return a collection of colored regions" and modes can implement that however they want: https://github.com/yi-editor/yi/pull/989
I don't have an example of incremental parsing for this branch yet. If you're interested in non-incremental one, there's this: https://github.com/ethercrow/yi-config/blob/0.15/modules/RainbowMode.hs . rainbowGetStrokes
here is the entry point for highlighting.
People should be free to add regexp-based, parsec-based, trifecta-based, call-the-external-compiler-based and whatever-else-based highlighters.