yi-editor / yi

The Haskell-Scriptable Editor
GNU General Public License v2.0
1.51k stars 203 forks source link

Provide an example of non-alex highlighter #946

Open ethercrow opened 7 years ago

ethercrow commented 7 years ago

People should be free to add regexp-based, parsec-based, trifecta-based, call-the-external-compiler-based and whatever-else-based highlighters.

noughtmare commented 7 years ago

I would love to look into this, but I don't know how Yi's highlighting system works. Can you give a brief explanation of how it works (or can you give hints to where in the code I could start learning about it)? I'm about to dive into the code and find out myself.

ethercrow commented 7 years ago

@noughtmare I don't really know myself and consider it the most mysterious part of the codebase. Papers linked at the end of the readme file may be helpful.

noughtmare commented 7 years ago

Ok I think I'm onto something: Highlighting in Yi is done with Strokes consisting of: a begin point, an end point and the style of that region (defined in yi-language/src/Yi/Syntax.hs). The specific highlighter implementation needs to have a getStrokes function that can take an AST and return a list of strokes (defined in yi-mode-haskell/src/Yi/Syntax/Strokes/Haskell.hs for the haskell mode).

noughtmare commented 7 years ago

I tried making a megaparsec based brainfuck highlighter and came up with an issue: lexers for Yi highlighters need to be incremental which basically rules out all unmodified parsec-based lexers.

I'm now going to try attoparsec which does have support for incremental parsing.

We can also first give only one character as input and then if *parsec gives an error about an unexpected end of input we can give it more input and run the parser again.

noughtmare commented 7 years ago

I found out more: Yi modes need a lexer and a highlighter which it calls Highlighter and getStrokes respectively.

The Highlighter is for some reason hidden behind an ExtHL type and can be found here.

getStrokes is just a function:

getStrokes :: syntax -> Point -> Point -> Point -> [Stroke]

The first Point is the cursor position, the second Point is the first visible point and the third Point is the last visible point (these last two points will increment when scrolling down and decrement when scrolling down).

Fuuzetsu commented 7 years ago

Yes, ^ is a big problem and why we don't have non-alex stuff to begin with: you can write them but it's probably not convenient.

FWIW I have some old code which had highlighting but didn't use alex-based anything. The way it worked was through using commonLexer that ignored buffer input and instead used an external process to load in file, retrieve identifier types and locations and apply the overlay. This worked for me because the language needed compiler support to determine identifier types (i.e. not something parser could do). Not an ideal solution but it worked. See https://github.com/Fuuzetsu/yi-agda/blob/master/src/Yi/Mode/Agda.hs#L220 for example. I don't recommend this approach, just pointing out the possibility.

noughtmare commented 7 years ago

I got an example with megaparsec (I'll probably switch to attoparsec) working. Preview: yi-mode-brainfuck screenshot

Code from https://en.wikipedia.org/wiki/Brainfuck#Examples

noughtmare commented 7 years ago

Ok, I guess it's pretty easy when you don't cache intermediate results:


{-# LANGUAGE OverloadedStrings #-}
module Yi.Mode.Brainfuck where

import Yi.Mode.Common (anyExtension)
import Lens.Micro.Platform ((.~), (&))
import Yi.Types (Mode (..))

import Yi.Buffer.Misc
import Yi.Editor
import Yi.Style
import Yi.Syntax

-- Lexing

data Stmt = Next | Prev | Inc | Dec | Out | In | Loop | End | Comment
  deriving (Show)

parse :: Scanner Point Char -> [Span Stmt]
parse sc = map (uncurry parseOne) $ scanRun sc $ Point 0
  where 
    parseOne :: Point -> Char -> Span Stmt
    parseOne p '>' = toSpan p Next
    parseOne p '<' = toSpan p Prev
    parseOne p '+' = toSpan p Inc
    parseOne p '-' = toSpan p Dec
    parseOne p '.' = toSpan p Out
    parseOne p ',' = toSpan p In
    parseOne p '[' = toSpan p Loop
    parseOne p ']' = toSpan p End
    parseOne p  _  = toSpan p Comment
    toSpan p x = Span p x (succ p)

brainfuckMode :: Mode [Span Stmt]
brainfuckMode = emptyMode
  & modeAppliesA .~ anyExtension ["bf", "brainfuck"]
  & modeNameA .~ "brainfuck"
  & modeGetStrokesA .~ getStrokes
  & modeHLA .~ ExtHL bfHighlighter
  where
    bfHighlighter :: Highlighter [Span Stmt] [Span Stmt]
    bfHighlighter = SynHL [] run (\cache _ -> cache) (\_ cache -> cache)

    run :: Scanner Point Char -> Point -> [Span Stmt] -> [Span Stmt]
    run sc _ _ = parse sc

-- Highlighting

stmtToColor :: Stmt -> Color
stmtToColor Next    = yellow
stmtToColor Prev    = yellow
stmtToColor Inc     = green
stmtToColor Dec     = red
stmtToColor Out     = cyan
stmtToColor In      = cyan
stmtToColor Loop    = blue
stmtToColor End     = blue
stmtToColor Comment = magenta

colorToFg = const . withFg

getStrokes :: [Span Stmt] -> Point -> Point -> Point -> [Stroke]
getStrokes l _ _ _ = map (fmap (colorToFg . stmtToColor)) l
noughtmare commented 7 years ago

Now the caching version:


{-# LANGUAGE OverloadedStrings #-}
module Yi.Mode.Brainfuck where

import Yi.Mode.Common (anyExtension)
import Lens.Micro.Platform ((.~), (&))
import Yi.Types (Mode (..))

import Yi.Buffer.Misc
import Yi.Editor
import Yi.Style
import Yi.Syntax

import Data.Maybe (fromMaybe)

-- Lexing

data Stmt = Next | Prev | Inc | Dec | Out | In | Loop | End | Comment
  deriving (Show, Eq, Ord, Enum)

type State = (Point, [Span Stmt])

scanner :: Scanner Point Char -> Scanner State [Span Stmt]
scanner sc = Scanner
  { scanInit = (scanInit sc, [])
  , scanLooked = scanLooked sc . fst
  , scanEmpty = []
  , scanRun = run
  }
  where
    run :: State -> [(State, [Span Stmt])]
    run (start,curState) = go curState (scanRun sc start)

    go :: [Span Stmt] -> [(Point, Char)] -> [(State, [Span Stmt])]
    go _ [] = []
    go (curState) toks@((st,token):rest) = ((st, curState), result) : go nextState rest
      where
        nextState :: [Span Stmt]
        nextState = curState ++ [parse st token]

        result :: [Span Stmt]
        result = curState ++ map (uncurry parse) toks

    parse p c = Span p (fromMaybe Comment $ lookup c table) (succ p)

    table = zip "><+-.,[]" [Next .. End]

brainfuckMode :: Mode [Span Stmt]
brainfuckMode = emptyMode
  & modeAppliesA .~ anyExtension ["bf", "brainfuck"]
  & modeNameA .~ "brainfuck"
  & modeGetStrokesA .~ getStrokes
  & modeHLA .~ ExtHL (mkHighlighter scanner)

-- Highlighting

stmtToColor :: Stmt -> Color
stmtToColor Next    = yellow
stmtToColor Prev    = yellow
stmtToColor Inc     = green
stmtToColor Dec     = red
stmtToColor Out     = cyan
stmtToColor In      = cyan
stmtToColor Loop    = blue
stmtToColor End     = blue
stmtToColor Comment = magenta

colorToFg = const . withFg

getStrokes :: [Span Stmt] -> Point -> Point -> Point -> [Stroke]
getStrokes l _ _ _ = map (fmap (colorToFg . stmtToColor)) l
ghost commented 7 years ago

Feel free to add your findings to the documentation.

noughtmare commented 7 years ago

There are a few issues I encountered:

noughtmare commented 7 years ago

I have discovered that uu-parsinglib is amazing. I have created a highlighter that works with them:


{-# LANGUAGE OverloadedStrings #-}
{-# LANGUAGE BangPatterns #-}
{-# LANGUAGE MultiParamTypeClasses #-}
module Yi.Mode.Brainfuck where

import Text.ParserCombinators.UU
import Text.ParserCombinators.UU.Utils hiding (runParser)
import Text.ParserCombinators.UU.BasicInstances hiding (Parser)
import Data.Functor (($>))

import Yi.Mode.Common      (anyExtension)
import Lens.Micro.Platform ((.~), (&))
import Yi.Types            (Mode)

import Yi.Buffer.Misc (modeAppliesA, modeNameA, modeGetStrokesA, modeHLA, emptyMode)
import Yi.Style       -- Colors and Styles
import Yi.Syntax      -- Scanners, Spans, Strokes and mkHighlighter

import Data.Maybe (fromMaybe)

type Parser a = P (Str Char String Point) a

instance IsLocationUpdatedBy Point Char where
  advance p _c = succ p

brainfuckMode :: Mode [Span Op]
brainfuckMode = emptyMode
  & modeAppliesA .~ anyExtension ["bf", "brainfuck"]
  & modeNameA .~ "brainfuck"
  & modeGetStrokesA .~ getStrokes
  & modeHLA .~ ExtHL (mkHighlighter scanner)

-- Lexing

data Op = Next | Prev | Inc | Dec | Out | In | Loop | Ret | Comment
  deriving (Show, Eq, Ord, Enum)

type State = (Point, Process)

newtype Process = Process (String -> [Span Op])

instance Show Process where
  show (Process f) = show $ f ""

pInc, pDec, pNext, pPrev, pLoop, pInp, pOut :: Parser (Span Op)
pInc  = Span <$> pPos <*> (pSome (pSym '+') $> Inc ) <*> pPos
pDec  = Span <$> pPos <*> (pSome (pSym '-') $> Dec ) <*> pPos
pNext = Span <$> pPos <*> (pSome (pSym '>') $> Next) <*> pPos
pPrev = Span <$> pPos <*> (pSome (pSym '<') $> Prev) <*> pPos
pLoop = Span <$> pPos <*> (pSome (pSym '[') $> Loop) <*> pPos
pRet  = Span <$> pPos <*> (pSome (pSym ']') $> Ret ) <*> pPos
pInp  = Span <$> pPos <*> (pSome (pSym ',') $> In  ) <*> pPos
pOut  = Span <$> pPos <*> (pSome (pSym '.') $> Out ) <*> pPos
pComment = Span <$> pPos <*> (pSome (pSatisfy (`notElem` ("+-><[],." :: String)) (Insertion "Comment" 'c' 5) :: Parser Char) $> Comment) <*> pPos

brainfuck :: Parser [Span Op]
brainfuck = pList $ (<<|> pComment) $ foldl1 (<|>) [pInc, pDec, pNext, pPrev, pLoop, pRet, pInp, pOut]

runParser :: Parser a -> Point -> String -> a
runParser parser start str = parse (parser <* pEnd) (createStr start str)

scanner :: Scanner Point Char -> Scanner State [Span Op]
scanner sc = Scanner
  { scanInit   = (scanInit sc, Process $ runParser brainfuck (scanInit sc))
  , scanLooked = scanLooked sc . fst
  , scanEmpty  = runParser brainfuck (Point 0) ""
  , scanRun    = (\(st,f) -> go f $ scanRun sc st)
  }
  where 
    go :: Process -> [(Point, Char)] -> [(State, [Span Op])]
    go _ [] = []
    go (Process f) toks@((st, char):rest) = 
      ((st, Process f), f $ map snd toks) : go (Process $ f . (char :)) rest

-- Highlighting

opToColor :: Op -> Color
opToColor Next    = yellow
opToColor Prev    = yellow
opToColor Inc     = green
opToColor Dec     = red
opToColor Out     = cyan
opToColor In      = cyan
opToColor Loop    = blue
opToColor Ret     = blue
opToColor Comment = magenta

colorToFg :: Color -> StyleName
colorToFg = const . withFg

getStrokes :: [Span Op] -> Point -> Point -> Point -> [Stroke]
getStrokes l _ _ _ = map (fmap (colorToFg . opToColor)) l
ghost commented 7 years ago

Nice! uu-parsinglib uses breadth first exploration instead of depth first backtracking. This means that the parser can be fully incremental and doesn't need to keep the entire input around. Of course the implementation might need to be tuned to take advantage of that.

noughtmare commented 7 years ago

I'm still having problems with high memory usage in the scanner.go function: yi-bf

ghost commented 7 years ago

Wouldn't Process $ f . (char :) technically store unnecessary copies of the buffer as thunks?

noughtmare commented 7 years ago

@siddhanathan Yes, I now understand that's the entire reason Parser.Incremental exists. I've now made a version that uses Parser.Incremental:


{-# LANGUAGE OverloadedStrings #-}
module Yi.Mode.Brainfuck where

import Yi.IncrementalParse hiding (Parser)
import qualified Yi.IncrementalParse as P

import Data.Functor (($>))

import Yi.Mode.Common      (anyExtension)
import Lens.Micro.Platform ((.~), (&))
import Yi.Types            (Mode)

import Yi.Buffer.Misc (modeAppliesA, modeNameA, modeGetStrokesA, modeHLA, emptyMode)
import Yi.Style       -- Colors and Styles
import Yi.Syntax      -- Scanners, Spans, Strokes and mkHighlighter

import Data.Maybe (fromMaybe)
import Data.List (inits)

import Data.Bifunctor (first, second)
import Control.Applicative

type Parser a = P.Parser Char a

brainfuckMode :: Mode [Op]
brainfuckMode = emptyMode
  & modeAppliesA .~ anyExtension ["bf", "brainfuck"]
  & modeNameA .~ "brainfuck"
  & modeGetStrokesA .~ getStrokes 0
  & modeHLA .~ ExtHL (mkHighlighter $ skipScanner 500 . scanner brainfuck)

-- Lexing

data Op = Next | Prev | Inc | Dec | Out | In | Loop | Ret | Comment
  deriving (Show, Eq, Ord, Enum)

pSym c = symbol (== c)
pSatisfy = symbol

pInc, pDec, pNext, pPrev, pLoop, pInp, pOut :: Parser Op
pInc     = pSym '+' $> Inc
pDec     = pSym '-' $> Dec
pNext    = pSym '>' $> Next
pPrev    = pSym '<' $> Prev
pLoop    = pSym '[' $> Loop
pRet     = pSym ']' $> Ret
pInp     = pSym ',' $> In
pOut     = pSym '.' $> Out
pComment = pNotElem "+-><[],." $> Comment

pNotElem :: [Char] -> Parser Char
pNotElem l = pSatisfy (`notElem` l)

brainfuck :: Parser [Op]
brainfuck = (many $ (pInc <|> pDec <|> pNext <|> pPrev <|> pLoop <|> pRet <|> pInp <|> pOut <|> pComment)) <* eof

-- Highlighting

opToColor :: Op -> Color
opToColor Next    = yellow
opToColor Prev    = yellow
opToColor Inc     = green
opToColor Dec     = red
opToColor Out     = cyan
opToColor In      = cyan
opToColor Loop    = blue
opToColor Ret     = blue
opToColor Comment = magenta

colorToFg :: Color -> StyleName
colorToFg = const . withFg

getStrokes :: Int -> [Op] -> Point -> Point -> Point -> [Stroke]
getStrokes _ [] _ _ _ = []
getStrokes curPos (op:rest) _ _ _ = Span (Point curPos) (colorToFg (opToColor op)) (Point (curPos + 1)) : getStrokes (curPos + 1) rest undefined undefined undefined

The problem is that this has even worse memory usage (it takes so long I had to exit it before it was done).

noughtmare commented 7 years ago

I have some good news and some bad news. The bad news is that we probably can't use any other parser than a highly customized parser for Yi because we need online, incremental and error correcting parsers. The good news is that I think I would be able to understand the current implementation of the Incremental Parser. Either that or we could use a modified version of uu-parsinglib such that it can parse incrementally which shouldn't be too hard to do.

I think the reason why my previous code snippet used so much memory is that many combined with a lot of <|>'s is not a good way of parsing. A better way would be to just pSatisfy (`elem` "+-><[],.") and then later map a function over the resulting string. Again I want to say that brainfuck is not a really good example, because every string satisfies the brainfuck grammar (brainfuck just ignores non "+-><[],." characters).

I will be working on a basic lisp parser to really show how to parse grammars with the incremental parser.

For anybody who is also interested in more advanced parser combinators I strongly recommend reading "Combinator Parsing: A Short Tutorial" by Doaitse Swierstra. It really helped me understand the more advanced aspects of combinator parsing.

quickdudley commented 7 years ago

I also have a parser combinator library which might work well with Yi's syntax highlighting system. It has a few similarities with uu-parsinglib, but originally started out as a cross between Text.ParserCombinators.ReadP and conduit. It does incremental parsing, and can optionally produce a stream of values during parsing (incremental values don't have to be the same type as the final result)

noughtmare commented 7 years ago

@quickdudley Your parser library looks very interesting. I have a question which I couldn't answer by just quickly looking at the code: are they online?

e.g. if you wrote a parser that parses a list of 'a', for example, and you gave it ('a':undefined) as input and then only asked for the first element of the resulting list (something like head (parse (many (char 'a')) ('a':undefined))) would it give the character 'a' as output?

We use this online behaviour to only parse and highlight the currently visible text.

quickdudley commented 7 years ago

head (parse (many (char 'a'))) ('a':undefined) would be undefined because of how the parse function works, but if you were manually using beforeStep, step, and extract you'd be able to avoid it. beforeStep isn't strictly necessary but it can tell you when the parser has stopped accepting input and also cleans out any stale "unexpected eof" type error messages.

ethercrow commented 7 years ago

I'm also working on moving alex-specific stuff from core such that an API for highlighting becomes something like "given a buffer and a region that is currently visible, return a collection of colored regions" and modes can implement that however they want: https://github.com/yi-editor/yi/pull/989

I don't have an example of incremental parsing for this branch yet. If you're interested in non-incremental one, there's this: https://github.com/ethercrow/yi-config/blob/0.15/modules/RainbowMode.hs . rainbowGetStrokes here is the entry point for highlighting.