jgm / skylighting

A Haskell syntax highlighting library with tokenizers derived from KDE syntax highlighting descriptions

[BUG]: color mismatch in LaTeX code for "matrix" inside "equation" #111

Closed: VincentTam closed this issue 3 years ago

VincentTam commented 3 years ago

- pandoc version: 2.11 (downloaded from the latest stable deb)
- source file: Markdown
- target format: PDF
- pdf engine: default (pdflatex)
- minimal example test.md, compiled with pandoc test.md -o test.pdf:

```tex
\begin{equation}
  \begin{bmatrix} 1 & 1 \\ 1 & 1 \end{bmatrix}
\end{equation}
```

Result: the first bmatrix is in red, the second bmatrix in black.

(Screenshot from 2020-11-10 17-20-05)

Expected result: both bmatrix in black.

jgm commented 3 years ago

skylighting is tokenizing this as follows:

```
[ [ ( KeywordTok , "\\begin" )
  , ( NormalTok , "{" )
  , ( ExtensionTok , "equation" )
  , ( NormalTok , "}" )
  ]
, [ ( SpecialStringTok , "  " )
  , ( KeywordTok , "\\begin" )
  , ( NormalTok , "{" )
  , ( ErrorTok , "bmatrix" )
  , ( NormalTok , "}" )
  , ( SpecialStringTok , " 1 & 1 " )
  , ( SpecialCharTok , "\\\\" )
  , ( SpecialStringTok , " 1 & 1 " )
  , ( KeywordTok , "\\end" )
  , ( NormalTok , "{" )
  , ( ExtensionTok , "bmatrix" )
  , ( NormalTok , "}" )
  ]
, [ ( KeywordTok , "\\end" )
  , ( NormalTok , "{" )
  , ( ExtensionTok , "equation" )
  , ( NormalTok , "}" )
  ]
]
```

jgm commented 3 years ago

The problem is the ErrorTok for the first bmatrix. Everything else looks okay.
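
A minimal sketch of how the offending token can be picked out of a dump like the one above. The `TokenType` and `Token` definitions are local stand-ins that only mirror the constructor names seen in the dump (they are not imported from skylighting), and `secondLine` and `errorTexts` are illustrative names.

```haskell
-- Local stand-in for the token types that appear in the dump;
-- only the constructors used in this issue are listed.
data TokenType
  = KeywordTok
  | NormalTok
  | ExtensionTok
  | SpecialStringTok
  | SpecialCharTok
  | ErrorTok
  deriving (Eq, Show)

type Token = (TokenType, String)

-- Second source line of the dump, copied from the tokenizer output.
secondLine :: [Token]
secondLine =
  [ (SpecialStringTok, "  ")
  , (KeywordTok, "\\begin")
  , (NormalTok, "{")
  , (ErrorTok, "bmatrix")
  , (NormalTok, "}")
  , (SpecialStringTok, " 1 & 1 ")
  , (SpecialCharTok, "\\\\")
  , (SpecialStringTok, " 1 & 1 ")
  , (KeywordTok, "\\end")
  , (NormalTok, "{")
  , (ExtensionTok, "bmatrix")
  , (NormalTok, "}")
  ]

-- Collect the text of every token flagged as an error.
errorTexts :: [Token] -> [String]
errorTexts toks = [txt | (ErrorTok, txt) <- toks]

main :: IO ()
main = print (errorTexts secondLine)
```

Only the `bmatrix` after `\begin` is reported; the one after `\end` carries `ExtensionTok` and is left alone.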

jgm commented 3 years ago

Simpler repro:

```tex
\begin{equation}
\begin{bmatrix}
```

jgm commented 3 years ago

Relevant skylighting --trace output:

```
Keyword MATCHED Just (ExtensionTok,"equation")
CONTEXT STACK ["MathEnv","BeginEnvironment","FindBeginEnvironment","Normal Text"]
Trying rule Rule {rMatcher = DetectChar '}', rAttribute = NormalTok, rIncludeAttribute = False, rDynamic = False, rCaseSensitive = True, rChildren = [], rLookahead = False, rFirstNonspace = False, rColumn = Nothing, rContextSwitch = [Push ("LaTeX","MathModeEnv")]}
DetectChar MATCHED Just (NormalTok,"}")
CONTEXT STACK ["MathModeEnv","MathEnv","BeginEnvironment","FindBeginEnvironment","Normal Text"]
Trying rule Rule {rMatcher = DetectSpaces, rAttribute = SpecialStringTok, rIncludeAttribute = False, rDynamic = False, rCaseSensitive = True, rChildren = [], rLookahead = False, rFirstNonspace = False, rColumn = Nothing, rContextSwitch = []}
Trying rule Rule {rMatcher = DetectIdentifier, rAttribute = SpecialStringTok, rIncludeAttribute = False, rDynamic = False, rCaseSensitive = True, rChildren = [], rLookahead = False, rFirstNonspace = False, rColumn = Nothing, rContextSwitch = []}
Trying rule Rule {rMatcher = DetectChar '\\', rAttribute = SpecialStringTok, rIncludeAttribute = False, rDynamic = False, rCaseSensitive = True, rChildren = [], rLookahead = True, rFirstNonspace = False, rColumn = Nothing, rContextSwitch = [Push ("LaTeX","BackslashMathModeEnv")]}
DetectChar MATCHED Nothing
CONTEXT STACK ["BackslashMathModeEnv","MathModeEnv","MathEnv","BeginEnvironment","FindBeginEnvironment","Normal Text"]
Trying rule Rule {rMatcher = Keyword (KeywordAttr {keywordCaseSensitive = True, keywordDelims = fromList "\t\n !$%&()+,-./:;<=>?[]^{|}~"}) (CaseSensitiveWords (fromList ["\\begin"])), rAttribute = KeywordTok, rIncludeAttribute = False, rDynamic = False, rCaseSensitive = True, rChildren = [], rLookahead = False, rFirstNonspace = False, rColumn = Nothing, rContextSwitch = [Pop,Push ("LaTeX","FindBeginEnvironmentInMathMode")]}
Keyword MATCHED Just (KeywordTok,"\\begin")
CONTEXT STACK ["MathModeEnv","MathEnv","BeginEnvironment","FindBeginEnvironment","Normal Text"]
CONTEXT STACK ["FindBeginEnvironmentInMathMode","MathModeEnv","MathEnv","BeginEnvironment","FindBeginEnvironment","Normal Text"]
Trying rule Rule {rMatcher = DetectSpaces, rAttribute = NormalTok, rIncludeAttribute = False, rDynamic = False, rCaseSensitive = True, rChildren = [], rLookahead = False, rFirstNonspace = False, rColumn = Nothing, rContextSwitch = []}
Trying rule Rule {rMatcher = DetectChar '{', rAttribute = NormalTok, rIncludeAttribute = False, rDynamic = False, rCaseSensitive = True, rChildren = [], rLookahead = False, rFirstNonspace = False, rColumn = Nothing, rContextSwitch = [Push ("LaTeX","BeginEnvironmentInMathMode")]}
DetectChar MATCHED Just (NormalTok,"{")
CONTEXT STACK ["BeginEnvironmentInMathMode","FindBeginEnvironmentInMathMode","MathModeEnv","MathEnv","BeginEnvironment","FindBeginEnvironment","Normal Text"]
Trying rule Rule {rMatcher = Keyword (KeywordAttr {keywordCaseSensitive = True, keywordDelims = fromList "\t\n !$%&()+,-./:;<=>?[]^{|}~"}) (CaseSensitiveWords (fromList ["lstlisting","lstlisting*"])), rAttribute = ExtensionTok, rIncludeAttribute = False, rDynamic = False, rCaseSensitive = True, rChildren = [], rLookahead = False, rFirstNonspace = False, rColumn = Nothing, rContextSwitch = [Push ("LaTeX","ListingsEnv")]}
Trying rule Rule {rMatcher = Keyword (KeywordAttr {keywordCaseSensitive = True, keywordDelims = fromList "\t\n !$%&()+,-./:;<=>?[]^{|}~"}) (CaseSensitiveWords (fromList ["minted","minted*"])), rAttribute = ExtensionTok, rIncludeAttribute = False, rDynamic = False, rCaseSensitive = True, rChildren = [], rLookahead = False, rFirstNonspace = False, rColumn = Nothing, rContextSwitch = [Push ("LaTeX","MintedEnv")]}
Trying rule Rule {rMatcher = Keyword (KeywordAttr {keywordCaseSensitive = True, keywordDelims = fromList "\t\n !$%&()+,-./:;<=>?[]^{|}~"}) (CaseSensitiveWords (fromList ["BVerbatim","BVerbatim*","LVerbatim","LVerbatim*","Verbatim","Verbatim*","boxedverbatim","boxedverbatim*","verbatim","verbatim*"])), rAttribute = ExtensionTok, rIncludeAttribute = False, rDynamic = False, rCaseSensitive = True, rChildren = [], rLookahead = True, rFirstNonspace = False, rColumn = Nothing, rContextSwitch = [Push ("LaTeX","VerbatimEnv")]}
Trying rule Rule {rMatcher = Keyword (KeywordAttr {keywordCaseSensitive = True, keywordDelims = fromList "\t\n !$%&()+,-./:;<=>?[]^{|}~"}) (CaseSensitiveWords (fromList ["comment","comment*"])), rAttribute = ExtensionTok, rIncludeAttribute = False, rDynamic = False, rCaseSensitive = True, rChildren = [], rLookahead = False, rFirstNonspace = False, rColumn = Nothing, rContextSwitch = [Push ("LaTeX","CommentEnv")]}
Trying rule Rule {rMatcher = Keyword (KeywordAttr {keywordCaseSensitive = True, keywordDelims = fromList "\t\n !$%&()+,-./:;<=>?[]^{|}~"}) (CaseSensitiveWords (fromList ["longtable","longtable*","mpsupertabular","mpsupertabular*","mpxtabular","mpxtabular*","supertabular","supertabular*","tabular","tabular*","tabularx","tabularx*","xtabular","xtabular*"])), rAttribute = ExtensionTok, rIncludeAttribute = False, rDynamic = False, rCaseSensitive = True, rChildren = [], rLookahead = False, rFirstNonspace = False, rColumn = Nothing, rContextSwitch = [Push ("LaTeX","TabEnv")]}
Trying rule Rule {rMatcher = DetectChar '\215', rAttribute = InformationTok, rIncludeAttribute = False, rDynamic = False, rCaseSensitive = True, rChildren = [], rLookahead = False, rFirstNonspace = False, rColumn = Nothing, rContextSwitch = []}
Trying rule Rule {rMatcher = RegExpr (RE {reString = "(?:equation|IEEEeqnarray(?:box)?|(?:[BVvbp]|small)matrix|(?:fl)?align|x{0,2}alignat|cases|displaymath|gather|math|multline|(?:sub)?eqnarray)(?=[^a-zA-Z]|$)\\*?", reCaseSensitive = True}), rAttribute = ErrorTok, rIncludeAttribute = False, rDynamic = False, rCaseSensitive = True, rChildren = [], rLookahead = False, rFirstNonspace = False, rColumn = Nothing, rContextSwitch = [Pop]}
RegExpr MATCHED Just (ErrorTok,"bmatrix")
```

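
The stack changes in this trace can be replayed with a small model. This is only a sketch: `Switch`, `applySwitch` and the other names are hypothetical, and the context switch values in the trace also carry the syntax name (for example `Push ("LaTeX","MathModeEnv")`), which is dropped here for brevity.

```haskell
import Data.List (foldl')

-- Hypothetical, simplified model of a context switch.
data Switch = Pop | Push String
  deriving (Eq, Show)

-- The head of the list is the current context, as in the trace.
type Stack = [String]

applySwitch :: Stack -> Switch -> Stack
applySwitch stack (Push ctx) = ctx : stack
applySwitch (_ : rest) Pop   = rest
applySwitch [] Pop           = []  -- assumption: popping an empty stack is a no-op

-- Stack reported in the trace right after "equation" was matched.
afterEquation :: Stack
afterEquation =
  ["MathEnv", "BeginEnvironment", "FindBeginEnvironment", "Normal Text"]

-- Context switches of the rules that matched, in trace order.
switches :: [[Switch]]
switches =
  [ [Push "MathModeEnv"]                          -- closing brace of {equation}
  , [Push "BackslashMathModeEnv"]                 -- lookahead on the backslash
  , [Pop, Push "FindBeginEnvironmentInMathMode"]  -- keyword \begin
  , [Push "BeginEnvironmentInMathMode"]           -- opening brace of {bmatrix}
  ]

-- Stack in effect when the RegExpr rule is tried against "bmatrix".
stackBeforeRegex :: Stack
stackBeforeRegex = foldl' (foldl' applySwitch) afterEquation switches

main :: IO ()
main = mapM_ putStrLn stackBeforeRegex
```

The replay ends with `BeginEnvironmentInMathMode` on top, which is the context whose rule list contains the `ErrorTok` regular expression.
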
jgm commented 3 years ago

bmatrix is matching on

Trying rule Rule {rMatcher = RegExpr (RE {reString = "(?:equation|IEEEeqnarray(?:box)?|(?:[BVvbp]|small)matrix|(?:fl)?align|x{0,2}alignat|cases|displaymath|gather|math|multline|(?:sub)?eqnarray)(?=[^a-zA-Z]|$)\\*?", reCaseSensitive = True}), rAttribute = ErrorTok, rIncludeAttribute = False, rDynamic = False, rCaseSensitive = True, rChildren = [], rLookahead = False, rFirstNonspace = False, rColumn = Nothing, rContextSwitch = [Pop]}

because of (?:[BVvbp]|small)matrix. Not sure why this is here.
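
To see why `bmatrix` is caught, the alternatives of the regular expression can be expanded by hand into a list of names. The sketch below is an approximation written for illustration, not the regex engine used by skylighting or Kate: `mathEnvNames` is my own expansion of the pattern, and `matchMathEnv` is a hypothetical helper that mimics the rule at the start of the input.

```haskell
import Data.Char (isAsciiLower, isAsciiUpper)
import Data.List (stripPrefix)
import Data.Maybe (listToMaybe)

-- Environment names accepted by the RegExpr rule, expanded by hand from its
-- alternatives (an illustration, not a list taken from latex.xml).
mathEnvNames :: [String]
mathEnvNames =
  [ "equation", "IEEEeqnarray", "IEEEeqnarraybox"
  , "Bmatrix", "Vmatrix", "vmatrix", "bmatrix", "pmatrix", "smallmatrix"
  , "align", "flalign", "alignat", "xalignat", "xxalignat"
  , "cases", "displaymath", "gather", "math", "multline"
  , "eqnarray", "subeqnarray"
  ]

isAsciiLetter :: Char -> Bool
isAsciiLetter c = isAsciiLower c || isAsciiUpper c

-- Mimic the rule: a listed name, then the lookahead (?=[^a-zA-Z]|$),
-- then an optional '*'. Returns the matched text, if any.
matchMathEnv :: String -> Maybe String
matchMathEnv input = listToMaybe
  [ name ++ star
  | name <- mathEnvNames
  , Just rest <- [stripPrefix name input]
  , not (startsWithLetter rest)
  , let star = takeWhile (== '*') (take 1 rest)
  ]
  where
    startsWithLetter (c : _) = isAsciiLetter c
    startsWithLetter []      = False

main :: IO ()
main = mapM_ (print . matchMathEnv) ["bmatrix}", "align*}", "tabular}"]
```

With this approximation, `matchMathEnv "tabular}"` gives `Nothing`, while every name in the list, including `bmatrix`, is matched and would be tagged `ErrorTok` by the rule in the `BeginEnvironmentInMathMode` context.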

jgm commented 3 years ago

There's a comment in latex.xml that explains the line, but I don't really understand it.

        <!-- keywords in MathEnvParam and MathEnv. Do not use keyword to avoid autocomplete -->
        <RegExpr String="(?:equation|IEEEeqnarray(?:box)?|(?:[BVvbp]|small)matrix|(?:fl)?align|x{0,2}alignat|cases|displaymath|gather|math|multline|(?:sub)?eqnarray)(?=[^a-zA-Z]|$)\*?" attribute="Error" context="#pop"/>

This looks like a bug in the KDE syntax definition. I tried it in the Kate editor and got the same result.

jgm commented 3 years ago

@VincentTam I suggest you report the issue to KDE: https://invent.kde.org/frameworks/syntax-highlighting/-/issues

Note to @christoph-cullmann : it would be nice if submitting bug reports were lower friction. Looks like you have to create a special KDE identity. And the KDE/syntax-highlighting repo on GitHub accepts PRs but not issues.

christoph-cullmann commented 3 years ago

Hi, at the moment all KDE bug reports still go via bugs.kde.org. That is unfortunate for cases like this, but it is currently the general policy for KDE projects.

VincentTam commented 3 years ago

@jgm Thanks for your effort. Bug reported at https://bugs.kde.org/show_bug.cgi?id=428947.

jgm commented 3 years ago

Excellent. Let me know when the bug is fixed upstream, and we can merge in the new syntax definition.

VincentTam commented 3 years ago

> Excellent. Let me know when the bug is fixed upstream, and we can merge in the new syntax definition.

@jgm The bug has been fixed upstream. Please check.