jgm / skylighting

A Haskell syntax highlighting library with tokenizers derived from KDE syntax highlighting descriptions

Issues with loading syntax files from disk #200

Closed: LiraOnGithub closed this issue 3 days ago

LiraOnGithub commented 4 days ago

When I load a syntax file from disk, the tokenizer does not seem to find the contexts defined in that file.

Minimal example with unknown language

test.xml

<?xml version="1.0" encoding="UTF-8" ?>
<!DOCTYPE language [ ]>
<language name="Test" version="1" kateversion="5.53" section="Sources" extensions="*.*" mimetype="text/plain" license="LGPL">
    <highlighting>
        <contexts>
            <context attribute="Normal" lineEndContext="#stay" name="normal">
                <DetectChar attribute="String" context="string" char="&quot;"/>
            </context>
            <context attribute="String" lineEndContext="#stay" name="string">
                <DetectChar attribute="String" context="#pop" char="&quot;" />
            </context>
        </contexts>
        <itemDatas>
            <itemData name="Normal" defStyleNum="dsNormal" />
            <itemData name="String" defStyleNum="dsString" />
        </itemDatas>
    </highlighting>
</language>

test.hs

#!/usr/bin/env cabal
{- cabal:
    build-depends: base, text, skylighting, skylighting-core
-}

import Data.Text (pack)
import Skylighting (tokenize, TokenizerConfig(..), defaultSyntaxMap)
import Skylighting.Loader (loadSyntaxFromFile)

main :: IO ()
main = do
    Right syntax <- loadSyntaxFromFile "./test.xml"
    case tokenize tokenizerConfig syntax $ pack "a\"b\"c" of
        Right sourceLines -> print sourceLines
        Left e -> putStrLn e
    where
        tokenizerConfig :: TokenizerConfig
        tokenizerConfig = TokenizerConfig
            { traceOutput = False
            , syntaxMap = defaultSyntaxMap
            }

When running the minimal application:

$ ./test.hs
Unknown syntax or context: ("Test","string")

Minimal example with copy of haskell.xml

When I load a copy of haskell.xml with some of the defStyleNum values changed, it partly works:

#!/usr/bin/env cabal
{- cabal:
    build-depends: base, text, skylighting, skylighting-core
-}

import Data.Text (pack)
import Skylighting (tokenize, TokenizerConfig(..), defaultSyntaxMap)
import Skylighting.Loader (loadSyntaxFromFile)

main :: IO ()
main = do
    Right syntax <- loadSyntaxFromFile "./haskell.xml"
    case tokenize tokenizerConfig syntax $ pack "a`b`c" of
        Right sourceLines -> print sourceLines
        Left e -> putStrLn e
    where
        tokenizerConfig :: TokenizerConfig
        tokenizerConfig = TokenizerConfig
            { traceOutput = False
            , syntaxMap = defaultSyntaxMap
            }

Running this results in the following output:

$ ./test.hs
[[(NormalTok,"a"),(OperatorTok,"`"),(OtherTok,"b`"),(NormalTok,"c")]]

But I expected the output to be [[(NormalTok,"a"),(OperatorTok,"`b`"),(NormalTok,"c")]]. It seems it takes the OperatorTok from this syntax file, but the OtherTok from another one.

Minimal example with copy of haskell.xml with name "test" in test.xml

When I copy the contents of haskell.xml into test.xml and change the name attribute of language to Test, it doesn't work at all:

test.xml

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE language SYSTEM "language.dtd">
<!-- I changed the attribute name here -->
<language name="Test" version="4" kateversion="3.4" section="Sources" extensions="*.hs;*.chs" mimetype="text/x-haskell" author="Nicolas Wu (zenzike@gmail.com)" license="LGPL" indenter="haskell" style="haskell">
  <highlighting>
<!-- ..... rest is the same ..... -->
  </highlighting>
</language>

test.hs

#!/usr/bin/env cabal
{- cabal:
    build-depends: base, text, skylighting, skylighting-core
-}

import Data.Text (pack)
import Skylighting (tokenize, TokenizerConfig(..), defaultSyntaxMap)
import Skylighting.Loader (loadSyntaxFromFile)

main :: IO ()
main = do
    Right syntax <- loadSyntaxFromFile "./test.xml"
    case tokenize tokenizerConfig syntax $ pack "a`b`c" of
        Right sourceLines -> print sourceLines
        Left e -> putStrLn e
    where
        tokenizerConfig :: TokenizerConfig
        tokenizerConfig = TokenizerConfig
            { traceOutput = False
            , syntaxMap = defaultSyntaxMap
            }

$ ./test.hs
Unknown syntax or context: ("Test","infix")

My assumption is that <X context="foo" /> looks up the context foo in a compiled syntax whose name matches the name attribute of the language element, instead of in the same XML file.

jgm commented 4 days ago

                <DetectChar attribute="String" context="string" char="&quot;"/>

Try context="String". It is case-sensitive.

LiraOnGithub commented 3 days ago

Both the name of the context and the value of the context attribute are all lowercase, but if I change all occurrences of string to String, it still yields the same result:

<?xml version="1.0" encoding="UTF-8" ?>
<!DOCTYPE language [ ]>
<language name="Test" version="1" kateversion="5.53" section="Sources" extensions="*.*" mimetype="text/plain" license="LGPL">
    <highlighting>
        <contexts>
            <context attribute="Normal" lineEndContext="#stay" name="normal">
                <DetectChar attribute="String" context="String" char="&quot;"/>
            </context>
            <context attribute="String" lineEndContext="#stay" name="String">
                <DetectChar attribute="String" context="#pop" char="&quot;" />
            </context>
        </contexts>
        <itemDatas>
            <itemData name="Normal" defStyleNum="dsNormal" />
            <itemData name="String" defStyleNum="dsString" />
        </itemDatas>
    </highlighting>
</language>

test.hs

#!/usr/bin/env cabal
{- cabal:
    build-depends: base, text, skylighting, skylighting-core
-}

import Data.Text (pack)
import Skylighting (tokenize, TokenizerConfig(..), defaultSyntaxMap)
import Skylighting.Loader (loadSyntaxFromFile)

main :: IO ()
main = do
    Right syntax <- loadSyntaxFromFile "./test.xml"
    case tokenize tokenizerConfig syntax $ pack "a\"b\"c" of
        Right sourceLines -> print sourceLines
        Left e -> putStrLn e
    where
        tokenizerConfig :: TokenizerConfig
        tokenizerConfig = TokenizerConfig
            { traceOutput = False
            , syntaxMap = defaultSyntaxMap
            }

$ ./test.hs
Unknown syntax or context: ("Test","String")

Also, if case sensitivity were the cause, the second and third examples (the ones with haskell.xml and the renamed copy of it) would still work, but those fail as well.

I think it looks for the context in the wrong place (not in the file itself).

jgm commented 3 days ago

        tokenizerConfig :: TokenizerConfig
        tokenizerConfig = TokenizerConfig
            { traceOutput = False
            , syntaxMap = defaultSyntaxMap
            }

I think the problem might be this: syntaxMap is defaultSyntaxMap, which doesn't include the new syntax you've created.

See https://hackage.haskell.org/package/skylighting-core-0.14.3/docs/Skylighting-Parser.html#v:addSyntaxDefinition
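A minimal sketch of that change, assuming the same test.xml as in the first example. The helper name configWith is made up for illustration and is not part of skylighting; the only new library call is addSyntaxDefinition from the page linked above, which adds the loaded syntax to the map that tokenize is given.

```haskell
#!/usr/bin/env cabal
{- cabal:
    build-depends: base, text, skylighting, skylighting-core
-}

import Data.Text (pack)
import Skylighting (Syntax(..), TokenizerConfig(..), defaultSyntaxMap, tokenize)
import Skylighting.Loader (loadSyntaxFromFile)
import Skylighting.Parser (addSyntaxDefinition)

-- Hypothetical helper (not part of skylighting): build a config whose
-- syntax map holds the built-in syntaxes plus the one passed in.
configWith :: Syntax -> TokenizerConfig
configWith syntax = TokenizerConfig
    { traceOutput = False
    , syntaxMap = addSyntaxDefinition syntax defaultSyntaxMap
    }

main :: IO ()
main = do
    loaded <- loadSyntaxFromFile "./test.xml"
    case loaded of
        -- Report a parse failure instead of crashing on a failed pattern match.
        Left err -> putStrLn ("Could not load syntax: " ++ err)
        Right syntax ->
            case tokenize (configWith syntax) syntax (pack "a\"b\"c") of
                Right sourceLines -> print sourceLines
                Left e -> putStrLn e
```

With the loaded syntax registered under its own name, references such as ("Test","string") should resolve against the file's own contexts rather than failing against the default map.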

LiraOnGithub commented 3 days ago

Yes! That was the issue! Thank you so much :D