jgm / skylighting

A Haskell syntax highlighting library with tokenizers derived from KDE syntax highlighting descriptions

Stata highlighting wrongly highlights keywords contained in comments #185

Open arthur-shaw opened 9 months ago

arthur-shaw commented 9 months ago

In Stata's built-in code editor, all text contained in a comment is highlighted as a comment. For example:

image

In Pandoc output, words inside comments that happen to be Stata keywords are highlighted as keywords. For example:

image

Note that adopath, BASE, means, and until (all of them words incorrectly highlighted inside comments) appear as words to highlight in stata.xml.
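One way to confirm which words in a comment collide with the syntax definition is to look them up in the `<list>` elements of the KDE XML file. The sketch below is a minimal Python illustration of that lookup. The inline XML is a made-up fragment in the KDE layout (the list names and items are assumptions, not copied from stata.xml), and `words_in_lists` is a hypothetical helper.

```python
import re
import xml.etree.ElementTree as ET

# Made-up fragment in the KDE syntax-definition layout; NOT the real stata.xml.
SAMPLE_XML = """
<language name="Stata">
  <highlighting>
    <list name="keywords">
      <item>until</item>
      <item>while</item>
    </list>
    <list name="commands">
      <item>adopath</item>
      <item>means</item>
    </list>
  </highlighting>
</language>
"""


def words_in_lists(xml_text, text):
    """Map each word of `text` that occurs in a <list> to the names of those lists."""
    root = ET.fromstring(xml_text)
    words = re.findall(r"[A-Za-z_]+", text)
    hits = {}
    for lst in root.iter("list"):
        items = {(item.text or "").strip() for item in lst.iter("item")}
        for word in words:
            if word in items:
                hits.setdefault(word, set()).add(lst.get("name"))
    return hits


# A comment line taken from the example in this report.
comment = "* This means that BASE is first and PLUS is second."
print(words_in_lists(SAMPLE_XML, comment))
```

Pointed at the real file's contents instead of `SAMPLE_XML`, the same lookup would show which lists a given comment word belongs to.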

When inspecting the HTML output, one can see the keyword class being applied. Furthermore, the text after the comment marker appears to be tokenized, and each token gets a different highlighting style depending on what class it belongs to (e.g., keyword, list of commands, etc.).

image
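This inspection can also be scripted. The sketch below uses Python's standard `html.parser` to list every token on a `*` comment line that carries a class other than `co`, the short class name used for comments in the generated HTML (`kw` is the keyword class). The two sample lines are simplified, hand-written stand-ins for Pandoc's markup, not copies of it, and `CommentAudit` and `misclassified` are hypothetical names.

```python
from html.parser import HTMLParser


class CommentAudit(HTMLParser):
    """Collect (class, text) pairs for the text inside <span> elements."""

    def __init__(self):
        super().__init__()
        self.stack = []   # class of each open <span> (None if it has no class)
        self.tokens = []  # (innermost class, text) for every text run

    def handle_starttag(self, tag, attrs):
        if tag == "span":
            self.stack.append(dict(attrs).get("class"))

    def handle_endtag(self, tag):
        if tag == "span" and self.stack:
            self.stack.pop()

    def handle_data(self, data):
        cls = next((c for c in reversed(self.stack) if c), None)
        self.tokens.append((cls, data))


def misclassified(html_line):
    """For a line whose text starts with '*', return tokens not classed as comment."""
    parser = CommentAudit()
    parser.feed(html_line)
    parser.close()
    text = "".join(t for _, t in parser.tokens)
    if not text.lstrip().startswith("*"):
        return []  # not a star comment line, nothing to audit
    return [(cls, t) for cls, t in parser.tokens if cls is not None and cls != "co"]


# Hand-written stand-ins for highlighted output lines (assumed shape).
COMMENT_OK = '<span id="cb1-1"><span class="co">* Set user root folder</span></span>'
COMMENT_BAD = (
    '<span id="cb1-2"><span class="co">* This </span>'
    '<span class="kw">means</span>'
    '<span class="co"> that BASE is first</span></span>'
)

for line in (COMMENT_OK, COMMENT_BAD):
    print(misclassified(line))
```

An empty list means the whole comment line is classed as a comment; anything else is a token that was styled as something other than a comment.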

Here's how I produced the HTML output in the last two images above.

1. Create a Markdown file

````markdown
---
title: Hello
---

Here's some Stata code:

```stata
* Set user root folder
global root "C:\Users\user123\github\myproject"

* Set PLUS to adopath and list it first, then list BASE first.
* This means that BASE is first and PLUS is second.
adopath ++  "${root}/code/ado"
adopath ++  BASE

* Keep removing adopaths with rank 3 until only BASE and the project ado-folder,
* that has rank 1 and 2, are left in the adopaths
local morepaths 1
while (`morepaths' == 1) {
  capture adopath - 3
  if _rc local morepaths 0
}
```
````

2. Render as HTML with Pandoc

```
pandoc stata_test.md -f markdown -t html -s -o stata_test.html
```



Note: I've not (yet) investigated whether this issue also arises for other comment styles (e.g., single-line comments starting with `//`, end-of-line comments with `///`, or multi-line comments starting with `/*` and ending with `*/`).
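To check those other comment styles, a test document could be generated rather than written by hand. The sketch below is one hypothetical way to do that in Python: it puts the same keyword-laden sentence into each comment style named above and wraps the result in a `stata` fenced block. The function name `build_test_markdown`, the sentence, and the title are all assumptions made for illustration.

```python
FENCE = "`" * 3  # built at runtime so this example nests cleanly inside Markdown

# Sentence containing words reported above as wrongly highlighted.
SENTENCE = "This means that BASE is first until adopath changes"


def build_test_markdown(sentence=SENTENCE):
    """Return a Markdown document exercising each Stata comment style."""
    stata_lines = [
        f"* {sentence}",              # whole-line comment
        f"// {sentence}",             # single-line comment
        f"display 1 /// {sentence}",  # end-of-line comment (continues the line)
        "  + 1",
        f"/* {sentence}",             # multi-line comment, first line
        f"   {sentence} */",          # multi-line comment, last line
    ]
    parts = ["---", "title: Comment styles", "---", "", FENCE + "stata"]
    parts += stata_lines
    parts += [FENCE, ""]
    return "\n".join(parts)


print(build_test_markdown())
```

Saving the returned string to a file and rendering it with the same Pandoc command as in step 2 would show whether keyword spans also appear inside those comment styles.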

Sorry if I'm posting this in the wrong place, or providing less than helpful information.

`skylighting` is a really amazing tool. I'm coming to it from a project that uses [Quarto](https://quarto.org/) to write HTML documentation for Stata packages.

jgm commented 9 months ago

I'm seeing the same behavior when I open the file with the Kate editor. This indicates that the problem is in the stata.xml syntax definition from KDE, which skylighting is interpreting accurately. Try submitting a report there (see our README for some links).
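For reference when writing the upstream report, the behaviour a corrected definition would need can be stated as a small toy tokenizer. This is only a sketch of the expected result in Python, not a diagnosis of stata.xml and not how skylighting or Kate work internally; `KEYWORDS` is an illustrative subset and `tokenize_line` is a hypothetical name.

```python
import re

# Illustrative subset only; the real word lists live in stata.xml.
KEYWORDS = {"adopath", "BASE", "means", "until", "while"}


def tokenize_line(line, keywords=KEYWORDS):
    """Return (kind, text) tokens; a line starting with '*' is one comment token."""
    if line.lstrip().startswith("*"):
        # Expected behaviour: the comment swallows the rest of the line,
        # so keyword matching never runs on its contents.
        return [("comment", line)]
    tokens = []
    # Split into whitespace runs, identifiers, and runs of other characters.
    for piece in re.findall(r"\s+|[A-Za-z_]\w*|[^\sA-Za-z_]+", line):
        kind = "keyword" if piece in keywords else "normal"
        tokens.append((kind, piece))
    return tokens


print(tokenize_line("* This means that BASE is first and PLUS is second."))
print(tokenize_line("adopath ++ BASE"))
```

The first call yields a single comment token even though the line contains `means` and `BASE`; the second still marks `adopath` and `BASE` as keywords in ordinary code.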

arthur-shaw commented 9 months ago

Many thanks for pointing me in the right direction.

If this gets fixed upstream, should I notify you here? Or do you periodically pull new/improved syntax definitions from upstream?

jgm commented 9 months ago

Wouldn't hurt to notify here. But yes, every once in a while we pull in changes from upstream.