helix-editor / helix

A post-modern modal text editor.
https://helix-editor.com
Mozilla Public License 2.0
32.79k stars 2.42k forks

Add scdoc syntax highlight #7707

Open kchibisov opened 1 year ago

kchibisov commented 1 year ago

This is widely used for man pages nowadays.

https://git.sr.ht/~sircmpwn/scdoc/

markstos commented 7 months ago

For someone interested in this, see the docs on adding a new language:

https://docs.helix-editor.com/guides/adding_languages.html

The primary thing to add would be a new file at runtime/queries/scdoc/highlights.scm

See the Tree-sitter docs on defining new highlights: https://tree-sitter.github.io/tree-sitter/syntax-highlighting#queries

For a formal definition of the scdoc file format, see:

https://man.archlinux.org/man/scdoc.5.en

For an example from a similar file format, see how Markdown highlights are defined:

https://github.com/helix-editor/helix/blob/master/runtime/queries/markdown/highlights.scm

rusty-snake commented 3 weeks ago

As a workaround until Helix gets scdoc support, you can run :open ~/.config/helix/languages.toml and add the following to get rid of the SuperCollider highlighting:

[[language]]
name = "scdoc"
scope = "text.scdoc"
# file-types = [".1.scd", ".2.scd", ".3.scd", ".4.scd", ".5.scd", ".6.scd", ".7.scd", ".8.scd"]
file-types = ["scd"]
comment-token = "; "
indent = { tab-width = 4, unit = "\t" }
text-width = 80

rusty-snake commented 1 week ago

@markstos are only runtime/queries/scdoc/highlights.scm and languages.toml (see above) necessary, or do we also need some kind of grammar? I don't understand where the node/token names come from.

The Helix/Tree-sitter highlights.scm system does not seem to build a master regex and run it over the file; instead, highlights.scm appears to be applied to already-tokenized input. How is Helix supposed to magically know the syntax of scdoc?

markstos commented 1 week ago

@rusty-snake I think you are right: a Tree-sitter parser and grammar are required first, as documented here:

https://tree-sitter.github.io/tree-sitter/creating-parsers

https://tree-sitter.github.io/tree-sitter/creating-parsers#writing-the-grammar

I asked Gemini if it could write an scdoc grammar based on the scdoc man page, and here's what it came up with. I have not attempted to validate whether this is useful or a hallucination. But it only took the AI about 5 seconds to generate!

Scdoc = _Preamble Section*
_Preamble = name "(" section ")" _Footer?
name = identifier
section = identifier
_Footer = string_literal (_center_header string_literal)?
_center_header = "center_header " string_literal

Section = header (_content | indented_content)?
header = "#" identifier
_content = paragraph*
indented_content = indented_line*

paragraph = line_break*

line_break = "+" "+"
line = !(header | indented_line) text?

indented_line = "\t" content

content = text | list | table | formatted_text | literal_block | comment

text = /(?!\+|\#|\;|\\|\n)/+

list = (bullet_list | numbered_list)
bullet_list = "-" line_break* line+
numbered_list = "." line_break* line+

table = (bordered_table | unbordered_table | headered_table)
bordered_table = "[" table_row* "]"
unbordered_table = "|" table_row* "|"
headered_table = "]" table_row* "["
table_row = (cell | continued_cell)* line_break
cell = alignment content
continued_cell = " " content
alignment = ("[" | ":" | "]") (">" | "=" | "<" | text)?

formatted_text = "*" text "*" | "_" text "_"

literal_block = "\\`" (!("\\`"))* "\\```"
comment = ";" !"[\n]+"

identifier = /[a-zA-Z0-9_-]*/
string_literal = "\"" (!"\"")* "\""
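
For anyone picking this up: once a real Tree-sitter grammar for scdoc exists, hooking it into Helix would probably mean adding a grammar entry to languages.toml next to the [[language]] block above. The following is only a sketch. The git URL and rev are placeholders, since no scdoc grammar repository is referenced in this thread.

```toml
# Sketch only: the URL and rev below are placeholders, not a real repository.
[[grammar]]
name = "scdoc"
source = { git = "https://example.com/tree-sitter-scdoc", rev = "0000000000000000000000000000000000000000" }
```

With an entry like that in place, hx --grammar fetch and hx --grammar build should download and compile the parser. The node names that runtime/queries/scdoc/highlights.scm matches against would then come from that grammar, which answers the question of where they stem from.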