trishume / syntect

Rust library for syntax highlighting using Sublime Text syntax definitions.
https://docs.rs/syntect
MIT License

Avoid unnecessary repetition of CSS classes #312

Open ghost opened 4 years ago

ghost commented 4 years ago

I plan to use this library for highlighting content on webpages and comments. I need the classed output, so the actual colors can be controlled with a stylesheet at runtime, but I'm concerned about the size of the generated HTML. I entered only this:

if True:
 print(os.stdout)

And got:

<span class="source python"><span class="meta statement if python"><span class="keyword control flow conditional python">if</span> <span class="constant language python">True</span><span class="punctuation section block conditional python">:</span></span>
 <span class="meta function-call python"><span class="meta qualified-name python"><span class="support function builtin python">print</span></span><span class="punctuation section arguments begin python">(</span><span class="meta function-call arguments python"><span class="meta qualified-name python"><span class="meta generic-name python">os</span><span class="punctuation accessor dot python">.</span><span class="meta generic-name python">stdout</span></span></span><span class="punctuation section arguments end python">)</span></span>
</span>

Is there a way to trim this output by not including some of the classes? For example, I don't need the language name on every token, or most of the other classes.

keith-hall commented 4 years ago

While it would be possible to create a new ClassedHtmlGenerator that strips the last scope atom from the output at runtime, have you considered modifying the .sublime-syntax files instead? A simple find and replace on the scope: and captures: YAML nodes could remove the language suffix scope atoms there, along with any other scope specialisations you don't want (meta scopes, for example, are rarely targeted by color schemes). That would be more performant and would not require any changes in syntect itself.
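As a rough illustration of the find-and-replace idea, here is a minimal Rust sketch that trims a single scope value as it might appear after a scope: key. The helper names `strip_language_atom` and `trim_scope_value` are hypothetical and not part of syntect, and the sketch assumes the language atom is always the last dotted atom. It is meant for rule-level scopes only; the file's top-level scope: (for example source.python) should probably be left alone, since it identifies the syntax.

```rust
/// Drops a trailing language atom (e.g. "python") from a dotted scope name.
/// Hypothetical helper, not part of syntect.
fn strip_language_atom<'a>(scope: &'a str, language: &str) -> &'a str {
    match scope.rsplit_once('.') {
        Some((head, last)) if last == language => head,
        _ => scope,
    }
}

/// Rewrites a scope value that may list several space-separated scopes:
/// `meta.*` scopes are dropped and the language suffix is removed from the rest.
fn trim_scope_value(value: &str, language: &str) -> String {
    value
        .split_whitespace()
        .filter(|scope| !scope.starts_with("meta."))
        .map(|scope| strip_language_atom(scope, language))
        .collect::<Vec<_>>()
        .join(" ")
}

fn main() {
    let original = "meta.statement.if.python keyword.control.flow.conditional.python";
    // Prints: keyword.control.flow.conditional
    println!("{}", trim_scope_value(original, "python"));
}
```

Applying this across a whole .sublime-syntax file would still need a YAML-aware pass (or a careful regular expression) to find the scope: and captures: values; that part is left out here.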

ghost commented 4 years ago

Do you mean the .sublime-syntax files that are populated in testdata after git submodule update --init?

I'm not sure how I would do that in a project using syntect as a dependency. In particular, I have a Crystal project and hope to use syntect via FFI (I have written a small wrapper in Rust to make it FFI-compatible), so I don't know how I would build syntect with modified syntax files.

By "create a new ClassedHtmlGenerator", do you mean for me to implement it out of more primitive syntect features? Because if it's going to be that complicated, I'm not sure it's worth it. I only came to syntect because I couldn't find a sufficient native-Crystal highlighter.

keith-hall commented 4 years ago

> Do you mean the .sublime-syntax files that are populated in testdata after git submodule update --init?

Yes, those.

> I don't know how I would build syntect with modified syntax files

It could be a solution to build syntect without the assets feature, and to use the public API to load a separate "syntax set dump" file which can be created either through the public API also, or potentially using the syntect examples directly, like is done for CI in this repo.

> By "create a new ClassedHtmlGenerator", do you mean for me to implement it out of more primitive syntect features?

I was thinking some small tweaks to the existing one would suffice, but cloned so as not to affect performance for all users...

ghost commented 4 years ago

> It could be a solution to build syntect without the assets feature, and to use the public API to load a separate "syntax set dump" file which can be created either through the public API also, or potentially using the syntect examples directly, like is done for CI in this repo.

I see... I guess I could do that, but it seems too flaky to be worth it. One of the criteria I had in searching for a highlighter was that I didn't want my code to be directly concerned with the actual syntax rules.

> I was thinking some small tweaks to the existing one would suffice, but cloned so as not to affect performance for all users...

That sounds great, but I don't see any info in the documentation about how to modify the existing one as a user of the library.

keith-hall commented 4 years ago

> That sounds great, but I don't see any info in the documentation about how to modify the existing one as a user of the library.

As a pure user, it's not currently possible; some Rust code changes would be required.

Maybe a pure JS highlighter implementation, such as Prism.js or the like, would meet your requirements better?

trishume commented 4 years ago

I'd do it a different way that I think is more principled. Basically, you can do something similar to the highlighting step, which matches on scope stacks to map them to colors, but instead match on the scope stack to map it to a single CSS class. I'd probably try to match the CSS classes used by Pygments, since there are already lots of CSS themes for those.

The easiest way to implement this is similar to how I recommended bat implement ANSI color highlighting: make a special theme definition that uses the red channel (or something similar) as a color ID (see https://github.com/sharkdp/bat/pull/543). Then map the color ID to a CSS class name with a constant array. You can include this theme inside your binary using code similar to how syntect bundles asset dumps, where you dump a binary version of the theme and load it with include_bytes!.
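To make the color-ID idea concrete, here is a minimal sketch of the lookup step only. It assumes a specially built theme has already assigned each token a color whose red channel is an ID, with 0 reserved for "no class". The `Color` struct is a stand-in for whatever RGBA type the highlighter returns, and the table contents, ID numbering, and function names are all hypothetical; only the short class names themselves follow Pygments.

```rust
/// Hypothetical table: entry `n - 1` is the class for color ID `n`.
/// The class names are Pygments' short token classes.
const PYGMENTS_CLASSES: [&str; 8] = [
    "c",  // ID 1: comment
    "k",  // ID 2: keyword
    "kc", // ID 3: keyword constant
    "s",  // ID 4: string
    "m",  // ID 5: number
    "nb", // ID 6: builtin name
    "o",  // ID 7: operator
    "p",  // ID 8: punctuation
];

/// Stand-in for an RGBA color as returned by a highlighter (assumption).
#[allow(dead_code)]
#[derive(Clone, Copy)]
struct Color {
    r: u8,
    g: u8,
    b: u8,
    a: u8,
}

/// Reads the red channel as a color ID; ID 0 and unknown IDs map to no class.
fn class_for_color(color: Color) -> Option<&'static str> {
    match color.r {
        0 => None,
        id => PYGMENTS_CLASSES.get(id as usize - 1).copied(),
    }
}

/// Wraps already HTML-escaped `text` in a span if its color has a class.
fn span(color: Color, text: &str) -> String {
    match class_for_color(color) {
        Some(class) => format!("<span class=\"{}\">{}</span>", class, text),
        None => text.to_string(),
    }
}

fn main() {
    let keyword_constant = Color { r: 3, g: 0, b: 0, a: 255 };
    // Prints: <span class="kc">True</span>
    println!("{}", span(keyword_constant, "True"));
}
```

With one short class per token instead of a full scope list, the output for the snippet in the original report would shrink considerably, at the cost of maintaining the special theme.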

The alternative way is to basically implement something like the logic in highlighter.rs yourself (it only uses public APIs) or the synstats.rs example. Match on scope selectors and use that to determine CSS classes.
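A simplified sketch of that alternative, using plain strings in place of syntect's scope and selector types: it walks a scope stack from the innermost scope outward and picks the most specific matching rule. The rule table, the prefix-matching shortcut, and the function names are assumptions for illustration; a real implementation would likely use syntect's own selector matching rather than string prefixes.

```rust
/// Hypothetical rules: (scope prefix, Pygments short class).
const RULES: [(&str, &str); 6] = [
    ("keyword.control", "k"),
    ("constant.language", "kc"),
    ("support.function.builtin", "nb"),
    ("punctuation", "p"),
    ("comment", "c"),
    ("string", "s"),
];

/// True if `scope` equals `prefix` or extends it at an atom boundary,
/// so "keyword.control.flow" matches "keyword.control" but
/// "keyword.controller" does not.
fn scope_has_prefix(scope: &str, prefix: &str) -> bool {
    scope
        .strip_prefix(prefix)
        .map_or(false, |rest| rest.is_empty() || rest.starts_with('.'))
}

/// Walks the stack from the innermost scope outward. For the first scope
/// that matches any rule, returns the class of the longest matching prefix.
fn class_for_stack(stack: &[&str]) -> Option<&'static str> {
    stack.iter().rev().find_map(|scope| {
        RULES
            .iter()
            .filter(|(prefix, _)| scope_has_prefix(scope, prefix))
            .max_by_key(|(prefix, _)| prefix.len())
            .map(|&(_, class)| class)
    })
}

fn main() {
    // Scope stack for the `if` token in the example from the original report.
    let stack = [
        "source.python",
        "meta.statement.if.python",
        "keyword.control.flow.conditional.python",
    ];
    // Prints: Some("k")
    println!("{:?}", class_for_stack(&stack));
}
```

Tokens whose stack matches no rule, such as the plain identifiers os and stdout in the example, get no class and could be emitted without a span at all.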

I'm very unlikely to get around to implementing this myself but if anyone implements one that uses Pygments-compatible classes I'd accept a PR for it.