syntect is a syntax highlighting library for Rust that uses Sublime Text syntax definitions.
It aims to be a good solution for any Rust project that needs syntax highlighting, including deep integration with text editors written in Rust.
It's used in production by at least two companies, and by many open source projects.
If you are writing a text editor (or something else needing highlighting) in Rust and this library doesn't fit your needs, I consider that a bug and you should file an issue or email me. I consider this project mostly complete; I still maintain it and review PRs, but it's not under heavy development.
syntect is available on crates.io. You can install it by adding this line to your Cargo.toml:
syntect = "5.0"
After that take a look at the documentation and the examples.
If you've cloned this repository, be sure to run
git submodule update --init
to fetch all the required dependencies for running the tests.
<pre>
tags or 24-bit colour ANSI terminal escape sequences.
There's currently an example program called syncat that prints one of the source files with hard-coded themes and syntaxes, using 24-bit terminal escape sequences supported by many newer terminals.
These screenshots don't look as good as they could for two reasons:
first the sRGB colours aren't corrected properly, and second the Rust syntax definition uses some fancy labels that these themes don't have highlighting for.
Prints highlighted lines of a string to the terminal. See the easy and html module docs for more basic use case examples.
use syntect::easy::HighlightLines;
use syntect::parsing::SyntaxSet;
use syntect::highlighting::{ThemeSet, Style};
use syntect::util::{as_24_bit_terminal_escaped, LinesWithEndings};
// Load these once at the start of your program
let ps = SyntaxSet::load_defaults_newlines();
let ts = ThemeSet::load_defaults();
let syntax = ps.find_syntax_by_extension("rs").unwrap();
let mut h = HighlightLines::new(syntax, &ts.themes["base16-ocean.dark"]);
let s = "pub struct Wow { hi: u64 }\nfn blah() -> u64 {}";
for line in LinesWithEndings::from(s) {
    let ranges: Vec<(Style, &str)> = h.highlight_line(line, &ps).unwrap();
    let escaped = as_24_bit_terminal_escaped(&ranges[..], true);
    print!("{}", escaped);
}
Currently syntect is one of the faster syntax highlighting engines, but not the fastest. The following perf features are done:
<script>
tags) so there are no tree traversal string lookups in the hot-path
The current perf numbers are below. These numbers may get better if more of the things above are implemented, but they're better than many other text editors. All measurements were taken on a mid 2012 15" retina MacBook Pro; my new 2019 MacBook takes about 70% of these times.
98ms (highlighting only, takes ~200ms click to pixels), despite having a super fancy javascript syntax definition.
~138ms to load and link all the syntax definitions in the default Sublime package set.
~23ms to load and link all the syntax definitions from an internal pre-made binary dump with lazy regex compilation.
~1.9ms to parse and highlight the 30 line 791 character testdata/highlight_test.erb file. This works out to around 16,000 lines/second or 422 kilobytes/second.
~250ms end to end for syncat to start, load the definitions, highlight the test file and shut down. This is mostly spent loading.
Syntect makes heavy use of cargo features, to support users who require only a subset of functionality.
In particular, it is possible to use the highlighting component of syntect without the parser (for instance when hand-rolling a higher performance parser for a particular language), by adding default-features = false to the syntect entry in your Cargo.toml.
For more information on available features, see the features section in Cargo.toml.
fancy-regex mode, without onig
Since 4.0 syntect offers an alternative pure-Rust regex engine based on the fancy-regex engine, which extends the awesome regex crate with support for fancier regex features that Sublime syntaxes need, like lookaheads.
The advantage of fancy-regex is that it does not require the onig crate, which requires building and linking the Oniguruma C library. Many users experience difficulty building the onig crate, especially on Windows and WebAssembly.
As far as our tests can tell this new engine is just as correct, but it hasn't been tested as extensively in production. It also currently seems to be about half the speed of the default Oniguruma engine, although further testing and optimization (perhaps by you!) may eventually see it surpass Oniguruma's speed and become the default.
To use the fancy-regex engine with syntect, add it to your Cargo.toml like so:
syntect = { version = "5.0", default-features = false, features = ["default-fancy"] }
If you want to run examples with the fancy-regex engine you can use a command line like the following:
cargo run --features default-fancy --no-default-features --release --example syncat testdata/highlight_test.erb
Due to the way Cargo features work, if any crate you depend on depends on syntect without enabling fancy-regex then you'll get the default onig mode.
Note: The fancy-regex engine is absurdly slow in debug mode, because the regex engine (the main hot spot of highlighting) is now written in Rust, which is not optimized in debug builds, instead of C that's always built with optimizations. Consider using release mode or onig when testing.
Because syntect's API exposes internal cacheable data structures, there is a caching strategy that text editors can use that allows the text on screen to be re-rendered instantaneously regardless of the file size when a change is made after the initial highlight.
Basically, on the initial parse every 1000 lines or so copy the parse state into a side-buffer for that line. When a change is made to the text, because of the way Sublime Text grammars work (and languages in general), only the highlighting after that change can be affected. Thus when a change is made to the text, search backwards in the parse state cache for the last state before the edit, then kick off a background task to start re-highlighting from there. Once the background task highlights past the end of the current editor viewport, render the new changes and continue re-highlighting the rest of the file in the background.
This way, from the time the edit happens to the time the new colouring gets rendered, in the worst case only 999 + (length of viewport) lines must be re-highlighted.
Given the speed of syntect, even with a long file and the most complicated syntax and theme this should take less than 100ms.
This is enough to re-highlight on every key-stroke of the world's fastest typist in the worst possible case.
And you can reduce this asymptotically to the length of the viewport by caching parse states more often, at the cost of more memory.
Any time the file is changed the latest cached state is found, the cache is cleared after that point, and a background job is started. Any already running jobs are stopped because they would be working on old state. This way you can just have one thread dedicated to highlighting that is always doing the most up-to-date work, or sleeping.
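The checkpointing part of this strategy can be sketched without any dependencies. The following is only an illustration under stated assumptions: State, highlight_line and CheckpointCache are hypothetical names, and the toy state (block comment nesting depth) stands in for whatever parse and highlight state an editor would clone from syntect. The background job, its cancellation and the viewport check are left out.

```rust
/// Stand-in for the per-line parse state (hypothetical; here it is just the
/// block comment nesting depth so the sketch runs without dependencies).
#[derive(Clone, Debug, PartialEq)]
struct State {
    depth: usize,
}

/// Toy "highlighter": advances the state over one line and reports whether
/// the line ends inside a comment.
fn highlight_line(state: &mut State, line: &str) -> bool {
    state.depth += line.matches("/*").count();
    state.depth = state.depth.saturating_sub(line.matches("*/").count());
    state.depth > 0
}

struct CheckpointCache {
    /// Distance in lines between checkpoints (the text suggests about 1000).
    interval: usize,
    /// (line index, state *before* that line), sorted by line index.
    checkpoints: Vec<(usize, State)>,
}

impl CheckpointCache {
    fn new(interval: usize) -> Self {
        CheckpointCache { interval, checkpoints: Vec::new() }
    }

    /// Clears the cache after `edited_line` and returns the line and state
    /// to resume from: the last checkpoint at or before the edit.
    fn resume_point(&mut self, edited_line: usize) -> (usize, State) {
        self.checkpoints.retain(|(line, _)| *line <= edited_line);
        self.checkpoints
            .last()
            .cloned()
            .unwrap_or((0, State { depth: 0 }))
    }

    /// Highlights `lines[start..]`, overwriting `out[start..]` and recording
    /// a checkpoint every `interval` lines.
    fn highlight_from(
        &mut self,
        lines: &[&str],
        start: usize,
        mut state: State,
        out: &mut Vec<bool>,
    ) {
        out.truncate(start);
        for (i, line) in lines.iter().enumerate().skip(start) {
            let is_new = self.checkpoints.last().map_or(true, |(l, _)| *l < i);
            if i % self.interval == 0 && is_new {
                self.checkpoints.push((i, state.clone()));
            }
            out.push(highlight_line(&mut state, line));
        }
    }
}
```

After an edit on line k, an editor would call resume_point(k) and pass the returned line and state to highlight_from. A smaller interval means fewer lines to redo per edit, at the cost of storing more cloned states.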
Since 3.0, syntect can be used to do parsing/highlighting in parallel.
SyntaxSet is both Send and Sync and so can easily be used from multiple threads.
It is also Clone, which means you can construct a syntax set and then clone it to use for other threads if you prefer.
Compared to older versions, there's nothing preventing the serialization of a SyntaxSet either.
So you can directly deserialize a fully linked SyntaxSet and start using it for parsing/highlighting.
Before, it was always necessary to do linking first.
It is worth mentioning that regex compilation is done lazily only when the regexes are actually needed.
Once a regex has been compiled, the compiled version is used for all threads after that.
Note that this is done using interior mutability, so if multiple threads happen to encounter the same uncompiled regex at the same time, compiling might happen multiple times.
After that, one of the compiled regexes will be used.
When a SyntaxSet is cloned, the regexes in the cloned set currently need to be recompiled.
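The lazy compilation behaviour described above (racing threads may each compile, then one result is kept) can be illustrated with the standard library alone. This is a sketch of the general pattern, not syntect's actual implementation: LazyPattern, Compiled and compile_from_threads are hypothetical names, and upper-casing the source stands in for real regex compilation.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::OnceLock;

/// Stand-in for a compiled regex (hypothetical; a real engine's compiled
/// type would go here).
#[derive(Debug, PartialEq)]
struct Compiled(String);

/// A pattern whose compiled form is filled in on first use.
struct LazyPattern {
    source: String,
    compiled: OnceLock<Compiled>,
    /// Counts compilations, only so the example can observe the behaviour.
    compile_count: AtomicUsize,
}

impl LazyPattern {
    fn new(source: &str) -> Self {
        LazyPattern {
            source: source.to_string(),
            compiled: OnceLock::new(),
            compile_count: AtomicUsize::new(0),
        }
    }

    fn get(&self) -> &Compiled {
        if let Some(compiled) = self.compiled.get() {
            return compiled;
        }
        // Several threads can reach this point at once and each compile a copy.
        self.compile_count.fetch_add(1, Ordering::Relaxed);
        let candidate = Compiled(self.source.to_uppercase());
        // Only the first `set` succeeds; the losing threads drop their copy.
        let _ = self.compiled.set(candidate);
        self.compiled.get().expect("set by this or another thread")
    }
}

/// Shares one pattern across `threads` scoped threads and returns how many
/// times it ended up being compiled.
fn compile_from_threads(pattern: &LazyPattern, threads: usize) -> usize {
    std::thread::scope(|s| {
        for _ in 0..threads {
            s.spawn(|| {
                pattern.get();
            });
        }
    });
    pattern.compile_count.load(Ordering::Relaxed)
}
```

The count returned by compile_from_threads is at least 1 and at most the number of threads, depending on timing. Every caller sees the same compiled value afterwards.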
For adding parallelism to a previously single-threaded program, the recommended thread pooling is rayon.
However, if you're working in an already-threaded context where there might be more threads than you want (such as writing a handler for an Iron request), the recommendation is to force all highlighting to be done within a fixed-size thread pool using rust-scoped-pool.
An example of the former is in examples/parsyncat.rs.
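To make the fixed-size pool idea concrete, here is a minimal sketch that uses only std threads and channels instead of rust-scoped-pool or rayon. The names highlight and highlight_with_pool are hypothetical, and the highlight function is a placeholder for real highlighting work. The point is only that the number of highlighting threads stays bounded no matter how many callers submit work.

```rust
use std::sync::mpsc;
use std::sync::{Arc, Mutex};
use std::thread;

/// Placeholder for real highlighting work (hypothetical).
fn highlight(source: &str) -> String {
    format!("{} line(s) highlighted", source.lines().count())
}

/// Runs `highlight` over `inputs` on a fixed number of worker threads and
/// returns the results in input order.
fn highlight_with_pool(inputs: Vec<String>, workers: usize) -> Vec<String> {
    let (job_tx, job_rx) = mpsc::channel::<(usize, String)>();
    let (result_tx, result_rx) = mpsc::channel::<(usize, String)>();
    let job_rx = Arc::new(Mutex::new(job_rx));

    let handles: Vec<_> = (0..workers.max(1))
        .map(|_| {
            let job_rx = Arc::clone(&job_rx);
            let result_tx = result_tx.clone();
            thread::spawn(move || loop {
                // The lock is held only while waiting for and taking one job.
                let job = job_rx.lock().unwrap().recv();
                match job {
                    Ok((index, source)) => {
                        result_tx.send((index, highlight(&source))).unwrap()
                    }
                    Err(_) => break, // job queue closed and drained
                }
            })
        })
        .collect();
    // Drop the original sender so the result channel closes once all workers exit.
    drop(result_tx);

    let count = inputs.len();
    for job in inputs.into_iter().enumerate() {
        job_tx.send(job).unwrap();
    }
    // Closing the job queue lets the workers finish.
    drop(job_tx);

    let mut results = vec![String::new(); count];
    for (index, text) in result_rx {
        results[index] = text;
    }
    for handle in handles {
        handle.join().unwrap();
    }
    results
}
```

In a long-running server the pool would be created once and kept alive rather than built per call; it is created inside the function here only to keep the sketch self-contained.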
There are a number of examples of programs that use syntect in the examples folder and some code outside the repo:
syncat prints a highlighted file to the terminal using 24-bit colour ANSI escape codes. It demonstrates a simple file highlighting workflow.
synhtml prints an HTML file that will display the highlighted code. Demonstrates how syntect could be used by web servers and static site generators.
synstats collects a bunch of statistics about the code in a folder. Includes basic things like line count but also fancier things like number of functions. Demonstrates how syntect can be used for code analysis as well as highlighting, as well as how to use the APIs to parse out the semantic tokenization.
faiyels is a little code minimap visualizer I wrote that uses syntect for highlighting.
parsyncat is like syncat, but accepts multiple files and highlights them in parallel. It demonstrates how to use syntect from multiple threads.
Here are the stats that synstats extracts from syntect's codebase (not including examples and test data) as of this commit:
################## Stats ###################
File count: 19
Total characters: 155504
Function count: 165
Type count (structs, enums, classes): 64
Code lines (traditional SLOC): 2960
Total lines (w/ comments & blanks): 4011
Comment lines (comment but no code): 736
Blank lines (lines-blank-comment): 315
Lines with a documentation comment: 646
Total words written in doc comments: 4734
Total words written in all comments: 5145
Characters of comment: 41099
Below is a list of projects using syntect, in approximate order by how long they've been using syntect (feel free to send PRs to add to this list):
cat(1) clone, uses syntect for syntax highlighting.
syntect for syntax highlighting.
syntect for code blocks.
syntect for highlighting code snippets.
syntect for code blocks.
syntect for highlighting.
syntect for highlighting code blocks.
syntect for code blocks.
syntect for highlighting.
syntect for shell colouring.
syntect for highlighting.
syntect for highlighting code snippets.
syntect for highlighting.
syntect for file previews.
syntect for code blocks.
syntect for highlighting code blocks.
syntect for fenced code blocks.
syntect for text file previews.
syntect for code syntax highlighting.
Thanks to Robin Stocker, Keith Hall and Martin Nordholts for their substantial contributions, which include the most important improvements syntect has had post-v1.0!
They deserve lots of credit for where syntect is today. For example, @robinst implemented fancy-regex support and a massive refactor to enable parallel highlighting using an arena. @keith-hall found and fixed many bugs and implemented Sublime syntax test support.
Thanks to Textmate 2 and @defuz's sublimate for the existing open source code I used as inspiration and, in the case of sublimate's tmTheme loader, copy-pasted.
All code (including defuz's sublimate code) is released under the MIT license.