Open tsenart opened 3 years ago
Heads up @tsenart - the "team/cloud" label was applied to this issue.
Thanks for filing! This has come up a few times in the past, but I haven't had a place I can refer back to due to Slack message retention.
In general, it's not a bad idea, and definitely something I have considered multiple times (including recently), but it would be a fairly hefty amount of work to make it mostly competitive with the server-side implementation, and even then there are some fundamental limitations (like needing to transfer the whole file to the client for highlighting) that I'm not sure we could overcome. Overall, I'm not sure it is worth it.
$ ls -lah ./assets/*
-rw-r--r-- 1 slimsag staff 7.5K Oct 16 17:21 ./assets/default.themedump
-rw-r--r-- 1 slimsag staff 5.1K Oct 16 17:21 ./assets/default_metadata.packdump
-rw-r--r-- 1 slimsag staff 533K Oct 16 17:21 ./assets/default_newlines.packdump
lazy_static! to ensure we only parse theme files once, and only compile the (enormous) number of regexps found in each syntax definition once - I think we could replicate this on the frontend too by keeping the compiled forms around in memory given we are a SPA - but my TS knowledge limits me here in terms of what is possible, especially when it comes to webworkers being mixed in.
syntect where it would be more appropriate. Specifically, some things we would have to deal with:
<table> and Syntect gives us <span>s, so we do this permutation here
Thanks for the extensive notes! Do you think it's unsolvable to make syntax highlighting work without the whole file? Could we not find the most immediate context boundary (e.g. like comby finds delimiters) and send that over?
On another note: Could we not be well served by what VS Code uses for syntax highlighting?
Do you think it's unsolvable to make syntax highlighting work without the whole file? Could we not find the most immediate context boundary (e.g. like comby finds delimiters) and send that over?
That'd be possible, good point - though it's not what Syntect does out of the box, we'd have to do that for each language.
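To make the "nearest context boundary" idea concrete, here is a rough sketch of what it might look like for brace-delimited languages only. It is not something Syntect or comby provides: `enclosingBlock` and `LineRange` are hypothetical names, and the code naively counts `{` and `}` without skipping strings or comments, which is exactly the per-language work mentioned above.

```typescript
// Hypothetical helper, for illustration only. Line numbers are 0-based, inclusive.
interface LineRange {
  start: number;
  end: number;
}

// Returns the smallest brace-delimited block that covers `target`,
// or the whole file when no such block exists.
// Assumption: braces inside strings and comments are NOT skipped.
function enclosingBlock(source: string, target: LineRange): LineRange {
  const lines = source.split("\n");
  const openLines: number[] = []; // lines of "{" that are still unclosed

  for (let i = 0; i < lines.length; i++) {
    const line = lines[i];
    for (let j = 0; j < line.length; j++) {
      const ch = line.charAt(j);
      if (ch === "{") {
        openLines.push(i);
      } else if (ch === "}") {
        const start = openLines.pop();
        if (start === undefined) {
          continue; // unbalanced closing brace, ignore it
        }
        // Inner blocks close before outer ones, so the first closed
        // block that covers the target is the innermost one.
        if (start <= target.start && i >= target.end) {
          return { start: start, end: i };
        }
      }
    }
  }
  // Fallback: no enclosing block found, so the whole file is the context.
  return { start: 0, end: lines.length - 1 };
}
```

A server could then send only the lines in the returned range to the highlighter. Whether highlighting a fragment like this gives the same result as highlighting the whole file depends on the grammar, so treat this as a starting point rather than a guarantee.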
On another note: Could we not be well served by what VS Code uses for syntax highlighting?
VS Code uses an interesting hodge-podge of options for syntax highlighting:
.tmLanguage grammar or .sublime-syntax file we can just import and use, they get that info, and theme info, via editor extensions. I am sure we could do this, but not sure how hairy the details are there.
If you want some info about other options I considered back in 2018, there is a write-up here: https://news.ycombinator.com/item?id=17932653
Syntax highlighting is hard, and old. Either the set of supported languages is minuscule, or you're regularly falling back to TextMate grammars and something more complex (semantic highlighting, sublime syntaxes).
cc @camdencheek
@felixfbecker I actually spent a day or so looking into this! I got as far as getting highlighting running in the browser. Ultimately, performance wasn't great with the pure-Rust regex crate. It would take >10s for large files. At some point, I'd like to see if I can get onig compiling to WASM for better perf, but I'd hit my timebox for the WASM highlighting spike.
Having spent way too much time on this myself (and also after having gotten Syntect to run in the browser with not-great performance results) I eventually landed on shiki which I can highly recommend.
@schickling thanks for the recommendation!
When we initially implemented syntect_server, WASM support was limited, but that has since changed. Could we do syntax highlighting purely in the frontend, still using syntect, but using WASM? Rust has excellent WASM support.
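For reference, the "keep the compiled forms around in memory" idea raised earlier in the thread (the frontend counterpart of what lazy_static! does server-side) might look roughly like the sketch below. Everything here is an assumption for illustration: `SyntaxDefinition`, `CompiledSyntax`, and `getCompiled` are hypothetical names, and the definition shape is far simpler than a real .sublime-syntax file. Note also that those syntaxes are written for Oniguruma-style regexes, which JavaScript's `RegExp` does not fully support, so this only shows the caching pattern, not a working highlighter.

```typescript
// Hypothetical, simplified shape of a syntax definition.
interface SyntaxDefinition {
  name: string;
  patterns: { scope: string; regex: string }[];
}

interface CompiledSyntax {
  name: string;
  rules: { scope: string; regex: RegExp }[];
}

// Module-level cache: lives as long as the SPA page (or the worker) does.
const compiledCache = new Map<string, CompiledSyntax>();

// Counts real compilations so the caching effect can be observed.
let compileCount = 0;

// Compiles a definition's regexps on first use and reuses them afterwards.
function getCompiled(def: SyntaxDefinition): CompiledSyntax {
  const cached = compiledCache.get(def.name);
  if (cached !== undefined) {
    return cached;
  }
  compileCount += 1;
  const compiled: CompiledSyntax = {
    name: def.name,
    rules: def.patterns.map((p) => ({
      scope: p.scope,
      regex: new RegExp(p.regex), // no "g" flag, so no shared lastIndex state
    })),
  };
  compiledCache.set(def.name, compiled);
  return compiled;
}
```

On the webworker concern: each worker has its own module state, so a cache like this would be populated once per worker rather than shared with the main thread.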