highlightjs / highlight.js

JavaScript syntax highlighter with language auto-detection and zero dependencies.
https://highlightjs.org/
BSD 3-Clause "New" or "Revised" License

Discuss: (js/ts) JSX/TSX support roll-up thread #2998

Open joshgoebel opened 3 years ago

joshgoebel commented 3 years ago

This is a high-level thread created to track the overall discussion of our JSX/TSX support.

Support for JSX/TSX is not a single thing. We already include "basic" support for embedded XML/HTML fragments using the xml sublanguage. We do not support comments or embedded JS.

This has all been attempted several times and is a very, very hard problem. It may actually be impossible for us to solve for various reasons.

  1. Technical: we may simply lack the necessary tools in our parsing engine. We have very limited tools for managing state, since our goal is to AVOID state as much as possible.
  2. Philosophical: even if we had the tools, we purposely do not build full grammar parsers (that is out of scope); we only do smart pattern matching.
  3. Philosophical (again): we must highlight incomplete or incorrect code snippets to the best of our ability, including partial code and code taken out of context. A full context-aware parser creates problems here too.

Any solution here needs to be simple enough to maintain long term and not violate our core philosophy. Right now our limited support is fairly simple (all things considered). I'm open to being shown this is a much easier problem than I imagine it to be.

This thread exists to talk about the high level problem/approach instead of the individual mini-problems. Or perhaps the whole discussion will move here since many of these items are intermingled and hard to handle in isolation.

joshgoebel commented 3 years ago

Pulling this from the discussion on https://github.com/highlightjs/highlight.js/pull/2412.


I'm pretty sure I've become persuaded that a sublanguage is NOT the correct way to handle this. JSX is NOT simply XML embedded inside JavaScript (which is how it's handled now). It's literally an extension to the JS language that also allows JS line comments, inline code, etc. That means the proper way to do it is to treat it as HTML tags within the JS language/grammar.

The problem then becomes that you need an actual syntax parser to track the state and count tags (so you know when you leave the JSX fragment and are back in JS land, while also accounting for comments, inline JS, JSX inside inline JS, etc.), and full-blown parsing is not what we do.
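To make the tag-counting problem concrete, here is a rough sketch of the naive approach. `findJsxEnd` is a hypothetical helper written only for this illustration; it is not part of highlight.js and does not reflect how the engine works. It tracks nesting depth with a single regex, and it assumes tags never contain `<` or `>` in their attributes.

```javascript
// Hypothetical helper (illustration only, not highlight.js code).
// Scans forward from `start`, counting opening and closing tags, and returns
// the index just past the tag that brings the depth back to zero, or -1.
function findJsxEnd(source, start = 0) {
  // <tag ...>, </tag>, or <tag ... />; attributes may not contain < or >.
  const tag = /<(\/?)([A-Za-z][\w.]*)[^<>]*?(\/?)>/g;
  tag.lastIndex = start;
  let depth = 0;
  let match;
  while ((match = tag.exec(source)) !== null) {
    const isClosing = match[1] === "/";
    const isSelfClosing = match[3] === "/";
    if (isClosing) depth -= 1;
    else if (!isSelfClosing) depth += 1;
    if (depth === 0) return tag.lastIndex;
  }
  return -1; // tags never balanced
}

// Works for plain nested markup:
const simple = "const el = <ul><li>a</li><li>b</li></ul>; next();";
console.log(simple.slice(findJsxEnd(simple))); // "; next();"

// Breaks as soon as inline JS contains something tag-like:
const tricky = '<p>{"</p>"}</p>;';
console.log(findJsxEnd(tricky)); // 9, but the fragment really ends at 15
```

The second case is the crux: the counter has no idea it is inside a string inside an inline JS expression, so it stops at the wrong `</p>`. Fixing that means tracking strings, comments, braces, and nested JSX, which is the full-blown parsing described above.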

This is made harder by the fact that JS and TS are supported by a single underlying grammar, and TSX is a thing as well. So a proper solution here has to be able to distinguish between type annotations/casting (e.g. Array<number>) and embedded JSX. Hence the current hasClosingTag logic...
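For anyone following along, the general idea behind a closing-tag lookahead can be sketched like this. This is a simplified illustration under my own assumptions, not a copy of the grammar's code: `looksLikeJsxOpenTag` and its signature are made up for the example.

```javascript
// Hypothetical sketch (illustration only): treat "<name" as the start of JSX
// only if a matching "</name" appears somewhere later in the source.
function looksLikeJsxOpenTag(source, index) {
  const open = /^<([A-Za-z][\w.]*)/.exec(source.slice(index));
  if (open === null) return false;
  const closing = "</" + open[1];
  return source.indexOf(closing, index + open[0].length) !== -1;
}

const ts = "const xs: Array<number> = [];";
const jsx = "const el = <div>hi</div>;";

console.log(looksLikeJsxOpenTag(ts, ts.indexOf("<")));   // false
console.log(looksLikeJsxOpenTag(jsx, jsx.indexOf("<"))); // true
```

It is a heuristic, with the obvious holes: a self-closing `<br />` has no closing tag so it is rejected, and a generic like `Array<div>` followed much later by an unrelated `</div>` would be accepted. That trade-off is what you get from pattern matching without a parser.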