Open cygri opened 5 years ago
As a maintainer of a Highlight.js language grammar you might be interested in the discussion of an official packaging format. I just created an issue to track the discussion and I've been working on this along with the new build system.
https://github.com/highlightjs/highlight.js/issues/2328
Sorry for the spam, but I couldn't think of an easier way to ping the people who might be most interested in weighing in on the subject. Feel free to simply close this issue or leave it open (whatever works best for you!).
@cygri: Files to migrate:
Then I'd just follow https://github.com/highlightjs/highlightjs-shexc blindly as an example to refactor them.
I've been working with @yyyc514 on a branch which supports Turtle, Trig, ShEx and SPARQL. It keeps track of the part of speech with markup like:
<span class="hljs-rdf-predicate"><span class="type">foo:</span><span class="name">bar</span></span>.
I've prototyped that in ShEx but because they all share a core library, it should be applicable to Turtle and Trig (so graph names don't get lost in a wash of pnames).
@ericprud I looked at it and the approach is promising (will save duplication of IRIs and literals). If we go with this approach, then I'd also put https://github.com/highlightjs/highlightjs-pie in there.
But there seems to be a lot missing, eg blank nodes, datatyped literals, langString literals...
@cygri Do you think we should have 1 repo with 4 langs, or keep them separate?
Not sure anyone was asking me... but in general I prefer separate repos but honestly it comes down to what's easiest for the maintainers as we need maintainer time more than we need perfect repo organization. :-) So if they'll be better maintained in a single place... then more power to ya.
We also don't have a pattern yet for naming multi-language repos, but I suppose we'll figure that out.
Is this still being worked on or could this repo be removed? Doing some early spring cleaning.
I mentioned a branch I was working on above. I got stuck on adding positional dependencies to the highlighting class. For example:
<URL1> <URL1> <URL1> .
would have a different class for the middle <URL1>. I'll call the middle class p and the others n. In a more complete example, the first node is an n, the second a p, and the third an n. After a '.' you go back to the first n, after a ',' you go back to the third n, and after a ';' you go back to p, e.g.:
<n> <p> <n> .
<n> <p> <n> , <n> ; <p> <n> ; <p> <n> , <n> .
(spaces around punctuation not needed; added here for emphasis)
If it were easier to give the 1st and 3rd n's different names, that'd be grand, e.g. s and o in:
<s> <p> <o> .
<s> <p> <o> , <o> ; <p> <o> ; <p> <o> , <o> .
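The positional rule described above can be sketched as a small state machine outside of Highlight.js. This is only an illustration under loud assumptions: the function name classifyTerms is hypothetical, and the toy tokenizer only knows <...> terms and the three punctuation marks, not real Turtle (no prefixed names, literals, or blank nodes).

```javascript
// Toy tokenizer (assumption): only <...> terms and '.', ';', ',' are recognised.
const TOKEN = /<[^>\s]*>|[.;,]/g;

// Hypothetical helper: label every term as s, p or o from its position.
function classifyTerms(code) {
  const roles = [];
  let next = 's'; // role the next term will receive
  for (const [token] of code.matchAll(TOKEN)) {
    if (token === '.') next = 's';       // statement ended: expect a subject
    else if (token === ';') next = 'p';  // same subject: expect a predicate
    else if (token === ',') next = 'o';  // same subject and predicate: expect an object
    else {
      roles.push({ term: token, role: next });
      // A subject is followed by a predicate; anything else is followed
      // by an object until punctuation says otherwise.
      next = next === 's' ? 'p' : 'o';
    }
  }
  return roles;
}

console.log(classifyTerms('<s> <p> <o> , <o> ; <p> <o> .').map(r => r.role).join(' '));
// prints "s p o o p o"
```

The hard part discussed in this thread is not this logic but expressing it with the mode-based grammar format, where a mode cannot easily carry this kind of state.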
You gave me some guidance (under your @yyyc514 moniker) but I never got it working. If you have time for some real-time collaboration, maybe we can knock this out quickly.
Related:
I can't recall what I told you before. You're having issues because generally we just currently don't support that [well or at all]. We generally don't do "sequence of things" where each thing must be highlighted differently. The most advanced examples we have in core are in latex.js if you want to have a look. This is not something I try to really encourage though (as it's very hard to debug and reason about). "Here there be dragons".
starts and contains are the only tools we have in our belt, and it gets complex fast (doubly so if you're trying to match a repeated pattern differently, as seems to be the case here) - and often you'd want to use endsParent to terminate/roll up a chain but the two items aren't compatible (endsParent prevents starts from working).
You could also implement something entirely crazy with the on:begin callback for a match... have it manually look forwards and backwards into the code to figure out its context... but there is no way for a mode to change its className... so you'd have multiple modes all doing the checking and then claiming (or rejecting) matches... I don't think it would be much fun (or very fast).
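To make the "multiple modes claiming or rejecting matches" idea concrete, here is a rough sketch, not a tested grammar. It assumes the on:begin callback receives (match, response), that response.ignoreMatch() rejects the match, and that match.input and match.index refer to the full source text; whether a rejected match falls through to the next mode may depend on the Highlight.js version. The helper expectedRole, the factory termMode, and the class names s, p and o are all hypothetical.

```javascript
// Hypothetical helper (not part of Highlight.js): scan everything before
// `index` and work out which role a term starting there would have.
function expectedRole(input, index) {
  let role = 's';
  for (const [token] of input.slice(0, index).matchAll(/<[^>\s]*>|[.;,]/g)) {
    if (token === '.') role = 's';
    else if (token === ';') role = 'p';
    else if (token === ',') role = 'o';
    else role = role === 's' ? 'p' : 'o'; // a term advances the position
  }
  return role;
}

// One mode per role, all sharing the same `begin`. Each mode rejects
// matches that are not in its position (class names are placeholders).
function termMode(role) {
  return {
    className: role,
    begin: /<[^>\s]*>/,
    'on:begin': (match, response) => {
      if (expectedRole(match.input, match.index) !== role) {
        response.ignoreMatch();
      }
    },
  };
}

const TERM_MODES = ['s', 'p', 'o'].map(termMode);
```

Note that every candidate match rescans the text before it, so the cost grows quadratically with input size, which fits the "not very fast" warning above.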
Honestly it'd be more interesting to discuss supporting entirely custom parsing engines and what that might look like... ie, you'd write your OWN Turtle parser/tokenizer in JS (or use an existing one!). And then we just have some minimal "glue" wrt Highlight.js so that we know content/tags/classes to split out in the final output.
Obviously every grammar having a full parser would be very heavy sizewise (not to mention all the potential bugs), which is why it would never make any real sense for core, but for specialized 3rd party grammars where you really want to highlight ONE grammar really well it might not be a big deal at all to pull in a full parser and then just pipe those tokens back into HLJS.
I'm intrigued by driving this from a grammar. I've just templatized jison and jison-lex to be typescript-friendly (ts-jison, ts-jison-lex), which means we could substitute whatever template made it work well with highlight.js. We could look up the vstack to see when we're in some production and the lstack to synchronize locations.
(If you're a jison/bison hacker, you've seen the vstack as $1..$n in your semantic actions. Because the LL or LR parser is reducing the n last terms on the stack, $1..$n give you access to them by lexically translating those to indexes on the vstack (AKA $$). If you over-run that index, you can see not just what you're supposed to be reducing, but how you got there.)
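As a toy model of that indexing (an illustration only, not jison's generated code), the mapping from $k to a value-stack slot can be written out as below. The function name semanticArg and the stack contents are made up for the example.

```javascript
// Toy model of a bottom-up parser's value stack. For a rule with
// `ruleLength` symbols on its right-hand side, $ruleLength is the top of
// the stack and $1 sits ruleLength - 1 slots below it.
function semanticArg(vstack, ruleLength, k) {
  return vstack[vstack.length - 1 - (ruleLength - k)];
}

// Stack while reducing a three-symbol rule (subject predicate object),
// with one older value still underneath it.
const vstack = ['prefixDecl', '<s>', '<p>', '<o>'];

const first = semanticArg(vstack, 3, 1);   // '<s>', i.e. $1
const last = semanticArg(vstack, 3, 3);    // '<o>', i.e. $3
const context = semanticArg(vstack, 3, 0); // 'prefixDecl', over-running the rule
```

Reading below $1 is what would let a highlighter ask "which production am I inside?" when deciding on a class for the current token.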
I'm not familiar with those projects.
Here is sort of what I very roughly imagined (just OTTOMH):
- customParse key to the language definition which is a function fn(code, emitter)
- __emitter (see code for TokenTreeEmitter)
- Emitter API to communicate with Highlight.js about tokens, text, etc.
The end of the highlight process would be no different than usual, cleaning up the emitter.
emitter.closeAllNodes();
emitter.finalize();
result = emitter.toHTML();
The emitter API is documented in code (token_tree.js). You'd likely only be using addKeyword, addText, and open/closeNode.
This is technically all private API but if we decided to open this up we'd likely need to make it public.
To implement this you'd just add an if/else block inside _highlight to dispatch to customParse - and of course have to implement the customParse function. If you're using some sort of generator then I imagine you need the JS that it produces plus a thin layer of glue to make that interface with our Emitter, which is all the customParse function would be. In a perfect world, using an existing/generated parser, the glue code might be only ~50-100 lines. (You'll need to map token types to HLJS classes, etc.)
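A very rough sketch of how that glue could look is below. Everything in it is an assumption: customParse is only a proposal in this thread, SketchEmitter is a stand-in that merely mimics the method names mentioned above (it is not the real TokenTreeEmitter), and highlightSketch is a hypothetical substitute for the dispatch inside _highlight.

```javascript
function escapeHTML(text) {
  return text.replace(/&/g, '&amp;').replace(/</g, '&lt;').replace(/>/g, '&gt;');
}

// Stand-in emitter (assumption): same method names, much simpler behaviour.
class SketchEmitter {
  constructor() { this.html = ''; this.depth = 0; }
  addText(text) { this.html += escapeHTML(text); }
  openNode(kind) { this.html += `<span class="hljs-${kind}">`; this.depth++; }
  closeNode() { if (this.depth > 0) { this.html += '</span>'; this.depth--; } }
  addKeyword(text, kind) { this.openNode(kind); this.addText(text); this.closeNode(); }
  closeAllNodes() { while (this.depth > 0) this.closeNode(); }
  finalize() {}
  toHTML() { return this.html; }
}

// Hypothetical language definition carrying its own tokenizer.
const toyTurtle = {
  name: 'toy-turtle',
  customParse(code, emitter) {
    let last = 0;
    for (const match of code.matchAll(/<[^>\s]*>/g)) {
      emitter.addText(code.slice(last, match.index)); // text between terms
      emitter.addKeyword(match[0], 'symbol');         // map token type to a class
      last = match.index + match[0].length;
    }
    emitter.addText(code.slice(last));
  },
};

// Hypothetical dispatch, standing in for the if/else inside _highlight.
function highlightSketch(language, code) {
  const emitter = new SketchEmitter();
  if (language.customParse) {
    language.customParse(code, emitter);
  } else {
    emitter.addText(code); // placeholder for the regular mode-based engine
  }
  emitter.closeAllNodes();
  emitter.finalize();
  return emitter.toHTML();
}
```

With a generated parser, the body of customParse would instead walk the parser's token stream and map each token type to a class name.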
Any progress or should this repo be archived or removed? Doing a bit of spring cleaning.
hi @joshgoebel and @ericprud! I'll make a PR with turtle, sparql, pie (and maybe shaclc). And I'll try to get a colleague to fix details (eg tests).
@cygri @ericprud @joshgoebel looking for volunteers to:
So I guess we need to migrate code into this repository and make it fit the new way of extending highlight.js.
The starting point, I guess, is the code here: https://github.com/highlightjs/highlight.js/pull/1844
The endpoint should be something along the lines of what we can see here: https://github.com/highlightjs/highlightjs-solidity
How do we organise this? Who takes a first stab at making it work?