Migrate code

cygri commented 5 years ago

So I guess we need to migrate code into this repository and make it fit the new way of extending highlight.js.

The starting point, I guess, is the code here: https://github.com/highlightjs/highlight.js/pull/1844

The endpoint should be something along the lines of what we can see here: https://github.com/highlightjs/highlightjs-solidity

How do we organise this? Who takes a first stab at making it work?

joshgoebel commented 4 years ago

As a maintainer of a Highlight.js language grammar you might be interested in the discussion of an official packaging format. I just created an issue to track the discussion and I've been working on this along with the new build system.

https://github.com/highlightjs/highlight.js/issues/2328

Sorry for the spam, but I couldn't think of an easier way to ping the people who might be most interested in weighing in on the subject. Feel free to simply close this issue or leave it open (whatever works best for you!).

VladimirAlexiev commented 4 years ago

@cygri: Files to migrate:

Then I'd just follow https://github.com/highlightjs/highlightjs-shexc blindly as an example to refactor them.

Need to add test/markup
Could also add a demo following https://github.com/highlightjs/highlightjs-shexc/blob/master/index.html

ericprud commented 4 years ago

I've been working with @yyyc514 on a branch which supports Turtle, Trig, ShEx and SPARQL. It keeps track of the part of speech with markup like:

<span class="hljs-rdf-predicate"><span class="type">foo:</span><span class="name">bar</span></span>.

I've prototyped that in ShEx but because they all share a core library, it should be applicable to Turtle and Trig (so graph names don't get lost in a wash of pnames).

VladimirAlexiev commented 4 years ago

@ericprud I looked at it and the approach is promising (will save duplication of IRIs and literals). If we go with this approach, then I'd also put https://github.com/highlightjs/highlightjs-pie in there.

But there seems to be a lot missing, e.g. blank nodes, datatyped literals, langString literals...

@cygri Do you think we should have 1 repo with 4 langs, or keep them separate?

joshgoebel commented 4 years ago

Not sure anyone was asking me... but in general I prefer separate repos. Honestly, though, it comes down to what's easiest for the maintainers, as we need maintainer time more than we need perfect repo organization. :-) So if they'll be better maintained in a single place... then more power to ya.

We also don't have a pattern yet for naming multi-language repos, but I suppose we'll figure that out.

joshgoebel commented 3 years ago

Is this still being worked on or could this repo be removed? Doing some early spring cleaning.

ericprud commented 3 years ago

I mentioned a branch I was working on above. I got stuck on adding positional dependencies to the highlighting class. For example, in <URL1> <URL1> <URL1> . the middle <URL1> would need a different class. I'll call the middle class p and the others n. In a more complete example, the first node is an n, the second a p, and the third an n. After a '.' you go back to the first n, after a ',' you go back to the third n, and after a ';' you go back to p, e.g.:

<n> <p> <n> .
<n> <p> <n> , <n> ; <p> <n> ; <p> <n> , <n> .

(spaces around punctuation not needed; added here for emphasis)

If it were easier to give the 1st and 3rd ns different names, that'd be grand, e.g. s and o in:

<s> <p> <o> .
<s> <p> <o> , <o> ; <p> <o> ; <p> <o> , <o> .
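A minimal sketch of that position rule in plain JavaScript, independent of highlight.js. It assumes a simplified tokenizer (IRIs in <...>, the punctuation . , ; and any other non-space run) and ignores literals, blank nodes and local names containing '.'; `tokenize` and `assignRoles` are hypothetical names.

```javascript
// Simplified tokenizer (assumption): IRIs in <...>, the punctuation . , ;
// and any other run of non-space characters. Literals, blank-node brackets
// and local names containing '.' are deliberately not handled.
const TOKEN = /<[^>]*>|[.,;]|[^\s<>.,;]+/g;

function tokenize(text) {
  return text.match(TOKEN) || [];
}

// Role that follows each role when no punctuation intervenes.
const NEXT_ROLE = { s: "p", p: "o", o: "o" };

// Label every term as subject (s), predicate (p) or object (o):
//   '.' -> next term starts a new triple            (s)
//   ';' -> next term is a new predicate, same subject (p)
//   ',' -> next term is another object, same predicate (o)
function assignRoles(text) {
  let expected = "s"; // role the next term will take
  const result = [];
  for (const tok of tokenize(text)) {
    if (tok === ".") expected = "s";
    else if (tok === ";") expected = "p";
    else if (tok === ",") expected = "o";
    else {
      result.push({ term: tok, role: expected });
      expected = NEXT_ROLE[expected];
    }
  }
  return result;
}

const line = "<a> <b> <c> , <d> ; <e> <f> ; <g> <h> , <i> .";
console.log(assignRoles(line).map((t) => t.role).join(" "));
// prints: s p o o p o p o o
```

Since the tokenizer splits on punctuation as well as whitespace, "<a><b><c>,<d>." gets the same s p o o labelling as the spaced form.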

You gave me some guidance (under your @yyyc514 moniker) but I never got it working. If you have time for some real-time collaboration, maybe we can knock this out quickly.

joshgoebel commented 3 years ago

https://github.com/highlightjs/highlight.js/issues/2838
https://github.com/highlightjs/highlight.js/pull/2776 (lots of good reading about chaining)

I can't recall what I told you before. You're having trouble because we generally don't support that right now [well or at all]. We don't usually do "sequence of things" where each thing must be highlighted differently. The most advanced examples we have in core are in latex.js, if you want to have a look. This is not something I really try to encourage, though (it's very hard to debug and reason about). "Here there be dragons".

starts and contains are the only tools we have in our belt, and it gets complex fast (doubly so if you're trying to match a repeated pattern differently, as seems to be the case here). Often you'd want to use endsParent to terminate/roll up a chain, but the two aren't compatible (endsParent prevents starts from working).

joshgoebel commented 3 years ago

You could also implement something entirely crazy with the on:begin callback for a match... have it manually look forwards and backwards into the code to figure out its context... but there is no way for a mode to change its className, so you'd have multiple modes all doing the checking and then claiming (or rejecting) matches... I don't think it would be much fun (or very fast).
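Roughly what that per-mode checking could look like, as a hedged sketch: each role gets its own mode, and an `on:begin` callback calls `response.ignoreMatch()` when a backwards scan says the match is in the wrong position. `roleAt`, `termMode`, `TERM_MODES` and the class names are hypothetical, and the sketch assumes `match` behaves like a `RegExp.exec` result whose `index` and `input` cover the whole code being highlighted. Because every candidate match rescans from the start of the input, it also shows why this would not be very fast.

```javascript
// Naive context check (hypothetical helper, not a highlight.js API):
// re-tokenize everything before `index` and replay the . , ; rules to find
// the role of the term starting there. Rescanning the prefix on every match
// makes the total cost grow roughly quadratically with document size.
function roleAt(input, index) {
  const NEXT_ROLE = { s: "p", p: "o", o: "o" };
  const before = input.slice(0, index).match(/<[^>]*>|[.,;]|[^\s<>.,;]+/g) || [];
  let role = "s"; // role the next term would take
  for (const tok of before) {
    if (tok === ".") role = "s";
    else if (tok === ";") role = "p";
    else if (tok === ",") role = "o";
    else role = NEXT_ROLE[role];
  }
  return role;
}

// A mode cannot change its className, so each role gets its own mode whose
// "on:begin" callback rejects matches found in the wrong position.
function termMode(className, role) {
  return {
    className,
    begin: /<[^>]*>/,
    "on:begin": (match, response) => {
      if (roleAt(match.input, match.index) !== role) response.ignoreMatch();
    },
  };
}

const TERM_MODES = [
  termMode("subject", "s"),
  termMode("predicate", "p"),
  termMode("object", "o"),
];
```

In a grammar, `TERM_MODES` would go into `contains`; the engine would then try each mode in turn, and only the one whose role matches the context would keep the match.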

Honestly, it'd be more interesting to discuss supporting entirely custom parsing engines and what that might look like, i.e. you'd write your OWN Turtle parser/tokenizer in JS (or use an existing one!), and then we'd just have some minimal "glue" with Highlight.js so that we know which content/tags/classes to split out in the final output.

Obviously every grammar having a full parser would be very heavy sizewise (not to mention all the potential bugs), which is why it would never make any real sense for core, but for specialized 3rd party grammars where you really want to highlight ONE grammar really well it might not be a big deal at all to pull in a full parser and then just pipe those tokens back into HLJS.

ericprud commented 3 years ago

I'm intrigued by driving this from a grammar. I've just templatized jison and jison-lex to be TypeScript-friendly (ts-jison, ts-jison-lex), which means we could substitute whatever template makes it work well with highlight.js. We could look up the vstack to see when we're in some production, and the lstack to synchronize locations.

(If you're a jison/bison hacker, you've seen the vstack as $1..$n in your semantic actions. Because the LL or LR parser is reducing the last n terms on the stack, $1..$n give you access to them by lexically translating those to indexes on the vstack (AKA $$). If you over-run that index, you can see not just what you're supposed to be reducing, but how you got there.)
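A toy illustration of that indexing, assuming a plain array as the value stack. This is not jison's generated code (real parsers keep more state alongside it), and the rule name and stack contents are made up for the example.

```javascript
// Illustration only: when a rule with n symbols on its right-hand side is
// reduced, their values are the top n entries of the value stack, so $i
// (1 <= i <= n) is an offset from the top. Indexes i <= 0 "over-run" into
// values pushed before this rule began, i.e. how the parser got here.
function dollar(vstack, n, i) {
  return vstack[vstack.length - n + i - 1];
}

// Hypothetical stack while reducing `predicateObject: verb object` (n = 2)
// inside a triple whose subject was already parsed.
const vstack = ["@prefix ex: <http://example.org/> .", "ex:s", "ex:p", "ex:o"];
console.log(dollar(vstack, 2, 1)); // ex:p  ($1)
console.log(dollar(vstack, 2, 2)); // ex:o  ($2)
console.log(dollar(vstack, 2, 0)); // ex:s  (over-run: the enclosing subject)
```

That over-run is what would let glue code tell a predicate position from an object position without re-deriving the context itself.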

joshgoebel commented 3 years ago

I'm not familiar with those projects.

Here is sort of what I very roughly imagined (just OTTOMH):

We add a customParse key to the language definition which is a function fn(code, emitter)
The highlight function looks for this key and, if present, calls it, passing it the emitter (see the code for TokenTreeEmitter)
The highlight function does no actual parsing itself (i.e. our built-in parsing engine is never consulted)
Your customParse code would do all parsing itself, using the Emitter API to communicate with Highlight.js about tokens, text, etc.

The end of the highlight process would be no different than usual, cleaning up the emitter.

      emitter.closeAllNodes();
      emitter.finalize();
      result = emitter.toHTML();

The emitter API is documented in code (token_tree.js). You'd likely only be using addKeyword, addText, and open/closeNode.

This is technically all private API but if we decided to open this up we'd likely need to make it public.

To implement this you'd just add an if/else block inside _highlight to dispatch to customParse, and of course implement the customParse function itself. If you're using some sort of generator, then I imagine you'd need the JS it produces plus a thin layer of glue to make that interface with our Emitter, which is all the customParse function would be. In a perfect world, using an existing/generated parser, the glue code might be only ~50-100 lines (you'll need to map token types to HLJS classes, etc.).
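A rough sketch of the customParse idea under stated assumptions: `customParse` is the proposed key (not an existing highlight.js feature), `MiniEmitter` is a stand-in that only mimics the emitter method names mentioned above, and a single regex plays the role of a real or generated Turtle tokenizer. `highlightSketch` models only the proposed dispatch branch plus the usual cleanup sequence.

```javascript
// Hypothetical stand-in for TokenTreeEmitter: same method names as in the
// thread, much simpler behaviour. Real output would also prefix classes
// (e.g. "hljs-"), which this sketch skips.
function escapeHTML(s) {
  return s.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");
}

class MiniEmitter {
  constructor() { this.html = ""; this.depth = 0; }
  openNode(kind) { this.html += `<span class="${kind}">`; this.depth++; }
  closeNode() { if (this.depth > 0) { this.html += "</span>"; this.depth--; } }
  closeAllNodes() { while (this.depth > 0) this.closeNode(); }
  addText(text) { this.html += escapeHTML(text); }
  addKeyword(text, kind) { this.openNode(kind); this.addText(text); this.closeNode(); }
  finalize() {}
  toHTML() { return this.html; }
}

// Glue for a hypothetical Turtle grammar: map token types to classes.
// Only IRIs, prefixed names and . , ; are recognised; everything else is text.
const turtleSketch = {
  name: "Turtle (sketch)",
  customParse(code, emitter) {
    const re = /(?<iri><[^>]*>)|(?<prefix>[A-Za-z][\w-]*)?:(?<local>[\w-]*)|(?<punct>[.,;])|(?<text>\s+|\S)/g;
    for (const m of code.matchAll(re)) {
      const g = m.groups;
      if (g.iri !== undefined) {
        emitter.addKeyword(g.iri, "iri");
      } else if (g.local !== undefined) {
        // Nested nodes, like the type/name spans inside a pname.
        emitter.openNode("rdf-pname");
        emitter.addKeyword(`${g.prefix ?? ""}:`, "type");
        if (g.local) emitter.addKeyword(g.local, "name");
        emitter.closeNode();
      } else if (g.punct !== undefined) {
        emitter.addKeyword(g.punct, "punctuation");
      } else {
        emitter.addText(g.text);
      }
    }
  },
};

// What the if/else dispatch inside _highlight might reduce to.
function highlightSketch(code, language, emitter = new MiniEmitter()) {
  if (typeof language.customParse !== "function") {
    throw new Error("built-in parsing engine is not modelled in this sketch");
  }
  language.customParse(code, emitter);
  emitter.closeAllNodes();
  emitter.finalize();
  return emitter.toHTML();
}
```

For example, `highlightSketch("foo:bar <x> .", turtleSketch)` yields nested `rdf-pname`/`type`/`name` spans for `foo:bar`, an `iri` span for `<x>` and a `punctuation` span for the dot. All of the token-to-class mapping lives in the glue, which is what keeps it small.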

joshgoebel commented 2 years ago

Any progress or should this repo be archived or removed? Doing a bit of spring cleaning.

VladimirAlexiev commented 2 years ago

hi @joshgoebel and @ericprud! I'll make a PR with turtle, sparql, pie (and maybe shaclc). And I'll try to get a colleague to fix details (e.g. tests).

VladimirAlexiev commented 2 years ago

@cygri @ericprud @joshgoebel looking for volunteers to:

copy more stuff from https://github.com/highlightjs/highlightjs-shexc, in particular how to use it from JS. This repo includes 3 languages: turtle, sparql, pie (and I hope to write a highlighter for shaclc)
add some tests (I have placed 4 test files, but I’d like to see them highlighted on a webpage)
Maybe add logos of these 3+1 languages (and subsections for each)
Maybe think about using https://github.com/highlightjs/highlight.js/blob/main/docs/language-guide.rst#sub-modes (there’s a discussion of this feature here, and it would let us define common terminals like PCHARS, IRI, etc. only once)
Try it on the test files
Put your name as contributor
Create a PR for SUPPORTED_LANGUAGES.md as per https://github.com/highlightjs/highlight.js/blob/main/docs/language-contribution.rst
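For the sub-modes point, a minimal sketch of defining shared terminals once and reusing the same mode objects in more than one grammar, similar to how core grammars reuse shared modes such as `hljs.QUOTE_STRING_MODE`. The regexes only approximate the Turtle/SPARQL productions, and all names, keyword lists and class choices are assumptions.

```javascript
// Shared RDF terminals, defined once. Regexes approximate IRIREF and
// prefixed names; they are not spec-exact (e.g. no UCHAR escapes).
const IRIREF = { className: "symbol", begin: /<[^<>"{}|^`\\\s]*>/ };
const PNAME = { className: "type", begin: /(?:[A-Za-z][\w-]*)?:[\w-]*/ };
const STRING = { className: "string", begin: /"/, end: /"/, contains: [{ begin: /\\./ }] };
const COMMENT = { className: "comment", begin: /#/, end: /$/ };

const RDF_COMMON = [IRIREF, PNAME, STRING, COMMENT];

// Language definition functions in the shape hljs.registerLanguage expects;
// they receive the hljs object, which these sketches do not need.
function turtle() {
  return {
    name: "Turtle",
    keywords: { keyword: "a" },
    contains: [{ className: "meta", begin: /@(?:prefix|base)\b/ }, ...RDF_COMMON],
  };
}

function sparql() {
  return {
    name: "SPARQL",
    case_insensitive: true,
    keywords: { keyword: "select where prefix base filter optional construct ask describe" },
    contains: [{ className: "variable", begin: /[?$]\w+/ }, ...RDF_COMMON],
  };
}
```

Registering would then be `hljs.registerLanguage("turtle", turtle)` and `hljs.registerLanguage("sparql", sparql)`, and a fix to `IRIREF` or `PNAME` would apply to both grammars at once.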

highlightjs / highlightjs-turtle

Migrate code #2