highlightjs / highlight.js

JavaScript syntax highlighter with language auto-detection and zero dependencies.
BSD 3-Clause "New" or "Revised" License

[Request] Providing information about the passed raw text's offsets/color for the highlight #4035

Closed mozesstumpf closed 6 days ago

mozesstumpf commented 2 months ago

As far as I know, the current hljs methods (highlight, highlightAuto, etc.) only return a value that contains the HTML string with the highlighted markup.

The recently implemented CSS Custom Highlight API makes it possible to change the color and background of text without modifying the DOM itself.

With this API, I think it would be useful if the hljs methods provided information about the position and color of each part of the passed text, so that we could easily manage the highlighting with the Highlight API.


const code = "Code";

// Desired output for the line above: a flat list of offset ranges
const ranges = [
  // const
  { startOffset: 0, endOffset: 5, color: "blue" },
  // whitespace
  { startOffset: 5, endOffset: 6, color: "transparent" },
  // code (variable name)
  { startOffset: 6, endOffset: 10, color: "white" },
  // whitespace
  { startOffset: 10, endOffset: 11, color: "transparent" },
  // equal sign (=)
  { startOffset: 11, endOffset: 12, color: "white" },
  // whitespace
  { startOffset: 12, endOffset: 13, color: "transparent" },
  // "Code"
  { startOffset: 13, endOffset: 19, color: "orange" },
  // semicolon (;)
  { startOffset: 19, endOffset: 20, color: "white" },
];
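
For illustration, here is a minimal sketch of how a flat list like the one requested above might be consumed with the CSS Custom Highlight API. It assumes the code block holds a single text node and that each color string is also a valid CSS identifier. The helper names `groupByColor` and `applyHighlights` and the `hl-` prefix are hypothetical, and highlight.js does not return data in this shape today.

```javascript
// Flat ranges in the shape proposed above (hypothetical; not an hljs output format).
const ranges = [
  { startOffset: 0, endOffset: 5, color: "blue" },
  { startOffset: 6, endOffset: 10, color: "white" },
  { startOffset: 11, endOffset: 12, color: "white" },
  { startOffset: 13, endOffset: 19, color: "orange" },
  { startOffset: 19, endOffset: 20, color: "white" },
];

// Group offsets by color: one Highlight object pairs with one ::highlight() rule,
// so multi-color output needs one Highlight per color.
function groupByColor(ranges) {
  const groups = new Map();
  for (const { startOffset, endOffset, color } of ranges) {
    if (!groups.has(color)) groups.set(color, []);
    groups.get(color).push([startOffset, endOffset]);
  }
  return groups;
}

// Browser only: register one Highlight per color for a single text node.
// Assumes each color string is usable as part of a CSS identifier.
function applyHighlights(textNode, ranges) {
  for (const [color, spans] of groupByColor(ranges)) {
    const highlight = new Highlight();
    for (const [start, end] of spans) {
      const range = new Range();
      range.setStart(textNode, start);
      range.setEnd(textNode, end);
      highlight.add(range);
    }
    CSS.highlights.set(`hl-${color}`, highlight);
  }
}
```

Each registered name still needs a matching stylesheet rule, for example `::highlight(hl-blue) { color: blue; }`, so the colors themselves live in CSS rather than in the range data.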
joshgoebel commented 2 months ago

We don't generate a sequential list of ranges - it's possible to have deeply nested scopes (a tree). Now if you wanted to walk that tree (and flatten it) to try and build something like this, you may want to have a look at https://github.com/wooorm/lowlight instead, which uses highlight.js but provides guarantees about the output format. Or build the same on top of us using the same private (but stable) __emitter API that lowlight is using.

Right now though I don't think this CSS API looks very interesting from our point of view - seems to only support single color highlighting... so I can't see why we'd do anything special in the core library to support this.

mozesstumpf commented 2 months ago

Thanks for the suggestions, I'll look into it.

Regarding the Highlight API, I made a little demo of how it would work with multiple colors.


joshgoebel commented 2 months ago

How does the API handle nested ranges?

mozesstumpf commented 2 months ago

What do you mean by "nested ranges"? Could you show me an example?

joshgoebel commented 2 months ago

"blah #{variable}"

That is a variable, within a subst (what we call interpolation), within a string - perhaps even within a "string container" (where the "" are inside the string container but not part of the string itself).

And you can't just flatten there because SOME of the styling of the top-level elements might need to be inherited by the children.
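
One possible way to flatten while keeping the inheritance information is to record the whole chain of ancestor scopes on every flat range, not only the innermost one. The sketch below assumes a hypothetical tree shape (a node has a `scope` and `children`, and a child is either a string or another node); this is not necessarily what highlight.js or lowlight produce, and `flattenTree` is an illustrative name.

```javascript
// Hypothetical scope tree for: "blah #{variable}"
const tree = {
  scope: "string",
  children: [
    '"blah ',
    {
      scope: "subst",
      children: ["#{", { scope: "variable", children: ["variable"] }, "}"],
    },
    '"',
  ],
};

// Walk the tree depth-first and emit one range per text leaf.
// Each range carries its full scope chain (outermost first), so styles
// from parent scopes can still be applied to the children.
function flattenTree(root) {
  const ranges = [];
  let offset = 0; // running offset in UTF-16 code units, as used by Range
  const walk = (node, scopes) => {
    const chain = node.scope ? [...scopes, node.scope] : scopes;
    for (const child of node.children) {
      if (typeof child === "string") {
        if (child.length === 0) continue;
        ranges.push({
          startOffset: offset,
          endOffset: offset + child.length,
          scopes: chain,
        });
        offset += child.length;
      } else {
        walk(child, chain);
      }
    }
  };
  walk(root, []);
  return ranges;
}

const flat = flattenTree(tree);
// The leaf for `variable` covers offsets 8 to 16 with the chain
// ["string", "subst", "variable"].
```

A consumer could then add each range to one highlight per scope in its chain, which approximates the nesting that elements would otherwise provide.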

mozesstumpf commented 2 months ago

If I understood correctly, your question is whether Ranges can achieve the same styling that you can get with Element nesting.

Highlights (ranges) can be nested, so you can accomplish the same result as when wrapping text in elements.


joshgoebel commented 2 months ago

No I mean something more like a SINGLE string with multiple ranges:

const parentRange = new Range();
parentRange.setStart(parent, 2);
parentRange.setEnd(parent, 14);

const childRange = new Range();
childRange.setStart(parent, 3);
childRange.setEnd(parent, 8);

So here the child is 3-8 while the parent is 2-14, i.e. the child SHOULD be inside/nested in the parent. I don't understand why you need more elements at all... if you already have elements everywhere it'd be easy enough to just use CSS - and avoid the highlighting API entirely.

Feel like I'm still missing something here.

mozesstumpf commented 2 months ago

That example was meant to show that ranges can be nested. I wouldn't need to use any elements if I could use the Highlight API.

> const parentRange = new Range();
> parentRange.setStart(parent, 2);
> parentRange.setEnd(parent, 14);
>
> const childRange = new Range();
> childRange.setStart(parent, 3);
> childRange.setEnd(parent, 8);

I'm not sure what you meant by the example above, because the ranges are invalid if they refer to the demo.

> How does the API handle nested ranges?

Regarding your question, the ranges can be nested in a single string.
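
As a rough sketch of what that could look like with the offsets from the quoted example: two overlapping ranges over one text node, each registered as its own highlight. `textNode` is assumed to be the single Text node in the code block with at least 14 characters, and the names `highlightNested`, `parent-scope`, and `child-scope` are made up for illustration. As far as I understand the API, `priority` decides which highlight wins where the two overlap.

```javascript
// Browser-only sketch: nested ranges in a single string, no wrapper elements.
function highlightNested(textNode) {
  // Parent covers offsets 2-14 of the text node.
  const parentRange = new Range();
  parentRange.setStart(textNode, 2);
  parentRange.setEnd(textNode, 14);

  // Child covers offsets 3-8, i.e. fully inside the parent.
  const childRange = new Range();
  childRange.setStart(textNode, 3);
  childRange.setEnd(textNode, 8);

  const parentHighlight = new Highlight(parentRange);
  const childHighlight = new Highlight(childRange);
  // Higher priority so the child's styles take precedence in the overlap.
  childHighlight.priority = 1;

  CSS.highlights.set("parent-scope", parentHighlight);
  CSS.highlights.set("child-scope", childHighlight);
  return { parentHighlight, childHighlight };
}
```

The styling would then come from rules such as `::highlight(parent-scope) { color: orange; }` and `::highlight(child-scope) { color: white; }`.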

> I don't understand why you need more elements at all... if you already have elements everywhere it'd be easy enough to just use CSS - and avoid the highlighting API entirely.

I'm currently working on a rich text editor where the user can modify the text in the code block, and the code block must contain only a single text node. Code block example:

  function example() {}
joshgoebel commented 2 months ago

> I'm not sure what you meant by the example above because the ranges are invalid if it's related to the demo.

I'm simply imagining them as indexing into the string...

<------                        full string               ---------->
  "    #{blah_variable_here}. "
  < 2 thru                  18>
       <5 thru           14>

(not to any kind of scale, lol)