scrivo / highlight.php

A port of highlight.js by Ivan Sagalaev to PHP
BSD 3-Clause "New" or "Revised" License

Allow HTML Tags in Code Snippet #62

Closed taufik-nurrohman closed 4 years ago

taufik-nurrohman commented 4 years ago

The reason I use highlight.js on my blog is that it allows me to preserve HTML tags inside the code snippet. Knowing that there is a highlight.php project makes everything so perfect! But one thing is missing: the ability to retain HTML tags inside the code snippet. I used to add <mark> tags to point out the important parts to my readers, and maybe some <a> tags to link a piece of code to its documentation page.

How do I keep the HTML markup in my code snippet? Disabling the “safe mode” does not work for me. It always throws a “`self` is not supported at the top-level of a language.” message anyway.

By the way, I have used your project to complete my content management system extension here. This project fits my environment so well :+1:

Thanks.

allejo commented 4 years ago

Could you provide a snippet with your HTML tags that works in highlight.js but not highlight.php? And if you could provide a snippet that breaks with safe mode disabled too, that'd be great!

If one snippet meets both requirements, then that's fine too! This way I can debug where the differences in behavior are.

By the way, I have used your project to complete my content management system extension here. This project fits my environment so well :+1:

This makes me very happy to hear! :smile:

taufik-nurrohman commented 4 years ago

I’m on mobile now, sorry. With highlight.js, it is possible to wrap custom elements across tokens:

[Screenshot: IMG_20200125_175807]

http://jsfiddle.net/toubia95/bDW8H

With highlight.php, it gets encoded:

[Screenshot: IMG_20200125_175744]

https://mecha-cms.com/store/extension/t-o-c

Use element inspector to get the generated markup.

allejo commented 4 years ago

Executing highlight() in both highlight.js and highlight.php (without using something like <mark>) returns the same result:

Code to highlight

// fonction pour créer un objet date
function gjsCreerObjetDate(jour, mois, annee) {
    return new Date(annee, mois - 1, jour);
}

highlight.php

<span class="hljs-comment">// fonction pour créer un objet date</span>
<span class="hljs-function"><span class="hljs-keyword">function</span> <span class="hljs-title">gjsCreerObjetDate</span>(<span class="hljs-params">jour, mois, annee</span>) </span>{
    <span class="hljs-keyword">return</span> <span class="hljs-keyword">new</span> <span class="hljs-built_in">Date</span>(annee, mois - <span class="hljs-number">1</span>, jour);
}

highlight.js

<span class="hljs-comment">// fonction pour créer un objet date</span>
<span class="hljs-function"><span class="hljs-keyword">function</span> <span class="hljs-title">gjsCreerObjetDate</span>(<span class="hljs-params">jour, mois, annee</span>) </span>{
    <span class="hljs-keyword">return</span> <span class="hljs-keyword">new</span> <span class="hljs-built_in">Date</span>(annee, mois - <span class="hljs-number">1</span>, jour);
}

In your JSFiddle, you are using highlightBlock, which does not have an equivalent in highlight.php. highlightBlock has special behavior to handle the DOM and leave HTML alone; that's why you must escape HTML if you want highlight.js to highlight HTML.

In the following example, I'll use <mark> to see the different behaviors.

highlight.php with <mark>

Even though it doesn't match highlight.js behavior, highlight() in highlight.php is working as intended here, in my opinion.

$hl = new \Highlight\Highlighter();
$text = <<<CODE
// fonction pour créer un objet date
function gjsCreer<mark>ObjetDate(jo</mark>ur, mois, annee) {
    return new Date(annee, mois - 1, jour);
}
CODE;

echo $hl->highlight('javascript', $text)->value;

Output:

<span class="hljs-comment">// fonction pour créer un objet date</span>
<span class="hljs-function"><span class="hljs-keyword">function</span> <span class="hljs-title">gjsCreer</span>&lt;<span class="hljs-title">mark</span>&gt;<span class="hljs-title">ObjetDate</span>(<span class="hljs-params">jo&lt;<span class="hljs-regexp">/mark&gt;ur, mois, annee) {
    return new Date(annee, mois - 1, jour);
}</span></span></span>

highlight.js with <mark>

const code = `// fonction pour créer un objet date
function gjsCreer<mark>ObjetDate(jo</mark>ur, mois, annee) {
    return new Date(annee, mois - 1, jour);
}`;
console.log(hljs.highlight('javascript', code).value);

Output:

// fonction pour créer un objet date
function gjsCreer&lt;mark&gt;ObjetDate(jo&lt;/mark&gt;ur, mois, annee) {
    return new Date(annee, mois - 1, jour);
}

With all of this in mind, let me think a bit about whether or not to support this behavior and add a highlightBlock() equivalent in highlight.php.

In my opinion, embedding HTML in a code snippet to bring emphasis to something isn't the way to do things. I personally would make a preprocessor that takes the special markup and removes it, highlight the remaining code, and then have a postprocessor reinsert the markup. But that is beyond the scope of this project.

Using splitCodeIntoArray() as an example, the way it works is we highlight everything, then split it up, and then render it in a table or div. Take a look at https://github.com/spatie/commonmark-highlighter/issues/9 to see how highlighting lines can be implemented.
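The "highlight everything, then split it up" idea can be sketched without the library. The function below is a hypothetical stand-in, not the actual code of splitCodeIntoArray(); it assumes the highlighted HTML contains only span tags, escapes every literal "<", and uses "\n" line endings.

```php
<?php
// Hypothetical sketch: split highlighted HTML into lines while keeping
// every line's <span> tags balanced.
function split_highlighted_lines(string $html): array
{
    $lines = [];
    $current = '';
    $open = []; // opening <span ...> tags that are still open

    // A token is a tag, a newline, or a run of text without "<" or newline.
    preg_match_all('~<[^>]+>|\n|[^<\n]+~', $html, $m);
    foreach ($m[0] as $token) {
        if ($token === "\n") {
            // Close whatever is open, end the line, reopen on the next line.
            $lines[] = $current . str_repeat('</span>', count($open));
            $current = implode('', $open);
        } elseif ($token[0] === '<') {
            if ($token[1] === '/') {
                array_pop($open);
            } else {
                $open[] = $token;
            }
            $current .= $token;
        } else {
            $current .= $token;
        }
    }
    $lines[] = $current;

    return $lines;
}

// A comment token that spans two lines is closed and reopened at the break.
print_r(split_highlighted_lines("<span class=\"hljs-comment\">/* a\nb */</span>\nx"));
```

Each returned line is well-formed on its own, so it can be wrapped in a table row or div without leaking an open span into the next line.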

taufik-nurrohman commented 4 years ago

Maybe this could be resolved by detecting whether the input is already escaped, so the parser knows whether to escape it before processing (or simply by adding an option to disable auto-escaping of the input). But as this project really depends on the highlight.js specification, I can’t do anything.

Most code snippets highlighted with highlight.js must be escaped anyway: if the snippet to be highlighted contains HTML and we don't escape it, we end up with broken HTML markup whenever JavaScript is disabled.

taufik-nurrohman commented 4 years ago

With all of this in mind, let me think a bit about whether or not to support this behavior and add a highlightBlock() equivalent in highlight.php.

It would be good if we could leave the DOMDocument stuff out of this.

Here are some specs:

Case 1

Before

abcd <a>"efgh"</a> ijk

After

abcd <a><span class="hljs-string">"efgh"</span></a> ijk

or:

abcd <span class="hljs-string"><a>"efgh"</a></span> ijk

Case 2

Before

ab<a>cd "ef</a>gh" ijk

After

ab<a>cd </a><span class="hljs-string"><a>"ef</a>gh"</span> ijk

or:

ab<a>cd <span class="hljs-string">"ef</span></a><span class="hljs-string">gh"</span> ijk

allejo commented 4 years ago

Maybe this could be resolved by detecting whether the input is already escaped, so the parser knows whether to escape it before processing (or simply by adding an option to disable auto-escaping of the input). But as this project really depends on the highlight.js specification, I can’t do anything.

It's not that escaping is the problem, it's how to handle an entirely different language during the highlighting process. Let's take a language with generics (TypeScript) as an example:

function doSomething<T>(arg: T, accuracy: number) {}

Because highlight.js works in the browser with a DOM, you can have actual HTML elements in the mix but if you're using highlight.js via Node, you can't easily.

Let's say you want to highlight "doSomething" in this example. You have two options:

Without escaping

function <mark>doSomething</mark><T>(arg: T, accuracy: number) {}

With escaping

function &lt;mark&gt;doSomething&lt;/mark&gt;<T>(arg: T, accuracy: number) {}

Regardless of whether or not you escape the incoming text, that won't change the fact that our highlighter will not know how to handle mark when told to highlight this code snippet as typescript.

highlight() both in JS and PHP only accept strings and don't have an understanding of DOM nodes. My understanding is when using something like highlightBlock, DOM nodes are taken into account.

This is why I feel a pre+post-processor would be needed so whenever highlight() is called, it's only highlighting valid TypeScript code (like in the example above). The pre-processor would remove the custom HTML and the post-processor would reinsert it. This is why if anything, it'd belong as a utility because this isn't "core highlight.js behavior" and it'd introduce a performance hit since we're now parsing a DOM. So just like splitting into lines, this performance hit would need to be "opt-in" for users.
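A minimal sketch of what such a pre-processor could look like, assuming only a known list of tag names counts as custom markup so that a generic like <T> stays part of the code. The name extract_tags(), its $allowed parameter, and the shape of the returned array are all hypothetical; nothing here is part of highlight.php.

```php
<?php
// Hypothetical pre-processor: strip allowed inline tags and remember where
// each one sat. Offsets are byte positions in the *stripped* text.
function extract_tags(string $input, array $allowed = ['mark', 'a']): array
{
    $names = implode('|', array_map('preg_quote', $allowed));
    // Only allowed tag names match, so TypeScript's <T> is left alone.
    $pattern = '~</?(?:' . $names . ')(?:\s[^<>]*)?>~i';

    preg_match_all($pattern, $input, $m, PREG_OFFSET_CAPTURE);

    $clean = '';
    $tags = [];
    $cursor = 0;
    foreach ($m[0] as [$tag, $pos]) {
        // Copy the code between the previous tag and this one.
        $clean .= substr($input, $cursor, $pos - $cursor);
        $tags[] = ['offset' => strlen($clean), 'tag' => $tag];
        $cursor = $pos + strlen($tag);
    }
    $clean .= substr($input, $cursor);

    return [$clean, $tags];
}

[$clean, $tags] = extract_tags('function <mark>doSomething</mark><T>(arg: T, accuracy: number) {}');
echo $clean, "\n"; // function doSomething<T>(arg: T, accuracy: number) {}
```

The stripped text is what would be passed to highlight(), while the recorded tags are kept for the post-processor. As noted in this thread, a regular expression is not a robust HTML parser, so this only holds for simple, well-formed inline tags.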

taufik-nurrohman commented 4 years ago

Just got an idea. For the preprocessor, I can store the offset of each HTML tag match using preg_split() with the PREG_SPLIT_OFFSET_CAPTURE flag. Before highlighting, I remove the HTML tags; after highlighting, I restore the split-off HTML tags based on their captured offsets.

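// Remember every tag together with its byte offset in the original input.
$tokens = preg_split('/(<[^>]+>)/', $input, -1, PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_OFFSET_CAPTURE);
// Highlight the input with the tags removed.
$out = $hl->highlight('html', strip_tags($input));

// post_process() is still to be written: it restores the captured tags.
$out->value = post_process($out->value, $tokens);
// Etc.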

allejo commented 4 years ago

Sure that could possibly work. Just make sure you take into consideration that your offsets will be different from the original source code since highlight() will return all the extra span tags.
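One way to cope with the shifted offsets is to walk the highlighted HTML and count only source characters, skipping the highlighter's tags and treating each entity as one character. The sketch below is hypothetical: reinsert_tags() and the ['offset' => ..., 'tag' => ...] shape are made up for illustration. It assumes the highlighter emits only span tags and escapes every literal "<" and "&", that the reinserted tags are not themselves named span, and that offsets are byte positions in the stripped code.

```php
<?php
// Hypothetical post-processor: reinsert stripped tags into highlighted HTML.
// $tags: list of ['offset' => int, 'tag' => string], sorted by offset.
function reinsert_tags(string $html, array $tags): string
{
    // A token is a highlighter tag, an entity (one source byte), or one byte of text.
    preg_match_all('~<[^>]+>|&#?\w+;|.~s', $html, $m);

    $out = '';
    $pos = 0;    // source bytes emitted so far
    $next = 0;   // index of the next tag waiting to be reinserted
    $stack = []; // open elements as [name, opening tag]

    $open = function (string $name, string $tag) use (&$out, &$stack) {
        $stack[] = [$name, $tag];
        $out .= $tag;
    };
    $close = function (string $name) use (&$out, &$stack, $open) {
        for ($i = count($stack) - 1; $i >= 0 && $stack[$i][0] !== $name; $i--);
        if ($i < 0) {
            return; // stray closing tag: ignore it
        }
        // Close everything opened after the target, close the target, then reopen.
        $inner = array_splice($stack, $i + 1);
        foreach (array_reverse($inner) as [$n]) {
            $out .= "</$n>";
        }
        $out .= "</$name>";
        array_pop($stack);
        foreach ($inner as [$n, $t]) {
            $open($n, $t);
        }
    };
    $apply = function (string $tag) use ($open, $close) {
        preg_match('~^<(/?)([\w:-]+)~', $tag, $t);
        $name = strtolower($t[2]);
        if ($t[1] === '/') {
            $close($name);
        } else {
            $open($name, $tag);
        }
    };

    foreach ($m[0] as $token) {
        // Closing tags go in as early as possible ...
        while ($next < count($tags) && $tags[$next]['offset'] <= $pos && $tags[$next]['tag'][1] === '/') {
            $apply($tags[$next++]['tag']);
        }
        if ($token[0] === '<') { // highlighter markup; a literal "<" arrives as &lt;
            $apply($token);
            continue;
        }
        // ... and opening tags as late as possible, right before the text they wrap.
        while ($next < count($tags) && $tags[$next]['offset'] <= $pos) {
            $apply($tags[$next++]['tag']);
        }
        $out .= $token;
        $pos++;
    }
    while ($next < count($tags)) {
        $apply($tags[$next++]['tag']);
    }

    return $out;
}

// Case 2 from this thread, with the <a> tags already stripped before highlighting.
$highlighted = 'abcd <span class="hljs-string">"efgh"</span> ijk';
$tags = [['offset' => 2, 'tag' => '<a>'], ['offset' => 8, 'tag' => '</a>']];
echo reinsert_tags($highlighted, $tags), "\n";
// ab<a>cd <span class="hljs-string">"ef</span></a><span class="hljs-string">gh"</span> ijk
```

When a reinserted tag ends in the middle of a token, the sketch closes the span, closes the tag, and reopens the span, which yields the second form of Case 2. This keeps the output properly nested at the cost of splitting one token into two spans.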

allejo commented 4 years ago

After thinking about this further, I'm unable to come up with a solution that would work robustly across multiple languages and scenarios. Right now, both highlight.js and highlight.php take in unescaped code (e.g. HTML, XML, languages with generics), auto-escape it during the highlighting process, and return valid HTML. I feel like introducing a function like highlightBlock to highlight.php would introduce confusion about how to pass code as a parameter (escaped or unescaped), what markup would allow for highlighting lines (e.g. should it be XML?), how to add links inside of code, etc. It's unreliable to parse HTML/XML with regular expressions, so while it may work in your use case, there's no guarantee it'll work in others.

I'm open to revisiting this topic if someone has a robust way of handling this problem and including it in the HighlightUtilities namespace. Until then, I'll close this issue as "won't fix" and leave my suggestion as: implement this functionality in your projects on top of highlight.php with a pre+post-processor.