highlightjs / highlight.js

JavaScript syntax highlighter with language auto-detection and zero dependencies.
https://highlightjs.org/
BSD 3-Clause "New" or "Revised" License
23.3k stars, 3.52k forks

(JavaScript/TypeScript) Regular expression with quotes inside template literal breaks highlighting #4066

Open JoaoFelipe3 opened 1 week ago

JoaoFelipe3 commented 1 week ago

If a template literal contains a placeholder holding a regular expression with an odd number of quotes, the highlighter treats the last quote as the start of a string. That string runs on until a matching quote closes it, and only after that does the highlighter expect the placeholder and the template literal itself to be closed.

I am testing with the JavaScript grammar, but the same happens with the TypeScript grammar.

I am not using language auto-detection.

Here is some example code to test this on:

`${/"/}`
doStuff()
// "}`

I expected the quote not to open a string at all, but to be treated as part of the regular expression.
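To make the expected behavior concrete, here is a small sketch of a scanner that finds the `}` closing a `${...}` placeholder while treating string and regex literals as opaque, so the `"` inside `/"/` never opens a string. `findPlaceholderEnd` is a hypothetical helper, not a highlight.js API, and it assumes there are no comments or nested template literals inside the placeholder. It also decides "regex or division?" from the previous non-space character only, which is a shortcut rather than real parsing.

```typescript
// Hypothetical helper (not a highlight.js API): given the source and the index
// just after a "${", return the index of the "}" that closes that placeholder,
// or -1 if it is never closed.
// Assumptions: no comments or nested template literals inside the placeholder,
// and "regex or division?" is decided from the previous non-space character only.
function findPlaceholderEnd(src: string, start: number): number {
  let depth = 0; // depth of ordinary { } braces inside the placeholder
  let prev = "{"; // last non-space character; "${" expects an operand next
  let i = start;
  while (i < src.length) {
    const ch = src[i];
    if (ch === "'" || ch === '"') {
      // String literal: skip to the matching unescaped quote.
      i++;
      while (i < src.length && src[i] !== ch) i += src[i] === "\\" ? 2 : 1;
      prev = ch;
      i++;
      continue;
    }
    if (ch === "/" && /[({[,=:!&|?;+\-*%<>~^]/.test(prev)) {
      // Regex literal: every character up to the closing "/" (outside a
      // [...] class) belongs to the pattern, including any quotes.
      let inClass = false;
      i++;
      while (i < src.length) {
        const c = src[i];
        if (c === "\\") {
          i += 2;
          continue;
        }
        if (c === "[") inClass = true;
        else if (c === "]") inClass = false;
        else if (c === "/" && !inClass) break;
        i++;
      }
      i++; // step past the closing "/"
      while (i < src.length && /[a-z]/i.test(src[i])) i++; // skip flags
      prev = "a"; // a regex is an operand, so a following "/" means division
      continue;
    }
    if (ch === "{") depth++;
    else if (ch === "}") {
      if (depth === 0) return i;
      depth--;
    }
    if (!/\s/.test(ch)) prev = ch;
    i++;
  }
  return -1;
}
```

For the first example above, calling it on the source starting at index 3 (just after the `${`) stops at index 6, the `}` directly after `/"/`, instead of running on to the `"` in the comment.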

Here's the code this was found in (TypeScript):

let code = ""; // remaining input; consumed by t() and checked by many()
const or = (fns: readonly (() => string)[]) => {
  for (const fn of fns) {
    try {
      return fn();
    } catch {
      continue;
    }
  }
  throw new Error("or failed");
};
function* many(fn: () => string) {
  while (code.length) {
    try {
      yield fn();
    } catch (_) {
      break;
    }
  }
}
const optional = (chr: string) => t(new RegExp(`^\\${chr}?`)) || chr;
const t = (re: RegExp) => {
  code = code.replace(/^ +/, ""); // skip leading spaces (search() returns -1 on all-space input)
  const match = code.match(re)![0]; // throws if no match
  code = code.slice(match.length);
  return match;
};

const arr = () =>
  t(/^\[/) + [...many(() => expr() + optional(","))].join("") + optional("]");
const num = () => `new $N(${Number(t(/^[+-]?(?:\d*\.\d+|\d+)/))})`;
const str = () => `new $S(${t(/^"(?:\\.|[^"])*"/)})`;
const chr = () => `new $S('${t(/^'./)[1]}')`;
const nme = () => t(/^[A-Z]/);
const nilad = () => or([arr, num, str, chr, nme]);
const triad = () => `.${t(/^[a-z][\u0324:]/)[0]}(${arg()},${arg()})`;
const dyad = () => `.${t(/^[a-z][\u0323\.]/)[0]}(${arg()})`;
const monad = () => `.${t(/^[a-z]/)}()`;
const infix = () => t(/^[+\-*/]/) + arg();
const chain = () => or([triad, dyad, monad, infix]);
function arg(): string {
  const z = () => or([nilad, () => "X" + chain()]);
  const p = () => t(/^\(/) + expr() + optional(")");
  return or([p, z]);
}
function expr() {
  return arg() + [...many(chain)].join("");
}
joshgoebel commented 1 week ago

The existing behavior is intentional. https://github.com/highlightjs/highlight.js/issues/3288

Regex is really hard to deal with without a full context-aware parser. I'll leave this open for a bit to see if anyone wants to tackle it and solve BOTH issues, but if no one shows up, we'll eventually auto-close it as cant-fix.
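Why context matters can be illustrated with the usual lookback heuristic for telling a regex literal apart from the division operator. The sketch below uses a hypothetical `slashStartsRegex` function (not highlight.js code) that classifies a `/` from the previous significant token alone. The comments at the end show where it breaks: in `if (ok) /x/.test(s)` the token before `/` is `)`, so the heuristic says "division" even though a regex follows, and only something that knows the `)` closes an `if` condition can get that right.

```typescript
// Hypothetical sketch (not highlight.js code): decide whether a "/" starts a
// regex literal by looking only at the previous significant token.
const REGEX_AFTER_KEYWORDS: string[] = [
  "return", "typeof", "instanceof", "in", "new", "delete", "void",
  "throw", "case", "do", "else",
];

function slashStartsRegex(prevToken: string | null): boolean {
  // At the start of input an operand is expected, so "/" opens a regex.
  if (prevToken === null) return true;
  // These keywords are followed by an expression, not by a binary operator.
  if (REGEX_AFTER_KEYWORDS.indexOf(prevToken) !== -1) return true;
  // Identifiers, numbers, and closing brackets usually end an operand,
  // so a following "/" is read as division.
  if (/^[\w$]+$/.test(prevToken)) return false;
  if (prevToken === ")" || prevToken === "]" || prevToken === "}") return false;
  // Any other punctuator or operator ("(", ",", "=", "+", "{", ...) expects an operand.
  return true;
}

// a / b                -> previous token "a": division (correct)
// x = /b/              -> previous token "=": regex (correct)
// if (ok) /x/.test(s)  -> previous token ")": division (wrong: it is a regex)
```

A closing `}` is ambiguous in the same way: after a block it can be followed by a regex, after an object literal by division.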