wooorm / refractor

Lightweight, robust, elegant virtual syntax highlighting using Prism
MIT License
722 stars 33 forks source link

Regex delimiters from `js` not included in `jsx`? #34

Closed karlhorky closed 3 years ago

karlhorky commented 4 years ago

Hi @wooorm , sorry for all the issues lately!

Working on link and highlighting tooling a bit right now...

Here's something that I found interesting: the js language returns some regex delimiter tokens, but the jsx language does not... should it be like that? I thought those extra tokens came from js-extras, maybe this is not included with jsx or typescript or tsx?

Ran the following on Runkit:

var refractor = require("refractor")
console.log(JSON.stringify(refractor.highlight('/\bno\-highlight\b/', 'js'), null, 2))
console.log(JSON.stringify(refractor.highlight('/\bno\-highlight\b/', 'jsx'), null, 2))

...and received the following:

JS

[
  {
    "type": "element",
    "tagName": "span",
    "properties": {
      "className": [
        "token",
        "regex"
      ]
    },
    "children": [
      {
        "type": "element",
        "tagName": "span",
        "properties": {
          "className": [
            "token",
            "regex-delimiter"
          ]
        },
        "children": [
          {
            "type": "text",
            "value": "/"
          }
        ]
      },
      {
        "type": "element",
        "tagName": "span",
        "properties": {
          "className": [
            "token",
            "language-regex"
          ]
        },
        "children": [
          {
            "type": "text",
            "value": "\bno-highlight\b"
          }
        ]
      },
      {
        "type": "element",
        "tagName": "span",
        "properties": {
          "className": [
            "token",
            "regex-delimiter"
          ]
        },
        "children": [
          {
            "type": "text",
            "value": "/"
          }
        ]
      }
    ]
  }
]

JSX

[
  {
    "type": "element",
    "tagName": "span",
    "properties": {
      "className": [
        "token",
        "regex"
      ]
    },
    "children": [
      {
        "type": "text",
        "value": "/\bno-highlight\b/"
      }
    ]
  }
]
wooorm commented 4 years ago

Refractor is supposed to do what prism does (albeit virtual). Cab you confirm whether prism (does not) result in the same output?

karlhorky commented 4 years ago

On the test page and on a sandbox, the javascript language results in no extra tokens:

Test Page: https://prismjs.com/test.html#language=javascript Demo: https://codesandbox.io/s/prismjs-forked-0ygs8?file=/src/index.js

karlhorky commented 4 years ago

So I suppose if there's a bug here, it's that js adds the extra tokens in refractor but not in Prism.js.

wooorm commented 4 years ago

I can confirm, javascript, jsx.

regex-delimiter is added in JavaScript, due to these Prism lines. They don’t include JSX.

The normal build of refractor includes everything, so regex runs, which modifies JavaScript. But that apparently doesn’t happen in these Prism cases? 🤔

wooorm commented 4 years ago

This seems to be due to the global state of Prism:

require("prismjs")
require("prismjs/components/prism-jsx")
require("prismjs/components/prism-regex")

console.log('js', Prism.highlight('/\bno\-highlight\b/', Prism.languages.javascript, 'javascript'))
console.log('jsx:', Prism.highlight('/\bno\-highlight\b/', Prism.languages.jsx, 'jsx'))

Yields:

js: <span class=\"token regex\"><span class=\"token regex-deli…pan><span class=\"token regex-delimiter\">/</span></span>
jsx: <span class=\"token regex\">/\bno-highlight\b/</span>

But moving prism-regex to before prism-jsx makes them the same.

karlhorky commented 4 years ago

The normal build of refractor includes everything

But moving prism-regex to before prism-jsx makes them the same

Ah, so the way to address this will be to enable all regex and other add-on features like js-extras for jsx too?

Wonder if this also affects typescript and tsx...

wooorm commented 4 years ago

Unfortunately, this is really hard to solve, because it’s a problem that comes from Prism. Prism works with one whole global object, and has intricate interdependencies. We’re already doing a lot to figure out which languages depend on each other. We used to do regexes, but since this version we’re using Prism-provided information to figure out what depends on what.

Sometimes there are direct dependencies. But in this case, it’s if X is defined, do something to it, otherwise ignore it. If you load regex, it’ll mutate javascript if it exists (one of the few that always does). If you then load jsx, it uses the mutated JavaScript object. If you do it the other way around, jsx clones javascript before regex extends javascript. 🤔

I’m guessing it also affects ts(x) indeed. And I’m sure there are others.

karlhorky commented 4 years ago

since this version we’re using Prism-provided information to figure out what depends on what

Ah interesting... couldn't find the commit / PR for this.

Edit: Or do you mean this PR https://github.com/wooorm/refractor/pull/32 ? With the commit c130059 (#32) ?

karlhorky commented 4 years ago

@RunDevelopment you wouldn't have any suggestions here would you?

If I'm understanding correctly, refractor wants to be able to load all Prism languages in the correct order for their dependencies, and is currently using some Prism-provided info for this.

RunDevelopment commented 4 years ago

Oops. Seems like I messed up with the dependencies here. I'll fix it.

wooorm commented 4 years ago

Ohh, that’s great to hear! <3

wooorm commented 3 years ago

Can confirm that this is now solved in the latest refractor.

Thanks @RunDevelopment!

karlhorky commented 3 years ago

Nice, thanks to both of you!

wooorm commented 3 years ago

@RunDevelopment Hmm, I’m running into this again. JS is included here by default. Regex isn‘t.

Because optional dependencies are passed as reference (example of regex in JS), the import order matters.

For Prism bundles, y’all are solving it very smartly (dependencies.js). That makes it impossible (I think) to load a few languages first and more languages later (which is what I do here).

One solution I can think of is for the grammars to use:

'regex-source': {
  pattern: /^(\/)[\s\S]+(?=\/[a-z]*$)/,
  lookbehind: true,
  alias: 'language-regex',
  inside: 'regex'
},

...which is then resolved to Prism.languages.regex at runtime. Would something like that be acceptable for Prism?

Alternatively, I could make more bundles for refractor: 0, common languages, and all languages. Then I can leave it up to users.

RunDevelopment commented 3 years ago

That makes it impossible (I think) to load a few languages first and more languages later (which is what I do here).

No, it's possible. The trick is that loading a language again will overwrite the old version. I.e. if your bundle includes JS but not Regex, you can just load Regex and then load JS again.

This isn't a particularly nice solution but it works. Our loadLanguages function uses exactly this strategy.


You did point out one of the painful consequences of Prism's current load-order-based dependency system. For Prism v2, we plan to adopt ESM into our codebase, at which point we will most likely have to phase out the current solution for optional dependencies and implement something (hopefully) nicer.

wooorm commented 3 years ago

Ah, so I could add:

regex.optionalDependants = ['javascript', 'vala']

And then when regex loads it’ll re-initialize javascript and vala when available? 🤔 A bit more complex would be to reload coffeescript and such, which depend on javascript.

Interesting, thanks, I’ll try something along those lines!

For Prism v2, we plan to adopt ESM into our codebase, at which point we will most likely have to phase out the current solution for optional dependencies and implement something (hopefully) nicer.

Nice! Is there an ETA on when that would be ready?

RunDevelopment commented 3 years ago

Ah, so I could add:

regex.optionalDependants = ['javascript', 'vala']

Sorry, but I don't know what that is about.

And then when regex loads it’ll re-initialize javascript and vala when available? 🤔 A bit more complex would be to reload coffeescript and such, which depend on javascript.

I suggest using our getLoader API. It will handle all of that for you.

getLoader(components, languagesToLoad, loadedLanguages)
    .load(id => { /* load language with the given id */ });

That's all you have to do. getLoader will ensure that everything works.

It's a little harder for asynchronous operations. Some languages can be loaded in parallel but others cannot. load will also take care of that for you, see the Promise example.

However, it might be a problem that getLoader needs to know all possible dependencies upfront.

Is there an ETA on when that would be ready?

Unfortunately, no.

wooorm commented 3 years ago

Thanks!

Sorry, but I don't know what that is about.

I generate custom bundles of each language, so that they can be registered on their own. For example, javascript.js.

However, I was hoping with my previous comment that it would be possible for this case to work:

  1. Assume all languages except for YAML are loaded
  2. Load YAML, which knows which other languages have be reloaded (such as markdown, and many others)

Unfortunately I found that a change in certain languages requires a lot of re-registrations, and resulted in refractor taking 1.3 seconds to load. The code I used to find the things that would have to be reregistered was this pseudo code:

var list = getLoader(components, [theLanguage], allLanguages).getIds()
var reregister = list.slice(list.indexOf(theLanguage) + 1)