wooorm / refractor

Lightweight, robust, elegant virtual syntax highlighting using Prism
MIT License
724 stars 34 forks source link

Highlighting markup code (XML, HTML, etc) containing character entities throws an Error #9

Closed peterwilliams closed 6 years ago

peterwilliams commented 6 years ago

Trying to highlight the raw code string <foo>&lt;</foo> will cause an error:

TypeError: env.content.replace is not a function

Here is a sandbox example showing the error does not occur with Prism itself but does with refractor:

https://codesandbox.io/s/wnx2rlpyww

It appears the culprit is the "wrap" hook called here. Prism's language definition files expect the content attribute of the env value passed to the hook to be a string, but refractor builds and passes a HAST object instead.

I think this could be solved by patching the language files shipped with refractor to work with a HAST object if possible (and skip the hook if it can't). For this particular bug, I think this should suffice, because the hook only acts when env.type === 'entity', and I believe the content objects for those are always of the form: {type: 'text', value: value} where value is a string.

However, if other plug-ins or future language definitions use the "wrap" hook differently, they'd have to be individually updated as well, and might need to support operations on larger or higher-level structures than text nodes (which refractor probably would not be able to convert from HAST to a string and back to HAST).

I can open a pull request to patch the groovy, markup and asciidoc language files if that sounds like a reasonable solution to you?

wooorm commented 6 years ago

Hmm, weird! Could you try creating a PR with a failing tests?

peterwilliams commented 6 years ago

Thanks for taking a look so quickly! I've opened a PR that adds a failing test fixture. Let me know if you'd like any additional information or examples from me.

wooorm commented 6 years ago

Hmm. From what I gather, the asciidoc and markup languages add a plugin to “Plugin to make entity title show the real entity”. Unfortunately we can’t patch language files directly, as they’re generated from the prism source code.

That leaves what we should do about groovy, it does something else on the wrap hook.

peterwilliams commented 6 years ago

Unfortunately we can’t patch language files directly

Maybe I'm misunderstanding or not using the right terminology, but isn't that essentially what you did here?

Option 1 solves the immediate problem, moving from "entirely broken" to "mostly working" syntax highlighting for those 3 languages. I would argue that the "title" plugin for markup is more of a "nice to have" and isn't really broken if it's missing. String interpolation for groovy sounds more important, though. But at least it's a quick fix and a step forward. I might suggest adding a try/catch around the hook instead of removing it entirely, and log a debug or error message so the consumer can at least more easily track down that something is missing if they're confused by the behavior. Or maybe adding a note in the README is enough.

I don't fully understand what Option 2 would entail, but it sounds like a lot more work and that I would be less helpful 😅. Maybe it could be a future enhancement on your "roadmap" for this library (if you have such a thing for it)?

My suggestion related to Option 3 hinged on being able to edit the "lang/*.js" files directly.

If that isn't possible, the alternative I guess would be to clone the env object before passing it to the hook, and adjust clonedEnv.content to ensure it's a string if possible. Then I think you'd need to set originalEnv.content.value to the clonedEnv.content string, which the hook might have updated. It seems like there's a lot of room for error there and some corner cases to watch out for.

Are you leaning towards any particular solution?

wooorm commented 6 years ago

@peterwilliams I added a babel transform to add .value to all cases of env.content, and that also seemed to fix groovy. I didn’t run into any problems 🤷‍♂️