torchlight-api / remark-torchlight

A remark plugin for Torchlight - the syntax highlighting API.

Error when trying to import remark-torchlight #1

Open Benjaminsson opened 3 years ago

Benjaminsson commented 3 years ago

I get this error when trying to import remark-torchlight:

SyntaxError: Cannot use import statement outside a module

I reproduced the error in this CodeSandbox

cody-quinn commented 3 years ago

:+1: Am getting the exact same issue

import remarkTorchlight from "remark-torchlight";
error - /home/~~Redacted~~/node_modules/remark-torchlight/index.js:1
import parse5 from 'parse5'
^^^^^^

SyntaxError: Cannot use import statement outside a module
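
For context, one common cause of this error is that Node.js treats a `.js` file containing `import` statements as CommonJS. The sketch below models that resolution rule under simplifying assumptions; `moduleFormatFor` is a hypothetical helper written for illustration, not a Node.js API, and it ignores loader flags and bundler behaviour.

```javascript
// Hypothetical helper (not a Node.js API): mimics how Node.js picks a module
// format for a file from its extension and the nearest package.json.
function moduleFormatFor(filename, nearestPackageJson = {}) {
  if (filename.endsWith('.mjs')) return 'esm'
  if (filename.endsWith('.cjs')) return 'commonjs'
  // For plain .js files the "type" field decides; it defaults to CommonJS.
  return nearestPackageJson.type === 'module' ? 'esm' : 'commonjs'
}

// An index.js that uses `import` inside a package without "type": "module"
// is loaded as CommonJS, which is where the SyntaxError comes from.
const withoutType = moduleFormatFor('index.js', { name: 'some-package' })
const withType = moduleFormatFor('index.js', { name: 'some-package', type: 'module' })
console.log(withoutType, withType)
```
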
ghost commented 2 years ago

Same issue here

jobyh commented 2 years ago

Also preventing me using in NextJS (swc compiler) 🙁 @aarondfrancis is there something we can help with toward getting this resolved?

aarondfrancis commented 2 years ago

Ok yall, thanks for your patience. I just updated the underlying library and this library to ESM, but now the test is failing. It worked with the old node_modules versions I had installed locally, but when I pushed to GH it failed. I've spent a couple hours digging and trying to figure out what changed, but the whole remark/rehype/unified/hast ecosystem stuff is not super straightforward to me.

Can anyone tell if something has changed and I need to update the shape of the nodes coming out of this plugin?

aarondfrancis commented 2 years ago

Ok I finally figured it out. Between remark-html 14.0.0 and 14.0.1 they changed a default param.

https://github.com/torchlight-api/remark-torchlight/actions/runs/2506264156

The test now works, but I don't feel great that it relies on sanitize false.

Should this be a rehype plugin instead?

jobyh commented 2 years ago

Amazing @aarondfrancis thank you! Will have a go with this later on.

RE: sanitize: false I agree it feels incorrect, particularly after reading the remark docs.

RE: Rehype - in short, yes, I think you could be right. While it wasn't possible to use this library yesterday I ended up doing some digging of my own and integrating rehype-highlight for SSR and started wondering the same thing.

The full pipeline felt like it made the most sense as a chain via unified. A torchlight-rehype plugin could replace the use of rehype-highlight below:

import {readFileSync} from 'fs'
import {unified} from 'unified'
import highlight from 'rehype-highlight'
import parse from 'remark-parse'
import rehype from 'remark-rehype'
import stringify from 'rehype-stringify'
import matter from 'gray-matter'

 const markdown = matter(readFileSync('./path/to/your/markdown.md', 'utf8'))
 const rendered = (await unified()
    .use(parse)
    .use(rehype)
    .use(highlight)
    .use(stringify)
    .process(markdown.content))
    .toString()

> the whole remark/rehype/unified/hast ecosystem stuff is not super straightforward to me

...yep 🤯 - my takeaway was: remark for raw markdown content to HTML and rehype for processing HTML. They're all built on top of unified.

mnapoli commented 1 year ago

@aarondfrancis it seems the last release that fixes this (0.0.3) is not published to NPM: https://www.npmjs.com/package/remark-torchlight?activeTab=versions

Any way to publish it to NPM? Thanks!

mcgrealife commented 1 year ago

v0.0.3 might be unpublished because it requires a workaround that disables remark-html's sanitize option (creating an XSS vulnerability).

As a temporary solution, downloading v0.0.3 and providing it to package.json as a local module (i.e. in package.json reference the package via the syntax file:path/to/local-module) works
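
As a rough illustration of the `file:` workaround above, the sketch below builds the dependency entry that would end up in package.json. `withLocalDependency` is a hypothetical helper, and `./modules/remark-torchlight` is only an assumed location for the downloaded v0.0.3 folder.

```javascript
// Hypothetical helper: returns a copy of a package.json object with a
// dependency pointing at a local folder via npm's `file:` specifier.
function withLocalDependency(pkg, name, relativePath) {
  return {
    ...pkg,
    dependencies: { ...(pkg.dependencies || {}), [name]: `file:${relativePath}` },
  }
}

// './modules/remark-torchlight' is an assumed path, not one given in the thread.
const pkg = withLocalDependency(
  { name: 'my-site', dependencies: {} },
  'remark-torchlight',
  './modules/remark-torchlight'
)
console.log(JSON.stringify(pkg.dependencies, null, 2))
```

After editing package.json this way, running the package manager's install command should link the local copy.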

Stale research: for a long-term solution:

- I'm new to remark/rehype too, but it seems like @jobyh is on the right track. Currently, the torchlight remark dependency [remark-html](https://github.com/remarkjs/remark-html) is just a "shortcut for .use(remarkRehype).use(rehypeStringify)" in a unified chain.
- So, building off @aarondfrancis's idea, maybe the solution is to use rehype directly, where we have control over rehype-sanitize.
- Specifically, it looks like rehype-sanitize needs a schema with an allowlist of torchlight class names: https://github.com/rehypejs/rehype-sanitize#unifieduserehypesanitize-schema
- So the new use chain might be:

```
await unified() // unified directly
  .use(remarkRehype)
  .use(rehypeTorchlight)
  .use(rehypeSanitize, {schema}) // schema with allowed class names
  .use(rehypeStringify)
  .process(markdown)
```

(instead of the current v0.0.3 way):

```
await remark()
  .use(remarkhtml, {sanitize: false})
  .use(remarkTorchlight)
  .process(markdown)
```

mcgrealife commented 1 year ago

Stale research

Update: I have some working versions, and details about sanitization.

Conclusion: remark-torchlight does _not_ need to be re-written as a rehype plugin.

re**mark** = markdown
re**hype** = hypertext (HTML)

Currently, `remark-torchlight` works while the code is still in a markdown abstract syntax tree (**md**ast), before being converted (by remarkRehype) into an HTML abstract syntax tree (**h**ast).

### Plugin chain

Both chain configurations work. The important new plugin is .use(rehypeSanitize):

_Edit: there is a 3rd simpler option that allows passing the schema object to the sanitize prop in remark-html (see final comment below)_

**await unified()**:

> The benefit of awaiting `unified()` (instead of `remark()`) seems to be that your input can begin as something other than markdown. The drawback is that it requires more .use() plugins.

```
import { unified } from 'unified'
import remarkParse from 'remark-parse'
import torchlight from '../modules/remark-torchlight'
import remarkRehype from 'remark-rehype'
import rehypeSanitize, { defaultSchema } from 'rehype-sanitize'
import rehypeStringify from 'rehype-stringify'

await unified()
  .use(remarkParse) // required when using unified
  .use(torchlight, config) // _before_ markdown is transformed into HTML
  .use(remarkRehype) // converts from mdast to hast (markdown syntax tree to HTML syntax tree)
  .use(rehypeSanitize, schemaObject) // sanitize will remove all attributes that are not explicitly specified in the schemaObject (i.e. class names, styles)
  .use(rehypeStringify)
  .process(markdownInput)
```

**await remark()**:

> Closer to the current document. Under the hood, remark() is unified() with remarkParse (and remarkStringify) already applied.

```
import { remark } from 'remark'
import torchlight from '../modules/remark-torchlight'
import remarkRehype from 'remark-rehype'
import rehypeSanitize, { defaultSchema } from 'rehype-sanitize'
import rehypeStringify from 'rehype-stringify'

// If we know we're starting with markdown, we do not need unified and can
// start directly with remark(). This option does not require remarkParse.
await remark()
  .use(torchlight, config) // highlight code while it's still markdown
  .use(remarkRehype) // convert to HTML
  .use(rehypeSanitize, schemaObject) // same sanitize step as in the unified() chain
  .use(rehypeStringify)
  .process(markdownInput)
```

### Schema challenges of `.use(rehypeSanitize, schema)`

If an attribute is not explicitly provided, it will be removed.

```
import rehypeSanitize, { defaultSchema } from 'rehype-sanitize'

// ...
.use(rehypeSanitize, {
  ...defaultSchema,
  attributes: {
    ...defaultSchema.attributes,
    pre: [
      ...(defaultSchema.attributes.pre || []),
      ['className', 'torchlight', 'has-highlight-lines', 'has-focus-lines'],
      ['style', 'background-color: #2e3440ff'],
    ],
    code: [
      ...(defaultSchema.attributes.code || []),
      ['className', 'language-js', 'js']
    ],
    div: [
      ...(defaultSchema.attributes.div || []),
      ['className', 'line', 'line-focus', 'line-highlight', 'line-focus', 'line-has-background', 'yourCustomClass'],
      ['style', 'background-color: #3b4252'],
      ['id', 'customId'],
    ],
    span: [
      ...(defaultSchema.attributes.span || []),
      ['className', 'line-number'],
      ['style', 'color: #D8DEE9;', 'color: #88C0D0;', 'color: #A3BE8C;', 'color: #88C0D0;', 'color: #D8DEE9FF;', 'color: #ECEFF4;'], // requires semicolons?
    ]
  }
})
```

#### Some schema challenges:

1. In the example, I added some classNames manually, but torchlight supports more. Maybe torchlight could export a map of HTML elements and possible classNames.
2. torchlight themes are implemented as `color` and `background-color` properties on the `style` attribute, with _much_ variance between themes. I can't find any theme declaration files that could be used as a dictionary yet.
3. torchlight allows users to add [custom classes and ids](https://torchlight.dev/docs/annotations/classes), which would need to be passed to the schema.
   a. Note: rehypeSanitize will prepend `user-content-` to your custom id, e.g. from `id="yourFakeId"` to `id="user-content-yourFakeId"`.

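
The repeated spread-and-append pattern in the schema above could be factored into a small helper. This is only a sketch: `extendSchema` is a hypothetical function, and `stubDefaultSchema` is a tiny stand-in for the real `defaultSchema` from rehype-sanitize so that the example has no dependencies.

```javascript
// Hypothetical helper: merges extra per-element attribute rules into a base
// schema without mutating it. In real use, `baseSchema` would be the
// `defaultSchema` exported by rehype-sanitize.
function extendSchema(baseSchema, extraAttributes) {
  const attributes = { ...(baseSchema.attributes || {}) }
  for (const [tagName, rules] of Object.entries(extraAttributes)) {
    attributes[tagName] = [...(attributes[tagName] || []), ...rules]
  }
  return { ...baseSchema, attributes }
}

// Stub base schema for demonstration only; the real defaultSchema is larger.
const stubDefaultSchema = { tagNames: ['pre', 'code'], attributes: { code: ['title'] } }

const schema = extendSchema(stubDefaultSchema, {
  pre: [['className', 'torchlight', 'has-highlight-lines', 'has-focus-lines']],
  code: [['className', 'language-js', 'js']],
})
console.log(schema.attributes)
```

The result would then be passed as the second argument to `.use(rehypeSanitize, schema)`.
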
Other small observations

- I noticed that the remark-torchlight code imports "parse5" and "parseFrom5" to convert the plugin input from a markdown abstract syntax tree (mdast) to an HTML abstract syntax tree (hast).
- The output of remarkRehype is already a hast, so there may be an opportunity for simplification. (It would require moving .use(torchlight) down the chain, so it's not used until after .use(remarkRehype).)

mcgrealife commented 1 year ago

The `remark-html` plugin allows passing a **schema object into the sanitize prop**. E.g.

```
await remark()
  .use(html, {
    sanitize: torchlightSchemaObject // schema object here!
  })
  .use(torchlight, config)
  .process(markdownInput)
```

It works like an 'allowlist' for HTML attributes and their values. I.e. classNames and styles. If a class or style is not explicitly provided, it will be removed.
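
To make the allowlist behaviour concrete, here is a toy model of it. This is not the hast-util-sanitize implementation; `filterAttribute` is a hypothetical function that only mirrors the behaviour described in this thread: unlisted attributes are dropped, class names are checked one by one, and a style string has to match an allowed value exactly.

```javascript
// Toy model only, NOT the real sanitizer. A rule is either a bare attribute
// name or ['attributeName', ...allowedValues]; a rule without values allows
// any value for that attribute.
function filterAttribute(rules, name, value) {
  const rule = rules.find((r) => (Array.isArray(r) ? r[0] : r) === name)
  if (rule === undefined) return undefined // attribute not listed: dropped
  const allowed = Array.isArray(rule) ? rule.slice(1) : []
  if (allowed.length === 0) return value // no value restrictions
  if (Array.isArray(value)) {
    // List-like attributes such as className are checked item by item.
    return value.filter((item) => allowed.includes(item))
  }
  // Single-valued attributes such as style must match an allowed string exactly.
  return allowed.includes(value) ? value : undefined
}

const spanRules = [['className', 'line-number'], ['style', 'color: #D8DEE9;']]
const keptClasses = filterAttribute(spanRules, 'className', ['line-number', 'evil'])
const keptStyle = filterAttribute(spanRules, 'style', 'color: red;')
const keptHandler = filterAttribute(spanRules, 'onclick', 'alert(1)')
console.log(keptClasses, keptStyle, keptHandler)
```

The exact-match rule for `style` is why the schema in this thread has to list whole style strings rather than individual properties.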

Example extended schema

#### There are *numerous* quirks with this. But it allows most of the default torchlight theme.

```
import { defaultSchema } from 'rehype-sanitize' // or from 'hast-util-sanitize'

const extendedSchema = {
  ...defaultSchema,
  attributes: {
    ...defaultSchema.attributes,
    '*': ['style', 'className'], // this `*` wildcard syntax allows these attributes on any HTML element.
    pre: [
      ...(defaultSchema.attributes.pre || []),
      ['className', 'torchlight', 'has-highlight-lines', 'has-focus-lines', 'torchlight has-highlight-lines has-focus-lines'], // i.e. "allow these classes on the 'pre' element"
      ['style']
      // ['style', 'background-color: #2e3440ff; --theme-selection-background: #88c0d099;'], // this has to be uncommented or a full string like this
    ],
    code: [
      ...(defaultSchema.attributes.code || []),
      ['className', 'language-js', 'js', 'has-focus-lines', 'torchlight', 'has-highlight-lines', 'has-focus-lines']
    ],
    div: [
      ...(defaultSchema.attributes.div || []),
      ['className', 'line', 'line-focus', 'line-highlight', 'line-focus', 'line-has-background', 'yourCustomClass'],
      ['style', 'background-color: #3b4252', 'background - color: #3b4252; '],
      ['id', 'customId'],
    ],
    span: [
      ...(defaultSchema.attributes.span || []),
      ['className', 'line-number'],
      ['style', 'color: #D8DEE9;', 'color: #88C0D0;', 'color: #A3BE8C;', 'color: #88C0D0;', 'color: #D8DEE9FF;', 'color: #ECEFF4;', 'color: #d8dee9;', 'color:#d8dee9; text-align: right; -webkit-user-select: none; user-select: none;', 'color:#4c566a; text-align: right; -webkit-user-select: none; user-select: none;'], // long strings required for line number style
    ]
  }
}
```

Creating a robust torchlight schema likely requires exporting some dictionaries from the core torchlight library (or from its underlying shiki processor). For example, a map of all supported language keys.
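
If such a list of language keys were available, the `code` className rule could be generated instead of written by hand. A minimal sketch, assuming a plain array of keys as input; `codeClassRule` and the sample keys are hypothetical and are not exported by torchlight or shiki.

```javascript
// Hypothetical helper: builds a `code` className rule from language keys,
// following the 'language-js', 'js' pattern used in the schema above.
function codeClassRule(languages) {
  const classNames = languages.flatMap((lang) => [`language-${lang}`, lang])
  return ['className', ...classNames]
}

// Sample keys for illustration only; a real list would come from the library.
const rule = codeClassRule(['js', 'php'])
console.log(rule)
```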