stackernews / stacker.news

Internet communities that pay you Bitcoin
https://stacker.news
MIT License
403 stars 105 forks source link

Added support for <sub> <sup> in markdown #1215

Closed kravhen closed 3 days ago

kravhen commented 3 weeks ago

Description

Add support for <sub> and <sup> as HTML tags in Markdown posts. Uses remark-rehype, rehype-raw to allow HTML tags and rehype-sanitize to ensure the default schema is expanded only for select specific tags (in this case, subscript and superscript). Works in user bio. Does not work in post titles; tags in title remain as normal strings.

closes #253

Screenshots

image image image

Additional Context

I considered other solutions including remark-subersub, which is supposed to be a simple drop-in-and-use option, but it hasn't been updated in years and even when I tried it, only subscript worked because their syntax for superscript was ~tildes~ which gets overridden as strikethrough instead.

Went with rehype-raw to simply allow the use of HTML tags in markdown. It's suggested to be used along with rehype-sanitize to avoid XSS risk by enforcing a sanitization schema and extending allowed HTML tags only to what is specifically chosen (in our case, sub and sup only). It seems to be more extensible if other HTML tags end up needing to be used in the future.

Another reason I went with a rehype solution is because even if it requires new packages (raw and sanitize), rehype was already being used in the project.

Checklist

Are your changes backwards compatible? Please answer below: Yes. Existing posts that were using <sub> and <sup> in their posts should see text properly displayed as sub- and superscript.

Did you QA this? Could we deploy this straight to production? Please answer below: Works as described.

For frontend changes: Tested on mobile? Please answer below: Works as expected.

Did you introduce any new environment variables? If so, call them out explicitly here: No new env variables.

socket-security[bot] commented 3 weeks ago

No dependency changes detected. Learn more about Socket for GitHub ↗︎

👍 No dependency changes detected in pull request

kravhen commented 3 weeks ago

I'm not too familiar with the failing Socket Security alert. The new imports should not have installed puppeteer as a dependency as far as I know, if that is the issue.

Or is it a more generic alert due to package.json additions?

ekzyis commented 3 weeks ago

I'm not too familiar with the failing Socket Security alert. The new imports should not have installed puppeteer as a dependency as far as I know, if that is the issue.

Or is it a more generic alert due to package.json additions?

Yeah it seems to behave weird when there are changes to package.json. I think we can ignore it.

I'll try to give this PR a full review today or tomorrow

kravhen commented 3 weeks ago

I can take some time to look for a less spooky alternative or into ways to restrict even the default schema somehow if required. If you're uneasy, I'm uneasy, and if we're in no rush to merge this in, I've already spent some time searching for a solution and I welcome the educational value of pursuing further.

p.s. ~I think I goofed the suggested change up there. Ill do it now as it was suggested and commit that for now.~ All good, commit did go through.

huumn commented 2 weeks ago

Regardless of what happens with this PR we will award you with completing the issue, but I don't think we want to begin supporting html this way. It's mostly a big thing for us because we store attached wallet creds in the browser.

Our preferred approach would be to create a dedicated plugin for <sup> and <sub> tags that works like other markdown markup rather than allowing html. If that's not possible I don't think we'll do this until we understand those packages well.

kravhen commented 2 weeks ago

Understood. It makes sense not to bring in new packages for something so minor, especially if it's not in the plans to start supporting more HTML tags in the future. I was not sure whether that was in the cards, but since it isn't, it definitely tips the scales heavily towards an alternative method of enabling sub and sup.

If this PR can remain open I can look into the other option. I'll clean up and revert the changes that bring in the new rehype packages and see what can be done on a case-by-case basis with a plugin like the one for inline code.

huumn commented 2 weeks ago

Yep we'll keep it open!

kravhen commented 1 week ago

Update: I've got something to work by parsing the text for markdown-like syntax of .e.g ^...^ for sup and _..._ for sub. The tags get replaced by <span> with a utility class to style with {vertical-align: sup; font-size: smaller;}

The problem is I can't seem to get it to work with html-like tags like <sub>...</sub>. Tried a few different ways to tweak the main plugin function but the tags just didn't get parsed at all. In some cases, I would switch the target string of characters from the html-like ones to single-char based as described above and it would just work right away.

Parsing for ^ and <sub> image image

After digging a bit and trying things in a sandbox on the side I noticed the nodes with tags would have type 'html' whereas in the SN project, the tags are of type raw. Regardless if that is related or not, even if I get visit() to parse raw nodes, they still don't work like the single characters do.

I can post how far I got in a following comment or commit, or what the main function looks like at the moment, if anyone was curious or knows more about the inner workings and catches something I don't.

kravhen commented 1 week ago

I've pushed a commit to show what the plugin code looks like at the moment. As described above, it does not work with html-like tags, but works with single characters.

Open to suggestions and comments if someone sees something I don't to get it working with the tags as required. I'll have to read up and play with visit() and nodes a bit more. Should have some time for that during the weekend, fiat mines got me cooked until then.

ekzyis commented 1 week ago

I think the issue that you're having is that you need to manipulate the tree on the element level. The markdown parser seems to always include html tags as their individual nodes. So node.value will never contain <sup>foo</sup> since it will always be split into three nodes: <sup>, foo and </sup>.

Therefore, I think you need to do something like this that works on the element level and walk the tree like this:

function rehypeStyler (startTag, endTag, className) {
  return function (tree) {
    visit(tree, 'element', (node) => {
      for (let i = 0; i < node.children.length; i += 1) {
        const start = node.children[i]
        const text = node.children[i + 1]
        const end = node.children[i + 2]

        // is this a children slice wrapped with the tags we're looking for?
        const isWrapped =
          start?.type === 'raw' && start?.value === startTag &&
          text?.type === 'text' &&
          end?.type === 'raw' && end?.value === endTag
        if (!isWrapped) continue

        const newChildren = {
          type: 'element',
          tagName: 'span',
          properties: { className: [className] },
          children: [{ type: 'text', value: text.value }]
        }
        node.children.splice(i, 3, newChildren)
      }
    })
  }
}

This seems to do what we want during my shallow testing but maybe I overlooked something.

kravhen commented 1 week ago

Your version works as expected when I tried it too. Didn't find any goofy behavior on my end even when doing writing weird things in a post like nesting them.

I knew we were close!

I'll push a commit using it.

kravhen commented 5 days ago

Nice work! Once the changes package-lock.json are reverted, this looks good to merge.

Huh. Thought I had done that. I'll put in another commit shortly.

kravhen commented 5 days ago

I'm not sure if this was the right way to do it, but I reverted the package-lock (via checkout -- package-lock to its state in the master branch. Then ran npm install again to regenerate it.