remarkjs / remark

markdown processor powered by plugins part of the @unifiedjs collective
https://remark.js.org
MIT License

Tokenizer not being given value from inside link #410

Closed jeffsee55 closed 5 years ago

jeffsee55 commented 5 years ago


Sandbox here

I'm trying to provide a custom 'handlebars' parser which takes a variable and subs it out dynamically:

unified()
...
.use(handlebars, { greeting: "Hello", link: "https://google.com" })

And those values should be present anywhere in the markdown string:

So:

{{greeting}}, World! Visit us at {{link}}

Should say:

Hello, World! Visit us at https://google.com

But when used in a link:

{{greeting}}, World! Visit us [here]({{link}})

We get no interpolation (because the tokenizer isn't getting called):

Hello, World! Visit us [here]({{link}})

The 'locator' function seems to find the index of {{ but the 'tokenizer' function doesn't seem to get called inside a link. I've tried to toy around with the notInLink option but couldn't tell if it was doing anything.

Your environment

Steps to reproduce

Sandbox here

Expected behaviour

I think the tokenizer function should be called when {{ is found regardless of whether or not it's in a link.

Actual behaviour

It seems to get skipped.

wooorm commented 5 years ago

Thanks for the detailed issue!

This may be an XY problem.

Why are you writing a plugin for this? Have you thought about other cases? For example, could someone inject link: ')<img src="#" onload="...">' into [here]({{link}})?

remark is made for Markdown. Markdown focuses on content, and allows nodes in content, but not nodes in other values (like the link destination). Tokenizers don’t run in link destinations, because Markdown does not allow syntax in there.
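To make the injection concern concrete, here is a minimal sketch in plain JavaScript. `safeLinkDestination` is a hypothetical helper, not part of remark or Handlebars, and the rule it applies (only absolute http/https URLs, with parentheses and angle brackets percent-encoded) is an assumption about what a reasonable check might look like, not a complete sanitizer.

```javascript
// Hypothetical helper (not a remark API): accept only absolute http(s) URLs
// as values for a Markdown link destination.
function safeLinkDestination(value) {
  let url;
  try {
    url = new URL(value); // throws for anything that is not an absolute URL
  } catch {
    return null;
  }
  if (url.protocol !== "http:" && url.protocol !== "https:") return null;
  // Percent-encode characters that could end the destination early.
  return url.href.replace(
    /[()<>]/g,
    (c) => "%" + c.charCodeAt(0).toString(16).toUpperCase()
  );
}

// Naive string interpolation lets the value escape the link destination:
const malicious = ')<img src="#" onload="...">';
const naive = "[here](" + malicious + ")";
console.log(naive);

// The check rejects that value and keeps an ordinary URL:
console.log(safeLinkDestination(malicious));
console.log(safeLinkDestination("https://google.com"));
```

Rejecting bad values outright, rather than trying to repair them, keeps the sketch small; a real project would still sanitize the final HTML.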

jeffsee55 commented 5 years ago

Thanks for the explanation. The use case for this plugin is just to provide a mechanism for dynamic data to hydrate static content: our data points change much more frequently than our content. One of our data points happens to be a URL, so we'd like to be able to treat it as such. (The injection vector doesn't really matter for us; if this were just HTML with dynamic links, we'd face the same sanitization requirements.)

A tokenizer seems to fit our requirements, and I found it confusing that the function wasn't being called based on the locator's logic, but I see what you're saying about the link destination being its own entity. I'm happy for this to be closed; any recommendations would be appreciated.

wooorm commented 5 years ago

If the data changes, why not use handlebars on change, render the data with the template to markdown, and then transform the markdown to HTML? It’s a slight difference, but it’s a very different approach that would work.
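The two-step approach suggested here can be sketched in plain JavaScript. `renderTemplate` is a hypothetical, dependency-free stand-in for a real template engine such as Handlebars; it only handles simple `{{name}}` placeholders. The second step, handing the resulting string to the Markdown pipeline, is left out.

```javascript
// Hypothetical stand-in for a template engine such as Handlebars:
// it only replaces {{name}} placeholders with values from `data`.
function renderTemplate(template, data) {
  return template.replace(/\{\{\s*([\w$]+)\s*\}\}/g, (match, key) =>
    // Leave unknown placeholders untouched so missing data is easy to spot.
    Object.prototype.hasOwnProperty.call(data, key) ? String(data[key]) : match
  );
}

const data = { greeting: "Hello", link: "https://google.com" };
const template = "{{greeting}}, World! Visit us [here]({{link}})";

// Step 1: template -> Markdown. This is plain string substitution, so it
// does not matter that {{link}} sits inside a link destination.
const markdown = renderTemplate(template, data);
console.log(markdown);
// prints "Hello, World! Visit us [here](https://google.com)"

// Step 2 (not shown): pass `markdown` to the Markdown-to-HTML pipeline.
```

Because the substitution happens before any Markdown parsing, the Markdown parser only ever sees valid Markdown and no custom tokenizer is needed.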

The problem is that templating stuff just isn’t Markdown. It doesn’t have to be. It’s its own “syntax” (tree). And they can’t mix 🤔🤷‍♂️

wooorm commented 5 years ago

I’ll close this because it’s not an actionable issue, but feel free to reply and I’ll try to give support.

jeffsee55 commented 5 years ago

Thanks for the advice. So you're saying to do the Handlebars transform before sending the result to the Markdown parser? I'm sort of overwhelmed by all of the unified libraries and thought this was the stage where I can do performant transforms on the string before it's turned into an AST, but I think my understanding is a little off.

wooorm commented 5 years ago

Sorry for the slow reply, yes! That’s what I suggest! And I totally get how big and confusing the ecosystem is at first. We’re trying to do our best, but it can do so much that there isn’t really one way (or even a couple of ways) to use it that we can document.

unified is indeed good at performant transforms. It gets really useful when you’re plugging in multiple plugins. And even more useful when you’re dealing with multiple formats. We recently made a new guide (that’s currently hidden) with an introduction to unified.

To re-iterate the problem: the content you are dealing with is a template, not valid Markdown. That template is compiled to (hopefully valid) Markdown. It’s impossible to treat the template as valid Markdown, so you can’t use remark to represent the template.

It could be possible to have an alternative to remark/rehype/retext that works on a Handlebars syntax tree. With its own ecosystem. That could be interesting, but would be something you’d need to create (as I don’t have the bandwidth to do that)

jeffsee55 commented 5 years ago

Thanks for that, this crystallizes the point you're making:

To re-iterate the problem: the content you are dealing with is a template, not valid Markdown. That template is compiled to (hopefully valid) Markdown.

But it conflicts with my naive understanding of what custom tokenizers are for, this section of the docs makes me think this is the right place to do it:

... Sometimes, such as when introducing new syntactic entities with a certain precedence, interfacing with the parser is necessary.

To me, my example seems very similar to the "mentions" example provided. All I want is to provide a tokenizer that will run before the 'link' tokenizer and transform it into valid markdown. And in fact, it's not clear to me how the function knows not to be run if I've specified it to run before the link tokenizer, which is why I thought it was a bug.

The mentions example in the docs won't work inside a link either, but I would say that it seems like it should, if we use Github's mention functionality as an example:

@jeffsee55 

Translates into:

[@jeffsee55](https://github.com/jeffsee55)

Right? But then if we want to drop it into a link:

[Look at my profile](@jeffsee55)

It should become:

[Look at my profile](https://github.com/jeffsee55)

This works as I would expect, here's the result: Look at my profile.

So I think my question is this: If someone wanted to support mentions inside a link (like Github markdown does) would you recommend this same advice?:

It could be possible to have an alternative to remark/rehype/retext that works on a Handlebars syntax tree. With its own ecosystem.

I've updated the code sandbox with the mentions example FYI. Thanks for your time!

wooorm commented 5 years ago

This works as I would expect, here's the result: Look at my profile.

This does not work. It doesn’t link to your profile. It’s a relative link to @jeffsee55. Similar to how a link is relative if you’d type it without the @: jeffsee55. Because we’re on /remarkjs/remark/issues/410, the relative link goes to /remarkjs/remark/issues/@jeffsee55. Which apparently works on GitHub (TIL): it shows issues that you authored. It does not go to your profile.
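How a destination like @jeffsee55 resolves against the current page can be checked with the standard `URL` class, available in browsers and Node.js. The full page address below is an assumption built from the path mentioned above.

```javascript
// Assumed full address of this issue page.
const page = "https://github.com/remarkjs/remark/issues/410";

// A destination with no scheme and no leading slash is resolved against
// the directory of the current page.
const resolved = new URL("@jeffsee55", page).href;
console.log(resolved);
// prints "https://github.com/remarkjs/remark/issues/@jeffsee55"

// A leading slash resolves against the site root instead, which is what
// a profile link would need.
const rooted = new URL("/jeffsee55", page).href;
console.log(rooted);
// prints "https://github.com/jeffsee55"
```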

So I think my question is this: If someone wanted to support mentions inside a link (like Github markdown does) would you recommend this same advice?:

Mentions work according to “markdown laws”. That is, a mention does not expand in a link destination (your example), and it doesn’t work inside link text ([@wooorm](example.com)) either. What you want is a template. As I mentioned before, templates aren’t markdown, but compile to markdown, so proper handlebars would work.