markdown-it / linkify-it

Links recognition library with full unicode support
http://markdown-it.github.io/linkify-it/
MIT License

Disable URI encoding #71

Closed: christianbundy closed this issue 5 years ago

christianbundy commented 5 years ago

I have a weird use-case where we link to /%abc and do not have URI-encoding. Here's what match looks like after normalization:

{
  schema: '%',
  index: 0,
  lastIndex: 52,
  raw: '%f89BLtbzADRFO5FaC4ktwzyI0FjCTjKtnzJBzKW5rio=.sha256',
  text: '%f89BLtb',
  url: '%f89BLtbzADRFO5FaC4ktwzyI0FjCTjKtnzJBzKW5rio=.sha256'
}

Unfortunately, text is being mangled: %f8 is being URI-decoded, whereas I'd like it to be treated as a string literal. If we replace the % with %25, then the url becomes mangled instead, because its % ends up URI-encoded as %25 as well. Is there a way to disable the codec behavior?

To be clear, this is being rendered as:

<a href="%f89BLtbzADRFO5FaC4ktwzyI0FjCTjKtnzJBzKW5rio=.sha256">�9BLtb</a>

When we fix the text property it seems to be affecting the url property, because then it comes out as:

<a href="%25f89BLtbzADRFO5FaC4ktwzyI0FjCTjKtnzJBzKW5rio=.sha256">%f89BLtb</a>

The desired behavior is:

<a href="%f89BLtbzADRFO5FaC4ktwzyI0FjCTjKtnzJBzKW5rio=.sha256">%f89BLtb</a>
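To illustrate why the first rendering shows a replacement character, here is a minimal sketch of a lenient percent-decoder. `lenientPercentDecode` is a hypothetical stand-in written for this example, not the decoder markdown-it or linkify-it actually uses. It assumes that invalid UTF-8 bytes are replaced with U+FFFD rather than rejected.

```javascript
// Hypothetical stand-in for a link-text decoder: percent-decode runs of %XX
// and substitute U+FFFD for any bytes that are not valid UTF-8.
function lenientPercentDecode(text) {
  // A non-fatal TextDecoder replaces invalid byte sequences with U+FFFD.
  const decoder = new TextDecoder('utf-8')
  return text.replace(/(?:%[0-9a-fA-F]{2})+/g, (run) => {
    const bytes = run.slice(1).split('%').map((hex) => parseInt(hex, 16))
    return decoder.decode(Uint8Array.from(bytes))
  })
}

// "%f8" looks like a percent-escape, but 0xF8 is never a valid UTF-8 byte,
// so the sigil and the two characters after it collapse into U+FFFD.
console.log(lenientPercentDecode('%f89BLtb'))

// Escaping the sigil as "%25" survives decoding as a literal "%".
console.log(lenientPercentDecode('%25f89BLtb'))
```

Under these assumptions the sigil only survives decoding if it is escaped first, which is the workaround in the code below.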

Our code is:

['@', '%', '&'].forEach(sigil => {
  md.linkify.add(sigil, {
    validate: function (text, pos, self) {
      var tail = text.slice(pos)

      if (!self.re.sigil) {
        // 44 base64 characters, a literal dot, then a lowercase alphanumeric suffix such as sha256
        self.re.sigil = new RegExp(
          '^([a-zA-Z0-9+/=]{44}\\.[a-z0-9]+)'
        )
      }
      if (self.re.sigil.test(tail)) {
        return tail.match(self.re.sigil)[0].length
      }
      return 0
    },
    normalize: function (match) {
      // shorten the link to 7 characters plus the sigil
      match.text = match.text.slice(0, 8)
      // linkify is percent-decoding, so we percent-encode the percent symbol
      match.text = match.text.replace(/^%/, '%25')
      match.url = config.toUrl(match.raw)
      console.log(match)
    }
  })
})

puzrin commented 5 years ago

I'd suggest inspecting the source, because there are many ways to customize the behavior. The simplest is to override the default method:

https://github.com/markdown-it/linkify-it/blob/cbc0833d3355dc04122c7a666122a61e58392555/index.js#L610-L625

christianbundy commented 5 years ago

@puzrin Sorry, maybe I didn't edit quickly enough -- I'm using the normalize method, but it seems like there's some interplay between match.text and match.url where modifying the text mangles the URL. I'd like to be able to set match.text = '%25' and have that produce <a>%25</a> rather than <a>%</a>.

puzrin commented 5 years ago

If you think there is some interplay, please provide a minimal executable code sample that reproduces the problem.

This package only finds the bounds of links in text and adds some metadata; any post-processing magic is out of scope.

christianbundy commented 5 years ago

Thanks, I think this was just happening in link_open in markdown-it -- I'll post-process after that step is done, thanks a lot!

puzrin commented 5 years ago

https://github.com/markdown-it/markdown-it/blob/master/lib/index.js#L45-L85
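One way to act on that pointer is to wrap the link-text normalizer so that sigil links skip decoding. This is only a sketch. `skipDecodingForSigils` and `decodingNormalizer` are hypothetical names introduced for illustration, and the stand-in normalizer exists only so the example runs without dependencies.

```javascript
// Hypothetical helper (not part of markdown-it or linkify-it): wrap an
// existing link-text normalizer so that text starting with one of the given
// sigils is returned verbatim instead of being percent-decoded.
function skipDecodingForSigils(normalizeLinkText, sigils) {
  return function (text) {
    if (sigils.some((sigil) => text.startsWith(sigil))) {
      return text
    }
    return normalizeLinkText(text)
  }
}

// Stand-in for a decoding normalizer (an assumption for this sketch only).
const decodingNormalizer = (text) => {
  try {
    return decodeURIComponent(text)
  } catch (err) {
    return '\ufffd'
  }
}

const normalize = skipDecodingForSigils(decodingNormalizer, ['@', '%', '&'])
console.log(normalize('%f89BLtb'))      // sigil text is passed through untouched
console.log(normalize('caf%C3%A9.com')) // other link text is still decoded
```

With a markdown-it instance `md`, this could presumably be wired up as `md.normalizeLinkText = skipDecodingForSigils(md.normalizeLinkText, ['@', '%', '&'])`, assuming the installed version exposes `normalizeLinkText` as an overridable instance property. In that case the `%25` workaround in `normalize` should no longer be needed.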