markdown-it / linkify-it

Links recognition library with full unicode support
http://markdown-it.github.io/linkify-it/
MIT License

Links with slashes capturing trailing semicolons #98

Closed georgediaz88 closed 3 years ago

georgediaz88 commented 3 years ago

Hi,

Using the demo site, I noticed that links ending in a slash capture a trailing semicolon. A trailing period is excluded as I expected, but a trailing semicolon is not. Here are some examples:

  1. Trailing period after slash. Works as expected:
www.google.com/.

=> www.google.com/
  2. Trailing semicolon, no slash. Works as expected:
www.google.com;

=> www.google.com
  3. Semicolon after slash. Doesn't work as expected:
www.google.com/;

=> www.google.com/;

(I would've expected it to capture like `www.google.com/`)

Is the last example above by design? There are some academic disciplines that separate a list of urls with semicolons. So, that's the behavior we're trying to match with this plugin.

Your help would be greatly appreciated.

Thanks! George

puzrin commented 3 years ago

> Is the last example above by design? There are some academic disciplines that separate a list of urls with semicolons. So, that's the behavior we're trying to match with this plugin.

Heuristic algorithms cannot guarantee the right result 100% of the time.

`;` is a valid character in a URL. I see 2 alternatives:

Also, please provide real-world examples where `;` is used as you describe (several sample documents). That should help to devise a proper rule.

PS. At first glance, the rule "`;` cannot be part of a link if followed by a space" may work.
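To make the proposed rule concrete, here is a minimal standalone sketch in JavaScript. It is not linkify-it's implementation and does not use its API: `findLinks` is a hypothetical helper, the candidate pattern (any run of non-whitespace starting with `http://`, `https://` or `www.`) is a deliberately naive assumption, and `http://example.com/a;b=1` is a made-up URL. Because a candidate always ends at whitespace or at the end of the text, any `;` at its very end is by construction "followed by a space", so it is dropped, while a `;` inside the link is kept.

```javascript
// Illustrative sketch only: NOT linkify-it's actual matching logic.
// findLinks is a hypothetical helper introduced for this example.
function findLinks(text) {
  // Naive candidate pattern (an assumption): a run of non-whitespace
  // characters that starts with http://, https:// or www.
  const candidate = /(?:https?:\/\/|www\.)\S+/g;

  // Each candidate ends right before whitespace or at end of text, so a
  // trailing ';' is one that is "followed by a space". Strip those only.
  return (text.match(candidate) || []).map((raw) => raw.replace(/;+$/, ''));
}

// The failing case from this issue: the trailing ';' is excluded.
console.log(findLinks('www.google.com/; next'));
// ['www.google.com/']

// A ';' in the middle of a link is still part of it.
console.log(findLinks('see http://example.com/a;b=1 for details'));
// ['http://example.com/a;b=1']
```

The sketch handles semicolons only. Other trailing punctuation, such as the period in the first example of this issue, would need its own rule.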

georgediaz88 commented 3 years ago

Hi @puzrin,

Thanks for your quick response.

Regarding:

> Try to provide formal description how to decide when ; should not be part of url, without side-effects.

It could work exactly as you suggested in your PS: `;` cannot be part of a link if it is followed by a space.

Here are some real world examples:

See Nathan Bomey & Marco della Cava, Sexual Harassment Went Unchecked for Decades as Payouts Silenced Accusers, USA Today (Dec. 1, 2017), https://www.usatoday.com/story/money/business/2017/12/01/sexual-harassment-went-unchecked-decades-payouts-silenced-accusers/881070001/; Lyn Yonack, Sexual Assault Is About Power: How #MeToo Campaign Is Restoring Power to Victims, Psychol. Today (Nov. 14, 2017), https://www.psychologytoday.com/us/blog/psychoanalysis-unplugged/201711/sexual-assault-is-about-power.

See Carl Hulse, Political Polarization Takes Hold of the Supreme Court, N.Y. Times (July 5, 2018), https://www.nytimes.com/2018/07/05/us/politics/political-polarization-supreme-court.html; Kevin Schaul & Kevin Uhrmacher, Analysis: How Trump Is Shifting the Most Important Courts in the Country, Wash. Post (Sept. 4, 2018),

The role that per curiam decisions do and should play, particularly when the Court does not speak with a unified voice, is quite interesting and a topic for further exploration. See https://www.theatlantic.com/ideas/archive/2018/06/the-court-slices-a-narrow-ruling-out-of-masterpiece-cakeshop/561986/; Ira P. Robbins, Hiding Behind the Cloak of Invisibility: The Supreme Court and Per Curiam Opinions, 86

See David Orr, Poets, Academia: A Couplet in Conflict, N.Y. Times (May 30, 2009), https://www.nytimes.com/2009/05/31/weekinreview/31orr.html; Steven L. Winter, Death Is the Mother of Metaphor, 105 Harv. L. Rev. 745, 749--50 (1992) (describing poet Wallace Stevens's relationship to the study of law and legal language).

You'll also see that GitHub excludes the trailing semicolon from these links, as expected.

Currently, our users work around this by hand: they replace each such link in our markdown with an explicit markdown link whose target leaves out the trailing semicolon. Of course, it would be best if this library handled this scenario and removed that pain point.

Let me know what you think.

Thanks! George

puzrin commented 3 years ago

Try v3.0.3

georgediaz88 commented 3 years ago

@puzrin, whoa this is awesome!! 🎉

IT WORKS!

I appreciate you quickly adding this for me / my team. Thank you!