Open legoktm opened 9 years ago
It seems like this would be possible. We have some good options for a URL regex. See https://mathiasbynens.be/demo/url-regex
@legoktm do you know if there is a MediaWiki URL regex we can use?
Looking through Parser::replaceExternalLinks(), it appears to use:
> var_dump($wgParser->mExtLinkBracketedRegex);
string(342) "/\[(((?i)bitcoin\:|ftp\:\/\/|ftps\:\/\/|geo\:|git\:\/\/|gopher\:\/\/|http\:\/\/|https\:\/\/|irc\:\/\/|ircs\:\/\/|magnet\:|mailto\:|mms\:\/\/|news\:|nntp\:\/\/|redis\:\/\/|sftp\:\/\/|sip\:|sips\:|sms\:|ssh\:\/\/|svn\:\/\/|tel\:|telnet\:\/\/|urn\:|worldwind\:\/\/|xmpp\:|\/\/)[^][<>"\x00-\x20\x7F\p{Zs}]+)\p{Zs}*([^\]\x00-\x08\x0a-\x1F]*?)\]/Su"
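For use from Python, the protocol list in that pattern could be ported roughly as follows. This is a minimal sketch, not the parser's own code: the names `PROTOCOLS`, `URL_RE` and `find_urls` are hypothetical, the bracket and label handling is dropped so that only the URL itself is matched, and `\s` stands in for `\p{Zs}` because the standard `re` module does not support Unicode property classes.

```python
import re

# Protocol prefixes copied from the PHP pattern above.
# "//" covers protocol-relative links.
PROTOCOLS = [
    "bitcoin:", "ftp://", "ftps://", "geo:", "git://", "gopher://",
    "http://", "https://", "irc://", "ircs://", "magnet:", "mailto:",
    "mms://", "news:", "nntp://", "redis://", "sftp://", "sip:", "sips:",
    "sms:", "ssh://", "svn://", "tel:", "telnet://", "urn:",
    "worldwind://", "xmpp:", "//",
]

# Assumption: \s approximates \p{Zs}, which Python's re does not provide.
URL_RE = re.compile(
    r"(?:" + "|".join(re.escape(p) for p in PROTOCOLS) + r")"
    r'[^\[\]<>"\x00-\x20\x7f\s]+',
    re.IGNORECASE,
)


def find_urls(text):
    """Return every substring of `text` that looks like an external URL."""
    return [m.group(0) for m in URL_RE.finditer(text)]
```

Note that trailing punctuation such as a sentence-final period is still matched as part of the URL, since the character class only excludes brackets, angle brackets, quotes, control characters and whitespace.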
I've added the URL symbol to the wikitext split lexicon in deltas. See https://github.com/halfak/Deltas/commit/40d984d2bcceb5fc4f36b42c350c07810fe1971b
I'll need to do a follow-up change here to pull in wikitext_split from deltas.
Hi!
I was trying to use the persistence code to identify when a specific URL was added to an article, but ran into an issue with the wikitext_split function breaking up URLs:
It would be nice if URLs were special-cased and kept together.
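To illustrate the request, here is a minimal sketch of a regex tokenizer that tries a URL rule before its word and punctuation rules. It is not the actual wikitext_split lexicon: `LEXICON`, `TOKEN_RE` and `tokenize` are hypothetical names, and the URL pattern is deliberately simplified to `http`, `https` and protocol-relative links.

```python
import re

# Hypothetical, simplified lexicon. Order matters: the URL rule is tried
# first so that a URL is consumed whole before the word rule can split it.
LEXICON = [
    ("url", r'(?:https?:)?//[^\s\[\]<>"]+'),
    ("word", r"\w+"),
    ("whitespace", r"\s+"),
    ("other", r"."),
]

# One alternation of named groups; m.lastgroup reports which rule matched.
TOKEN_RE = re.compile(
    "|".join("(?P<%s>%s)" % pair for pair in LEXICON), re.DOTALL
)


def tokenize(text):
    """Split `text` into (token_type, token_text) pairs."""
    return [(m.lastgroup, m.group(0)) for m in TOKEN_RE.finditer(text)]
```

With the URL rule in place, `tokenize("see http://example.org/a now")` yields a single `url` token for `http://example.org/a`. If the `url` entry is removed from this sketch, the same input falls apart into `http`, `:`, `/`, `/`, `example`, `.`, `org`, `/` and `a`, which makes it hard to tell when a given URL first appeared.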