JoshyPHP closed this issue 8 years ago
@dcsjapan If you encounter real-world examples of non-ASCII punctuation that interferes with links feel free to reopen this issue by posting them here.
I'll keep an eye out for examples. It's hard to actively search for these things, because Google and the like tend to either ignore fullwidth symbols or lump them together with their halfwidth counterparts.
I am wondering if simply excluding fullwidth punctuation from links is a complete solution, though. It's possible there may be URLs which include such characters. I noticed Wikipedia has chosen to avoid the problem by insisting on halfwidth parentheses for disambiguation, e.g.
https://ja.wikipedia.org/wiki/サクラ_(曖昧さ回避)
https://zh.wikipedia.org/wiki/道_(消歧义)
But when I wondered whether they could've used fullwidth parens instead ... just limiting myself to Japanese for the sake of simplicity ... all I come up with is the short list of halfwidth symbols that can't be used in URLs:
\ ' | ` ^ " < > ) ( } { ] [
So if we put Wikipedia's conventions aside for the moment, it seems that URLs such as the following should in theory be possible:
https://ja.wikipedia.org/wiki/サクラ（曖昧さ回避）
https://zh.wikipedia.org/wiki/道（消歧义）
And indeed, Wikipedia treats those as valid (albeit empty) pages.
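For what it's worth, here is a minimal sketch of why such URLs can work at all: non-ASCII characters normally travel in a URL as percent-encoded UTF-8 bytes, so a fullwidth parenthesis is handled the same way as the kanji around it. The helper name `wiki_url` and the hard-coded base are assumptions made for illustration only.

```python
from urllib.parse import quote, unquote

# Hypothetical helper, for illustration only.
BASE = "https://ja.wikipedia.org/wiki/"

def wiki_url(title: str) -> str:
    """Build an ASCII-only URL by percent-encoding the title as UTF-8."""
    return BASE + quote(title, safe="")

url = wiki_url("サクラ（曖昧さ回避）")
print(url)           # ASCII-only form, as it would be sent over the wire
print(unquote(url))  # decodes back to the human-readable form
```

The fullwidth "（" (U+FF08) becomes `%EF%BC%88`, so nothing in the URL syntax itself rules it out. The ambiguity only appears when a linkifier has to guess where the human-readable form ends.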
That being the case, you'd need to test whether the "（" corresponding to each "）" came before or after the "http", or you run the risk of truncating the link. Likewise for other types of brackets. What's worse, you could presumably also have URLs including things like fullwidth commas, full stops, question marks, and exclamation points ... not to mention a wide variety of Eastern emoticons ... and there'd be no way to tell whether a given symbol marks the end of the URL or not.
Related to this: https://github.com/flarum/core/issues/1041 Also this: https://en.wikipedia.org/wiki/Halfwidth_and_fullwidth_forms
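Going back to the bracket check described above, here is a rough sketch of what it could look like. This is not how any existing linkifier is known to work; the function name `extract_url`, the bracket table, and the simple `https?://\S+` pattern are all assumptions chosen to keep the example short.

```python
import re

# Closing bracket -> matching opening bracket (illustrative, not exhaustive).
CLOSERS = {"）": "（", "］": "［", "｝": "｛", "」": "「"}

def extract_url(text):
    """Return the first URL-like run in text, trimming unmatched closers.

    A trailing closing bracket is dropped only when its opener is not
    inside the candidate, i.e. the bracket was opened before "http".
    """
    match = re.search(r"https?://\S+", text)
    if match is None:
        return None
    url = match.group(0)
    while url and url[-1] in CLOSERS:
        closer = url[-1]
        opener = CLOSERS[closer]
        # Enough openers inside the candidate: the closer belongs to the URL.
        if url.count(opener) >= url.count(closer):
            break
        url = url[:-1]
    return url

# Bracket opened before "http": the trailing closer is trimmed.
print(extract_url("（https://ja.wikipedia.org/wiki/サクラ）"))
# Bracket opened inside the URL: the closer is kept.
print(extract_url("https://ja.wikipedia.org/wiki/サクラ（曖昧さ回避）"))
```

This only covers paired brackets. It does nothing for fullwidth commas, full stops, question marks, or emoticons, where there is no opener to look for and the end of the URL stays ambiguous.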
Possible test case: