Open tiff opened 2 years ago
Another case
https://webtranslateit.com/en/projects/19484-Website-languagetool-org/locales/en..de/strings/21631111
The problem in the first case is that sentence detection is already wrong: it adds a sentence boundary at "?".
This might be an easy fix by extending the character set of this rule in segment.srx
(I don't have time to work on it now, though):
<rule break="no"><!-- URLs without "www."-->
<beforebreak>\b(https?|ftp|file|chrome|chromium|android|(chrome|moz)\-extension):///?[A-Za-z0-9\-]+\.</beforebreak>
<afterbreak>[A-Za-z0-9\-]+(\.|\b)</afterbreak>
</rule>
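To see why the rule above doesn't help here, a rough Python emulation of it follows. This is only a sketch under assumptions: the real segmenter runs on Java regexes and may apply rules differently, `no_break_positions` is a made-up helper name, and the sample URLs are taken from the cases listed in this issue. It treats the rule as applying at position i when `beforebreak` matches the text ending at i and `afterbreak` matches the text starting at i.

```python
import re

# Patterns copied from the <rule break="no"> above. \Z anchors beforebreak
# to the end of the text that precedes the candidate position.
BEFORE = re.compile(
    r"\b(https?|ftp|file|chrome|chromium|android|(chrome|moz)\-extension)"
    r":///?[A-Za-z0-9\-]+\.\Z"
)
AFTER = re.compile(r"[A-Za-z0-9\-]+(\.|\b)")


def no_break_positions(text):
    """Return the positions i at which the no-break rule would apply,
    i.e. beforebreak matches text[:i] and afterbreak matches text[i:].
    (Hypothetical helper, only an approximation of an SRX engine.)"""
    return [
        i
        for i in range(1, len(text))
        if BEFORE.search(text[:i]) and AFTER.match(text[i:])
    ]


url = "http://foo.com/(something)?after=parens"
print(no_break_positions(url))          # [11]: only the dot after "foo"
print(url.index("?") + 1)               # 27: position right after "?", not covered

# ASCII-only host class: a Unicode host is not protected at all.
print(no_break_positions("http://✪df.ws/123"))  # []
```

In this emulation the rule only protects the first dot of an ASCII host name. Nothing covers the "?" in the query string, and hosts with non-ASCII characters never match `beforebreak`, so any fix would presumably have to widen the character classes or add a separate rule for path and query characters.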
Taking advantage of this issue, here are a few more cases:
http://foo.com/blah_blah_(wikipedia)_(again)
http://✪df.ws/123
http://➡.ws/䨹
http://⌘.ws
http://⌘.ws/
http://foo.com/unicode_(✪)_in_parens
http://foo.com/(something)?after=parens
https://i❤.ws/emojidomain/emoj💥i
This URL is not detected as one: