Open JsBergbau opened 5 years ago
Parentheses are allowed characters in the path of an http URL. In emails, URLs are typically enclosed in angle brackets for this reason, e.g. <https://domain.example/some/path>.
OK, then the detection should work like this: if the URL is preceded by an opening bracket, the closing bracket should not be considered part of the URL.
I ran some tests with different mail clients on tricky URLs and am posting my results here. K9 is the one with the black background.
K9 always assumes ) to be part of the URL. It gets even trickier if the URL contains brackets.
K9 also fails when the URL contains umlauts (e.g. "ü" in this case), although they can be part of a URL. So I'd consider this a bug rather than an enhancement.
I just had a similar issue when receiving a link from translatewiki.net containing square brackets, which K-9 also didn't parse correctly (an opening square bracket ends link parsing).
K-9 version: 5.708 (latest from F-Droid)
Screenshot:
URL detection in text is great fun. GitHub could be better, too :)
This is a boring URL: https://domain.example/path Text <https://domain.example/path> Text (https://domain.example/path) Text (https://domain.example/path).
This is a URL containing parentheses: https://domain.example/(path) Text <https://domain.example/(path)> Text (https://domain.example/(path)) Text (https://domain.example/(path)).
This is a URL containing unmatched parentheses: https://domain.example/(path)) Text <https://domain.example/(path))> Text (https://domain.example/(path))) Text (https://domain.example/(path))).
This is a URL ending in a dot: https://domain.example/path. Text <https://domain.example/path.> Text (https://domain.example/path.) Text (https://domain.example/path.).
This is a URL ending in a question mark: https://domain.example/path? Text <https://domain.example/path?> Text (https://domain.example/path?) Text (https://domain.example/path?).
Pull request #4996 will improve detection for URLs wrapped in parentheses and/or ending in punctuation that probably signifies the end of the sentence rather than being part of the URL. Especially when unmatched parentheses are part of the URL things get tricky and reasonable people can disagree on what should be done. I opted to include as much as possible and only remove one closing parenthesis if the URL is preceded by an opening parenthesis.
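The heuristic described above can be sketched roughly as follows. This is only an illustration in Python, not the code from the pull request: the names `URL_PATTERN`, `TRAILING_PUNCTUATION` and `find_urls` are hypothetical, and the choice to keep URLs in angle brackets verbatim is an assumption based on the email convention mentioned earlier in this thread.

```python
import re

# Hypothetical names for illustration; this is NOT the code from the pull request.
# Assumption: a URL candidate runs until whitespace or an angle bracket.
URL_PATTERN = re.compile(r"https?://[^\s<>]+")
# Assumption: these characters at the very end usually belong to the sentence.
TRAILING_PUNCTUATION = ".,;:!?"


def find_urls(text: str) -> list[str]:
    """Return the URLs found in text after trimming likely sentence characters."""
    urls = []
    for match in URL_PATTERN.finditer(text):
        url = match.group()
        start, end = match.start(), match.end()
        before = text[start - 1] if start > 0 else ""
        after = text[end] if end < len(text) else ""

        # A URL wrapped in <...> is taken verbatim.
        if before == "<" and after == ">":
            urls.append(url)
            continue

        # Drop punctuation that probably ends the sentence.
        url = url.rstrip(TRAILING_PUNCTUATION)

        # Remove exactly one closing parenthesis, and only if the URL
        # is directly preceded by an opening parenthesis.
        if before == "(" and url.endswith(")"):
            url = url[:-1].rstrip(TRAILING_PUNCTUATION)

        urls.append(url)
    return urls
```

With this sketch, `find_urls("CHANGED: Google (https://www.google.de/maps)")` yields the URL without the closing parenthesis, while a bare `https://domain.example/(path)` keeps both of its parentheses. Unmatched parentheses inside the URL are left alone, which is one of several defensible choices.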
In the cases where K-9 Mail doesn't detect the whole URL it is technically right. Those characters are not allowed in unencoded form in URLs. However, copying such URLs to the address bar of a browser does the right thing. So we should probably extend the URL detection to also allow such "display URLs".
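To make the difference concrete, here is a small Python sketch comparing a strict character set with a more permissive one. Both patterns are assumptions made up for this illustration (they are not K-9 Mail's actual patterns): `STRICT_URL` accepts roughly the RFC 3986 reserved and unreserved characters plus `%`, while `DISPLAY_URL` accepts anything up to whitespace, an angle bracket or a double quote.

```python
import re

# Hypothetical patterns for illustration only; not K-9 Mail's real ones.
# Strict: only ASCII characters that may appear in an encoded URL.
STRICT_URL = re.compile(r"https?://[A-Za-z0-9\-._~:/?#\[\]@!$&'()*+,;=%]+")
# Permissive: also accepts unencoded non-ASCII characters ("display URLs").
DISPLAY_URL = re.compile(r'https?://[^\s<>"]+')


def first_url(pattern: re.Pattern, text: str) -> str:
    """Return the first match of pattern in text, or an empty string."""
    match = pattern.search(text)
    return match.group() if match else ""


text = "See https://domain.example/wiki/Zürich today"
strict_result = first_url(STRICT_URL, text)    # stops in front of the "ü"
display_result = first_url(DISPLAY_URL, text)  # keeps the whole path
```

The strict pattern stops at the first character that is not allowed in unencoded form, which is the truncation reported in this issue. The permissive pattern keeps the full display URL, but then needs trimming rules for trailing punctuation.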
Sure, URLs get tricky, especially when ending with . or ). However, umlauts (äöüß) can be part of a URL, so there is no need to stop there; nothing has to be guessed.
I can confirm that Umlauts break URL rendering.
This issue not only affects German umlauts, it affects all languages that do not use the Latin alphabet.
I'm subscribed to the daily-image-l@lists.wikimedia.org mailing list. As the images which are chosen to be "Picture of the Day" on Wikimedia Commons are taken at locations and by people around the world, every other day I receive an email containing a link that I cannot click in K9. However, in Thunderbird all links work.
Unfortunately, the web archive of that mailing list doesn't handle encodings correctly. Therefore, example links cannot be taken from there.
Here are some examples of links from the recent weeks in different languages:
Special characters in URLs need to be encoded, otherwise it's not a valid URL. Browsers decode special characters when displaying the URL in the address bar. But when copying the URL to the clipboard, special characters are properly encoded. There's no reason why a "display URL" should end up in the plain text part of an email. If it does, that should be considered a mistake the sender should fix on their side.
Whether we'll add support for display URLs remains to be seen. But it will always be a way to support broken emails, not the right thing to do.
Ask the senders of such emails to fix their code so only properly encoded URLs are included in their emails.
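As an illustration of what "properly encoded" means here, the following Python sketch percent-encodes the path, query and fragment of a display URL using the standard `urllib.parse` module. The function name `encode_display_url` is hypothetical, the example URL is made up, and the sketch leaves the host name untouched (internationalized domain names would need separate handling).

```python
from urllib.parse import quote, unquote, urlsplit, urlunsplit


def encode_display_url(display_url: str) -> str:
    """Percent-encode non-ASCII characters in path, query and fragment.

    Hypothetical helper for illustration. Existing %XX escapes are kept
    because "%" is listed as safe. The host name is not converted.
    """
    parts = urlsplit(display_url)
    path = quote(parts.path, safe="/%()")
    query = quote(parts.query, safe="=&%")
    fragment = quote(parts.fragment, safe="%")
    return urlunsplit((parts.scheme, parts.netloc, path, query, fragment))


encoded = encode_display_url("https://domain.example/wiki/Zürich")
decoded = unquote(encoded)  # what a browser would show in the address bar
```

The encoded form contains only ASCII characters, so a strict URL detector can pick it up in full, and decoding it gives back the readable display form.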
Special characters in URLs need to be encoded, otherwise it's not a valid URL.
Are you sure? I remember the discussion on Chromium where they didn't want to encode the URLs, since they said the URLs are valid anyway and wrong recognition is the other side's problem... I think they now do decode some things, but opinions differed on that, and there are still too many URLs with umlauts etc.
This problem also occurs in plain-text emails, where there is normally no special encoding for URLs. Nevertheless, K9-Mail is helpful enough to turn the URL into a tappable link, so you don't have to copy the text to open it in your browser.
Just have a look at how GitHub does it here: the link with brackets in the first post is built correctly, and K9-Mail should behave the same way.
What's wrong with the email address in this tweet? https://twitter.com/RideDDOT/status/1753109640299385062
K9-Mail builds links incorrectly. That's very annoying when there is a mail from url watch like "CHANGED: Google (https://www.google.de/maps)". You can click that link, but the URL won't be found because there is a closing bracket at the end of the URL, and that can't work.
Expected behavior
Link is built correctly
Actual behavior
A closing bracket is appended to the link
Steps to reproduce
Environment
K-9 Mail version: 5.600
Android version: 8.0.0
Account type (IMAP, POP3, WebDAV/Exchange): IMAP
Please take some time to retrieve logs and attach them here: