signalapp / Signal-Desktop

A private messenger for Windows, macOS, and Linux.
https://signal.org/download
GNU Affero General Public License v3.0

Urls with special characters (like ÆØÅ) don't become clickable links with preview #4810

Open goibon opened 3 years ago

goibon commented 3 years ago

Bug Description

When pasting a url (e.g. https://www.reddit.com/r/Denmark/comments/kwc0jz/æble_eller_pære/) the url doesn't become a clickable link along with a preview of the url. If I remove the two occurrences of the letter 'Æ' (which thankfully is still a valid url for the Reddit post) and then make another change then a preview appears. If I send a message with the original url then that message contains a clickable link in the iOS app but not on desktop. If I paste the original url in the iOS app then it generates a preview and a clickable link just fine. So it seems the desktop version has a slightly different way of handling url detection. I've attached a brief screen recording demonstrating the issue:

https://user-images.githubusercontent.com/2376777/104436373-cfccfe80-558d-11eb-8b21-f2300328738e.mov

Steps to Reproduce

  1. Paste a url containing special characters, in my case: https://www.reddit.com/r/Denmark/comments/kwc0jz/æble_eller_pære/ in the message input field

Actual Result:

The url remained plain text, i.e. not a clickable link, and no preview was produced.

Expected Result:

I expected the url to be transformed into a clickable link and a preview to be attached to my message.

Screenshots

Platform Info

Signal Version: v1.39.4

Operating System: macOS Catalina 10.15.7

Linked Device Version: 5.0.3.0 (iOS)

Link to Debug Log

https://debuglogs.org/c367cd3a70c3af7567d28c9f9df47c42ca8e85ef7faa1631760d5b1379656628

EvanHahn-Signal commented 3 years ago

Thanks for reporting. I'm not sure what we should do here.

We have some code that ensures that all characters in a URL's path are valid characters based on the URI standard. æ is not one of those characters, so we don't show a link preview.

Both Firefox and Chrome copy the URL like this:

https://www.reddit.com/r/Denmark/comments/kwc0jz/%C3%A6ble_eller_p%C3%A6re/

All of these characters are valid, so we show a link preview. But Reddit's "Copy Link" behavior copies it with those invalid characters.
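A minimal sketch of the kind of check described above, assuming the validation is roughly "every path character must be allowed by RFC 3986". The helper name `isStrictlyValidPath` and the regular expression are illustrative assumptions, not Signal's actual code. It shows why the raw path is rejected while the percent-encoded form that Firefox and Chrome copy passes:

```typescript
// Hypothetical helper, not Signal's actual code: accepts only the characters
// RFC 3986 allows in a path (unreserved, sub-delims, ":", "@", "/") plus
// percent-encoded octets.
const RFC3986_PATH = /^(?:[A-Za-z0-9\-._~!$&'()*+,;=:@\/]|%[0-9A-Fa-f]{2})*$/;

function isStrictlyValidPath(path: string): boolean {
  return RFC3986_PATH.test(path);
}

const rawPath = "/r/Denmark/comments/kwc0jz/æble_eller_pære/";

// The WHATWG URL parser percent-encodes non-ASCII path characters as UTF-8.
const encodedPath = new URL("https://www.reddit.com" + rawPath).pathname;

console.log(isStrictlyValidPath(rawPath));     // false
console.log(encodedPath);                      // /r/Denmark/comments/kwc0jz/%C3%A6ble_eller_p%C3%A6re/
console.log(isStrictlyValidPath(encodedPath)); // true
```

Both strings point at the same resource; only the encoded one consists entirely of characters the URI standard permits.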

To fix this, our options are:

  1. Make our link preview checking more lenient. We would allow characters like æ in the URL, even though those are not technically valid. This might open up some other security issues which we'd need to evaluate.
  2. Update the mobile apps to be consistent with Desktop (and the spec), rejecting URLs with characters like æ.
  3. Do nothing, and accept that the apps are slightly different here.

Not sure what to do here, but I've filed this as a bug. We'll think about it.

bentolor commented 3 years ago

I just stumbled over the same issue here with an Am*zon URL containing a German umlaut ('ä') in the URL.

I swear, I directly copy & pasted the URL from Firefox stable into Signal. After @EvanHahn-Signal's comment, I tried to reproduce, but now the URL got quoted correctly.

It seems Firefox only quotes the URL if you copy the complete URL. If you (like me) only select part of the URL (like up to the /dp/xxxxxxx part), Firefox no longer quotes the special characters in the partial URL.

goibon commented 3 years ago

@bentolor that is an interesting find so I tried it myself and here's what I found:

All of these tests were performed on macOS Catalina 10.15.7

Safari (14.0.3)

Firefox (85.0.2)

Chrome (88.0.4324.150)

So it seems that Safari is the only browser of the three that doesn't encode the url when copying 🤔

SeriousMatters commented 3 years ago

So it seems that Safari is the only browser of the three that doesn't encode the url when copying 🤔

On Windows 10:

Firefox (89.0)

Chrome (91.0)

Opera (76.0)

Edge (91.0)

SeriousMatters commented 3 years ago

We have some code that ensures that all characters in a URL's path are valid characters based on the URI standard. æ is not one of those characters, so we don't show a link preview.

@EvanHahn-Signal, is there a reason to enforce pure URI standards while most messenger apps, browsers, and major websites support UTF-8 urls?

henkka-fi commented 2 years ago

@EvanHahn-Signal, is there a reason to enforce pure URI standards while most messenger apps, browsers, and major websites support UTF-8 urls?

Also wondering this. Most modern messaging apps and browsers support UTF-8 URLs, and the Signal mobile app also works correctly with these. Why is the strict URI standard enforced only on Signal desktop when UTF-8 URLs work basically everywhere else?

ghost commented 1 year ago

Any update regarding this, by the way? It's still present in 6.x versions.

habi commented 9 months ago

The issue still persists in Signal 6.58.0.7 on iOS.

davidhaberthür.ch is not linked, davidhaberthuer.ch is.

IMG_6841

nehemiagurl commented 9 months ago

@habi you should probably open a ticket in the iOS repo for this one, as it's not Desktop

bentolor commented 9 months ago

@habi I would argue, that in your case of the domain name this is a good thing and should be kept by design to mitigate https://en.wikipedia.org/wiki/IDN_homograph_attack

On another note: What is http? I know about the secure hypertext transfer protocol https. I suspect http might be some obscure IBM mainframe legacy technology from the last century, no?! 😸

nehemiagurl commented 9 months ago

@bentolor the way to stop IDN homograph attacks is with Punycode, not by blocking support for all non-ascii characters in urls. people outside the anglosphere exist.

also stop being obnoxious about https. the person was just demonstrating behaviour in the client, not sending nuclear launch codes. http is perfectly fine for that.

bentolor commented 9 months ago

@nehemiagurl Please refrain from your toxic behaviour.

nehemiagurl commented 9 months ago

@bentolor the one shaming people unnecessarily and dismissing their concerns is you.

bentolor commented 9 months ago

Dear @nehemiagurl

  1. I'm giving feedback on why the case demonstrated might not fit the topic of this ticket and explaining the reasoning behind it.
  2. My comment on https was a humorous comment explicitly marked as a joke with a smiley. Obviously I failed to add enough emojis to enable everybody to recognize it as a joke. For my failure I was rewarded with your downvote(s) and, minutes later, a follow-up attacking me.
  3. You claim Punycode would be the way to fix this. I'd say this is just wrong, as this would imply that Signal modifies the user's input. And altering the user's content and input is something that Signal definitely shouldn't do at all.
  4. It's pointless to attack people on the internet over trivial jokes and comments. The result is solely both of us having a bad day right now…

habi commented 9 months ago

@habi you should probably open a ticket in the iOS repo for this one, as it's not Desktop

In the iOS repository, these issues about IDN already exist:

https://github.com/signalapp/Signal-iOS/issues/5543 links to https://github.com/signalapp/libsignal/issues/511, which is related to https://github.com/signalapp/Signal-Desktop/issues/5237 which is closed as a duplicate of this issue here.

habi commented 9 months ago

@habi I would argue, that in your case of the domain name this is a good thing and should be kept by design to mitigate https://en.wikipedia.org/wiki/IDN_homograph_attack

You know, there are people with an Umlaut in their name, which would actually profit from having their personal URL linked in a software they like to use.

iMessage links my URL, Threema does, Element does. I cannot test WhatsApp, as I don't have an account with them anymore.

habi commented 9 months ago

On another note: What is http? I know about the secure hypertext transfer protocol https. I suspect http might be some obscure IBM mainframe legacy technology from the last century, no?! 😸

I tried to 'minimize' the issue and copy-pasted several versions of my URL. I personally think this part of your comment is unnecessary for the issue. I also think it's very hard to balance humorous text with emojis, but don't think it's necessary to talk about this part more here.

bentolor commented 9 months ago

You know, there are people with an Umlaut in their name, which would actually profit from having their personal URL linked in a software they like to use.

I know: I guess nearly everybody in this issue is here because they wanted to share a URL containing some innocent, local characters like an Umlaut.

My point is: I think there is a significant difference in the security impact of deploying a homoglyph attack as part of the URL path vs. the domain name.
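To make the path-vs-domain distinction concrete, here is a rough sketch that reports where the non-ASCII characters in a raw URL string sit. The function `locateNonAscii` and its return shape are hypothetical names for illustration only, and the splitting is deliberately simplistic (it treats the whole authority as the host):

```typescript
// Hypothetical helper for illustration: reports which part of a raw URL
// string contains non-ASCII characters, without normalizing it first.
type NonAsciiLocation = { inHost: boolean; afterHost: boolean };

function locateNonAscii(rawUrl: string): NonAsciiLocation | undefined {
  // Group 1: authority (treated as the host here).
  // Group 2: everything after it (path, query, fragment).
  const match = /^[A-Za-z][A-Za-z0-9+.-]*:\/\/([^\/?#]*)(.*)$/u.exec(rawUrl);
  if (!match) {
    return undefined; // not of the form scheme://authority...
  }
  const nonAscii = /[^\x00-\x7F]/u;
  return {
    inHost: nonAscii.test(match[1]),
    afterHost: nonAscii.test(match[2]),
  };
}

// Non-ASCII only in the path: the domain the user sees is plain ASCII.
console.log(locateNonAscii("https://www.reddit.com/r/Denmark/comments/kwc0jz/æble_eller_pære/"));

// Non-ASCII in the domain itself: this is where look-alike characters matter.
console.log(locateNonAscii("https://account.mᎥcrosoft.com/"));
```

A client could apply different policies to the two cases, for example always linking the first kind and treating the second kind more carefully.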

iMessage links my URL, Threema does, Element does. I cannot test WhatsApp, as I don't have an account with them anymore.

I'm not sure that "the others do" is good reasoning. I think in the case of domain names, it's really a trade-off between security and "least surprise for the user".

Are we on the same page that rendering homoglyphs in domain names poses a significant security threat for the users? As a Signal user: How would you be able to understand that the message from your colleague (whose phone has been stolen) asking you to change your password on https://account.mᎥcrosoft.com/ instead of https://account.microsoft.com/ is a fraud?

bentolor commented 9 months ago

Reading through the linked issues: The general gist here is that currently the behavior is confusing, as some special characters do get linkified on some platforms, others don't.

Ok: That's definitely confusing and not helpful. Signal should decide and commit to one of the two approaches: rendering IDNs or not rendering them at all.

nehemiagurl commented 9 months ago

How would you be able to understand that the message from your colleague (whose phone has been stolen) asking you to change your password on https://account.mᎥcrosoft.com/ instead of https://account.microsoft.com/ is a fraud?

if the person you're chatting with is an adversary trying to phish you via Signal, you've got bigger problems on your hands. protecting from homograph attacks is the job of the browser - just like Signal can't protect you from sending incriminating messages without massively degrading the app (and even then a more sophisticated adversary can go around that anyway), Signal can't protect you from accepting a message request from someone who's phishing you.

even if they say "screw you" to anyone who uses the internet outside of the Anglosphere and block IDNs from rendering, a phisher can always just send the link in a separate message and wait for the victim to copy-paste it in their browser. do you really think that if someone unsuspecting looked at a series of messages like:

Hi, you need to urgently change your Microsoft password. Go to "my account" in this link and sign in, then configure a new password:

https://account.mᎥcrosoft.com/

they will see that the second link isn't clickable and immediately understand they're being phished?

most browsers implement Punycode conversion by default, as well as some other protections, and that's excellent, because that's the way to actually combat this. but you won't see browsers blocking IDNs altogether, and neither should Signal.
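As a rough illustration of the Punycode conversion mentioned above: the standard WHATWG `URL` parser (available in browsers and Node.js) already turns an internationalized host name into its ASCII `xn--` form. The helper names `asciiHostname` and `hasPunycodeLabel` are made up for this sketch:

```typescript
// The WHATWG URL parser converts internationalized host names to their
// ASCII "xn--" (Punycode) form when parsing.
function asciiHostname(rawUrl: string): string {
  return new URL(rawUrl).hostname;
}

// Hypothetical helper: true if any label of the host name is Punycode-encoded.
function hasPunycodeLabel(hostname: string): boolean {
  return hostname.split(".").some(label => label.startsWith("xn--"));
}

console.log(asciiHostname("https://bücher.example/")); // xn--bcher-kva.example

// The look-alike domain no longer reads as "microsoft" once converted.
const lookalike = asciiHostname("https://account.mᎥcrosoft.com/");
console.log(lookalike, hasPunycodeLabel(lookalike));
console.log(hasPunycodeLabel(asciiHostname("https://account.microsoft.com/")));
```

Showing the converted host name (or flagging hosts with an `xn--` label) makes a homograph visible without refusing to link IDNs at all.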

bentolor commented 9 months ago

Thank you, @habi and @nehemiagurl !

As I mentioned: It's a tradeoff. Now I see the merits of both points of view and would be ok with both approaches, as long as the behaviour is consistent across Signal platforms and applications.

habi commented 9 months ago

Please refer to https://davidhaberthŭr.ch/ for details!

Depending how you share this link in an iMessage, this is the result

image

SeriousMatters commented 9 months ago

most browsers implement Punycode conversion by default, as well as some other protections, and that's excellent, because that's the way to actually combat this. but you won't see browsers blocking IDNs altogether, and neither should Signal.

Exactly! We don't forbid the sale of all knives and all alcohol just because they can potentially enable criminal activity. Besides, people still fall for scams even if the url is nothing like the original.

How about converting to Punycode automatically on paste or on send?
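One possible shape for that idea, sketched under the assumption that the typed text should stay untouched and only the link target is normalized. The `Link` type and `toLink` function are hypothetical, not part of any Signal codebase:

```typescript
// Hypothetical sketch: keep the text the user typed for display, but use a
// normalized, ASCII-only URL as the actual link target.
type Link = { text: string; href: string };

function toLink(typed: string): Link | undefined {
  try {
    const url = new URL(typed);
    // Assumption for this sketch: only web links are turned into links.
    if (url.protocol !== "https:" && url.protocol !== "http:") {
      return undefined;
    }
    // url.href has a Punycode host and a percent-encoded path.
    return { text: typed, href: url.href };
  } catch {
    return undefined; // not parseable as an absolute URL
  }
}

console.log(toLink("https://bücher.example/äpfel"));
// { text: "https://bücher.example/äpfel",
//   href: "https://xn--bcher-kva.example/%C3%A4pfel" }
```

Because the message text itself is not modified, this would sidestep the concern about altering user input while still producing a link made only of valid URI characters.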

Nowaker commented 6 months ago

This!

image