paritytech / substrate

Substrate: The platform for blockchain innovators
Apache License 2.0
8.39k stars 2.65k forks source link

[Dicussion] Most on-chain identity fields should only allow ASCII. #11820

Closed lamafab closed 2 years ago

lamafab commented 2 years ago

Is there an existing issue?

Experiencing problems? Have you tried our Stack Exchange first?

Description of bug

(This possible security concern was brought forward to us by @shashank-In)

I and my teammate Aditya, found that most of the public user input can allow to conduct homograph attacks. This is very similar to my bug report with title "Homograph attack in https://polkadot.js.org/" on 20th July 2020 which was rewarded and fixed at https://github.com/polkadot-js/apps/pull/3290

PoC:

  1. Go to https://polkadot.js.org/apps/?rpc=wss%3A%2F%2Fwestend-rpc.polkadot.io#/explorer/query/11674153
  2. On the right of "identity.setIdentity" click on elon musk profile and notice the tesla.com domain and email. If you try to open that link in a browser we will notice that is not tesla.com but https://tesla.xn--om-nmc/ where c is replaced with a similar looking homoglyph.

image

Suggested Fix: The fix would be the same as before. When a user uses any character specially in critical areas like email, domain, proposal etc and if it has any Cyrillic letters. The website should convert it into ASCII form just like all modern browsers and applications like slack.

Since we're running the W3F registrar, we know that emojis are quite popular, but mostly for display names. I recommend to mandate an ASCII-only requirement on string fields such as email, web, legal name with the exception of Display Names. This should only apply to new set_identity calls which return an error if the ASCII requirement is violated. If this proposal is accepted, the Polkadot JS UI should ideally clarify to the user any such violations.

Thoughts?

Steps to reproduce

See description above.

xlc commented 2 years ago

That makes no sense. Web address supports unicode characters. Most of the population's legal name cannot be represented with ASCII. The runtime shouldn't enforce anything. It is up to each individual applications that consumes the data should do the normalization and handle ambiguity appropriately.

Shashank-In commented 2 years ago

That makes no sense. Web address supports unicode characters. Most of the population's legal name cannot be represented with ASCII. The runtime shouldn't enforce anything. It is up to each individual applications that consumes the data should do the normalization and handle ambiguity appropriately.

Hi @xlc I partially agree with this. Yes, you are right. Going for only ASCII characters won't work for many people as their names have special characters. But I strongly recommend that all emails and domain names can be converted into punny code characters to make them visible to the users. This was fixed in the past when I reported, or at least a warning or some highlighting can be used wherever there is a Unicode character just like VS code does or Github does.

https://github.com/polkadot-js/apps/pull/3290

xlc commented 2 years ago

It is simply infeasible to perform complicated utf string validation and normalization in wasm runtime. We should just require clients does the work.

koute commented 2 years ago

While that's the easiest kind of a fix I don't think restricting fields to ASCII only (where the data they contain is not inherently ASCII-only) is a great idea. For people primarily using the latin alphabet this might not seem like a big deal, but we need to remember that the world is actually a lot more diverse, and there are billions of people who use non-latin scripts on a daily basis.

There are other ways to solve this without throwing the baby out with the bath water. For example, detect the confusable characters and put a big fat warning for the user. Unicode has also a whole technical report regarding security which is worth a read.

I agree with @xlc that the runtime is not the place to fix this.

lamafab commented 2 years ago

Alright, thanks for the feedback everyone.