snap-cloud / snapCloud

Official cloud backend and community site for the Snap! programming language
https://snap.berkeley.edu/
GNU General Public License v3.0
52 stars 28 forks source link

Email Address Validations #159

Open cycomachead opened 5 years ago

cycomachead commented 5 years ago

We have a quite a few users with emails that are misspelled or have typos. (I can guess there's about 1K misspellings of gmail, hotmail and yahoo. ~220 @gmai.com (no l), ~60 @*.con)

This might be handled fine on the front-end, which we could move this there, but just thinking we should somewhere help users input correct email addresses.

bromagosa commented 5 years ago

We should sanitize these cases, right?

s/con$/com is obvious. Misspellings are a bit trickier. Any ideas?

You're right that we should require double entering user emails in the frontend, not allowing pasting into the repeated email field. I'll look into it (filing an issue into the site repo right now).

cycomachead commented 5 years ago

I think we should sanitize what we can.

I don't think there's anything we can do about user name part - but things like 'gmai.com' and 'yaho.com' can be fixed relatively easily.

bromagosa commented 5 years ago

That'll leave out all the gmials and mgails out there, but at least it's something! :)

(We could query for .*@[gmail]\+....\.com and .*@[yaho]\+....\.com, at least to find all instances of emails with characters out of order)

bromagosa commented 5 years ago

Wow. select email from users where email similar to '_*@[gmail]{5}.com' and not email similar to '_*@gmail.com'; returns 331 rows. There are also 18 yahoo misspellings and 9 hotmail ones.

People misspell their email much more than I expected!

Should I correct these right away, or should we notify these users that their email has been found to contain misspellings, then correct it? In case they haven't received their verification email and are wondering why, for example.

cycomachead commented 5 years ago

I was trying to collect some misspelling in meta base / snap-analytics -- I think we can correct the obvious ones now. If you look at the created date, I think we'll find a lot of those are old users

bromagosa commented 5 years ago

I've corrected misspellings (same chars and char count, but not in the correct order) in gmail.com, yahoo.com, hotmail.com and also .con→.com.

Here's the queries I've used in case we need to do something similar in the future:

update users set email = regexp_replace(email, '(.*)@.*', '\1@gmail.com') where email similar to '_*@[gmail]{5}.com' and not email similar to '_*@gmail.com';
update users set email = regexp_replace(email, '(.*)@.*', '\1@yahoo.com') where email similar to '_*@[yaho]{5}.com' and not email similar to '_*@yahoo.com';
update users set email = regexp_replace(email, '(.*)@.*', '\1@hotmail.com') where email similar to '_*@[hotmail]{7}.com' and not email similar to '_*@hotmail.com';
update users set email = regexp_replace(email, '(.*)con$', '\1com') where email similar to '_*.con';
cycomachead commented 5 years ago

Awesome! 😀

That's probably good enough for now. What I think I will do, is when I notice issues I'll try to post the mistakes here, and see if we can later to another batch of corrections.

cycomachead commented 5 years ago

One recent case with @xyz-isdnet instead of @xyz-isd.net. Anything that's not a valid domain/IP is also probably worth flagging.

cycomachead commented 5 years ago

https://snap-analytics.cs10.org/question/156