Open cycomachead opened 5 years ago
We should sanitize these cases, right?
s/con$/com
is obvious. Misspellings are a bit trickier. Any ideas?
You're right that we should require double entering user emails in the frontend, not allowing pasting into the repeated email field. I'll look into it (filing an issue into the site repo right now).
I think we should sanitize what we can.
I don't think there's anything we can do about user name part - but things like 'gmai.com' and 'yaho.com' can be fixed relatively easily.
That'll leave out all the gmial
s and mgail
s out there, but at least it's something! :)
(We could query for .*@[gmail]\+....\.com
and .*@[yaho]\+....\.com
, at least to find all instances of emails with characters out of order)
Wow. select email from users where email similar to '_*@[gmail]{5}.com' and not email similar to '_*@gmail.com';
returns 331 rows. There are also 18 yahoo misspellings and 9 hotmail ones.
People misspell their email much more than I expected!
Should I correct these right away, or should we notify these users that their email has been found to contain misspellings, then correct it? In case they haven't received their verification email and are wondering why, for example.
I was trying to collect some misspelling in meta base / snap-analytics -- I think we can correct the obvious ones now. If you look at the created date, I think we'll find a lot of those are old users
I've corrected misspellings (same chars and char count, but not in the correct order) in gmail.com, yahoo.com, hotmail.com and also .con→.com.
Here's the queries I've used in case we need to do something similar in the future:
update users set email = regexp_replace(email, '(.*)@.*', '\1@gmail.com') where email similar to '_*@[gmail]{5}.com' and not email similar to '_*@gmail.com';
update users set email = regexp_replace(email, '(.*)@.*', '\1@yahoo.com') where email similar to '_*@[yaho]{5}.com' and not email similar to '_*@yahoo.com';
update users set email = regexp_replace(email, '(.*)@.*', '\1@hotmail.com') where email similar to '_*@[hotmail]{7}.com' and not email similar to '_*@hotmail.com';
update users set email = regexp_replace(email, '(.*)con$', '\1com') where email similar to '_*.con';
Awesome! 😀
That's probably good enough for now. What I think I will do, is when I notice issues I'll try to post the mistakes here, and see if we can later to another batch of corrections.
One recent case with @xyz-isdnet
instead of @xyz-isd.net
. Anything that's not a valid domain/IP is also probably worth flagging.
We have a quite a few users with emails that are misspelled or have typos. (I can guess there's about 1K misspellings of gmail, hotmail and yahoo. ~220 @gmai.com (no l), ~60 @*.con)
This might be handled fine on the front-end, which we could move this there, but just thinking we should somewhere help users input correct email addresses.