michaelherold / pyIsEmail

Simple, robust email validation
http://michaelherold.github.io/pyIsEmail
MIT License

Add SMTP validator #7

Open jayvdb opened 7 years ago

jayvdb commented 7 years ago

It would be nice if there were a validator that asks the SMTP server to confirm that the email address is valid. See https://github.com/syrusakbary/validate_email/issues/61

michaelherold commented 7 years ago

To the best of my knowledge, there isn't a way to do this. SMTP has the VRFY verb, but that is usually disabled because it leaks information to spammers. A public SMTP server that authoritatively says "yes, this address exists" only opens itself up to receiving more spam.

I played around with talking to an SMTP server that I know information about. Here's the result:

>>> import smtplib
>>> server = smtplib.SMTP('aspmx.l.google.com')
>>> server.set_debuglevel(1)
>>> server.ehlo_or_helo_if_needed()
send: 'ehlo redacted-server-name.com\r\n'
reply: '250-mx.google.com at your service, [2400:8901::f03c:91ff:fe60:16de]\r\n'
reply: '250-SIZE 157286400\r\n'
reply: '250-8BITMIME\r\n'
reply: '250-STARTTLS\r\n'
reply: '250-ENHANCEDSTATUSCODES\r\n'
reply: '250-PIPELINING\r\n'
reply: '250-CHUNKING\r\n'
reply: '250 SMTPUTF8\r\n'
reply: retcode (250); Msg: mx.google.com at your service, [2400:8901::f03c:91ff:fe60:16de]
SIZE 157286400
8BITMIME
STARTTLS
ENHANCEDSTATUSCODES
PIPELINING
CHUNKING
SMTPUTF8

# Verifying good email with VRFY
>>> server.verify('email.that.i.know.exists@redacted.com')
send: 'vrfy email.that.i.know.exists@redacted.com\r\n'
reply: "252 2.1.5 Send some mail, I'll try my best e187si2388319pfa.41 - gsmtp\r\n"
reply: retcode (252); Msg: 2.1.5 Send some mail, I'll try my best e187si2388319pfa.41 - gsmtp
(252, "2.1.5 Send some mail, I'll try my best e187si2388319pfa.41 - gsmtp")

# Verifying bad email with VRFY
>>> server.verify('email.that.i.know.does.not.exists@redacted.com')
send: 'vrfy email.that.i.know.does.not.exists@redacted.com\r\n'
reply: "252 2.1.5 Send some mail, I'll try my best e187si2388319pfa.41 - gsmtp\r\n"
reply: retcode (252); Msg: 2.1.5 Send some mail, I'll try my best e187si2388319pfa.41 - gsmtp
(252, "2.1.5 Send some mail, I'll try my best e187si2388319pfa.41 - gsmtp")

# Verifying good email with RCPT
>>> server.docmd('mail from: <email.that.i.know.exists@redacted.com>')
send: 'mail from: <email.that.i.know.exists@redacted.com>\r\n'
reply: '250 2.1.0 OK e187si2388319pfa.41 - gsmtp\r\n'
reply: retcode (250); Msg: 2.1.0 OK e187si2388319pfa.41 - gsmtp
(250, '2.1.0 OK e187si2388319pfa.41 - gsmtp')
>>> server.docmd('rcpt to: <email.that.i.know.exists@redacted.com>')
send: 'rcpt to: <email.that.i.know.exists@redacted.com>\r\n'
reply: '250 2.1.5 OK e187si2388319pfa.41 - gsmtp\r\n'
reply: retcode (250); Msg: 2.1.5 OK e187si2388319pfa.41 - gsmtp
(250, '2.1.5 OK e187si2388319pfa.41 - gsmtp')

# Verifying bad email with RCPT
>>> server = smtplib.SMTP('aspmx.l.google.com')
>>> server.set_debuglevel(1)
>>> server.ehlo_or_helo_if_needed()
send: 'ehlo fw072899.members.linode.com\r\n'
reply: '250-mx.google.com at your service, [2400:8901::f03c:91ff:fe60:16de]\r\n'
reply: '250-SIZE 157286400\r\n'
reply: '250-8BITMIME\r\n'
reply: '250-STARTTLS\r\n'
reply: '250-ENHANCEDSTATUSCODES\r\n'
reply: '250-PIPELINING\r\n'
reply: '250-CHUNKING\r\n'
reply: '250 SMTPUTF8\r\n'
reply: retcode (250); Msg: mx.google.com at your service, [2400:8901::f03c:91ff:fe60:16de]
SIZE 157286400
8BITMIME
STARTTLS
ENHANCEDSTATUSCODES
PIPELINING
CHUNKING
SMTPUTF8
>>> server.docmd('mail from: <email.that.i.know.exists@redacted.com>')
send: 'mail from: <email.that.i.know.exists@redacted.com>\r\n'
reply: '250 2.1.0 OK g13si415094plj.507 - gsmtp\r\n'
reply: retcode (250); Msg: 2.1.0 OK g13si415094plj.507 - gsmtp
(250, '2.1.0 OK g13si415094plj.507 - gsmtp')
>>> server.docmd('rcpt to: <email.that.i.know.does.not.exists@redacted.com>')
send: 'rcpt to: <email.that.i.know.does.not.exists@redacted.com>\r\n'
reply: '250 2.1.5 OK g13si415094plj.507 - gsmtp\r\n'
reply: retcode (250); Msg: 2.1.5 OK g13si415094plj.507 - gsmtp
(250, '2.1.5 OK g13si415094plj.507 - gsmtp')
>>>

You can see that both methods (VRFY and the RCPT check) respond identically for the good and the bad email addresses: VRFY returns a noncommittal 252 and RCPT returns 250 in both cases. That leads me to believe that there isn't a way to know whether an email address exists without sending an email and receiving a bounce.

I'm more than happy to add this capability if someone knows a way to do this. I currently don't think it's possible, though.

jace commented 4 years ago

I use pyIsEmail in mxsniff. mxsniff implements email address verification by attempting to send an email and aborting midway.

In my experience, this only works in a supervised environment. Many SMTP servers will do a reverse lookup to confirm you're accessible to them, and (I suspect) some start aborting if they see a pattern in the behaviour. I wrote about it here.
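For readers following along, here is a minimal sketch of what "attempting to send an email and aborting midway" could look like with the standard library's smtplib. It is not mxsniff's actual implementation; the names `probe_recipient` and `classify_rcpt` are made up for illustration. As the Google session earlier in the thread shows, an "accepted" result does not prove the mailbox exists, so treat the outcome as a hint at best.

```python
import smtplib


def classify_rcpt(code: int) -> str:
    """Map an SMTP reply code for RCPT TO onto a coarse, hypothetical label."""
    if 200 <= code < 300:
        return "accepted"   # server will take the mail; mailbox may still not exist
    if 400 <= code < 500:
        return "deferred"   # temporary failure (greylisting, rate limiting, ...)
    return "rejected"       # permanent failure, e.g. 550 unknown user


def probe_recipient(server: smtplib.SMTP, sender: str, recipient: str) -> str:
    """Start a mail transaction, check the RCPT reply, then abort before DATA."""
    server.ehlo_or_helo_if_needed()
    code, _ = server.mail(sender)
    if code != 250:
        server.rset()
        return "unknown"    # the server refused the sender, so RCPT tells us nothing
    code, _ = server.rcpt(recipient)
    server.rset()           # abort the transaction; no message is ever sent
    return classify_rcpt(code)


# Example use against a real server (host and addresses are placeholders):
#
#     with smtplib.SMTP("mx.example.org", timeout=10) as server:
#         print(probe_recipient(server, "me@example.org", "someone@example.org"))
```

The function takes an already connected server object so the caller stays in control of MX lookup, timeouts, and how often a given host is probed.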

michaelherold commented 4 years ago

That's interesting! I am open to accepting a PR if you are interested in upstreaming that check into pyIsEmail.

jace commented 4 years ago

I noticed IDNA 2008 support is missing. In addition, RFC 6530 adds non-ASCII characters to the mailbox. I'll look into adding both.

RoyTrudell commented 4 years ago

> I noticed IDNA 2008 support is missing. In addition, RFC 6530 adds non-ASCII characters to the mailbox. I'll look into adding both.

@jace any update on this? I can create a fork and help out if needed. To allow non-ASCII characters, we simply need to relax the ord(token) check that currently rejects values < 33 or > 126.
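To make the suggestion concrete, here is a hypothetical sketch of such a relaxed character check. It is not copied from pyIsEmail's parser; the function name `is_allowed_char` and the `allow_non_ascii` flag are assumptions made for illustration only.

```python
def is_allowed_char(token: str, allow_non_ascii: bool = False) -> bool:
    """Return True if a single character passes the range check.

    Hypothetical helper: control characters, space, and DEL are always
    rejected; code points above 126 are allowed only when the caller
    opts in to internationalized addresses.
    """
    code = ord(token)
    if code < 33 or code == 127:
        return False            # control characters, space, DEL
    if code > 126:
        return allow_non_ascii  # non-ASCII only when explicitly enabled
    return True                 # printable ASCII


print(is_allowed_char("a"))                        # printable ASCII
print(is_allowed_char("é"))                        # non-ASCII, strict mode
print(is_allowed_char("é", allow_non_ascii=True))  # non-ASCII, relaxed mode
```

Keeping the relaxed behaviour behind a flag would let existing callers keep the strict ASCII-only result by default.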

jace commented 4 years ago

@RoyTrudell Sorry, no update. Been bogged down with other work.

IDNA has limitations on allowed characters. There's the idna library to help with this. One cheap hack for now is to extract and encode the domain into punycode before passing the email address to pyIsEmail.
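A rough sketch of that hack, for anyone who wants to try it. The helper name `punycode_domain` is made up here. It uses the third-party idna package (IDNA 2008) when it is installed and otherwise falls back to Python's built-in "idna" codec, which implements the older IDNA 2003 rules, so results can differ for some domains.

```python
def punycode_domain(address: str) -> str:
    """Return the address with its domain part converted to ASCII (punycode)."""
    local, sep, domain = address.rpartition("@")
    if not sep:
        return address  # no "@" present; leave it for the validator to reject
    try:
        import idna  # third-party package, IDNA 2008
        ascii_domain = idna.encode(domain).decode("ascii")
    except ImportError:
        # Standard-library codec, IDNA 2003 only
        ascii_domain = domain.encode("idna").decode("ascii")
    return f"{local}@{ascii_domain}"


print(punycode_domain("user@bücher.example"))
```

The converted address can then be passed to pyIsEmail as usual, for example `is_email(punycode_domain(address))`. Note that this only covers the domain; a non-ASCII local part is left untouched.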

jace commented 4 years ago

On the response to RCPT @michaelherold noted earlier, what I've learnt is this: accepting an email to a known bad recipient and then bouncing it tends to work badly for the mail server because the bounce could also bounce ("backscatter"). Qmail is the only popular mail server that does this by design, but is typically deployed with patches.

I guess the variance then is from whether the mail server (or its firewall) guesses this to be a genuine delivery or a probe.

michaelherold commented 2 years ago

I learned about https://github.com/truemail-rb/truemail, which is a Ruby library that has an SMTP validator. We could learn something from it.