Bulk Validation & IP blacklist

marcelinhov2 commented 4 years ago

Hello @amaurymartiny, we are running your solution on Kubernetes, but we're running into problems with blacklisted IPs.

I see that many solutions offer a bulk validation feature, and I was wondering how it can be done without this blacklist problem. Do you have any tips?

I thought proxies might be the answer, but I don't know whether that works or makes sense.

Thanks.

amaury1093 commented 4 years ago

I see that many solutions offer a bulk validation feature, and I was wondering how it can be done without this blacklist problem

Most of the solutions out there already have a database of emails and their deliverability, so their bulk validation feature just hits the database for most of the emails you want to check.

To use this tool without having the blacklist problem, the only solution I can think of is IP rotation. So yeah, proxies as you said, but a large number of them with IP rotation.

I thought I could get around it with AWS Lambda (each serverless function has a different IP address), but empirically it doesn't work so well: I tried it on https://reacherhq.github.io/, and the success rate is not very high.

I'll leave this issue open; if anyone else has ideas, I'd like to hear them.

marcelinhov2 commented 4 years ago

Thanks for your answer @amaurymartiny. Do you know how I can implement proxy rotation in front of your service? I'm already using IP rotation with my crawler solutions, but idk how to use it with your service. Is it ready for this?

Thanks.

marcelinhov2 commented 4 years ago

If not, do you know a safe range of requests per minute (maybe per hour, idk) that I can make without having my IP blocked?

Thanks again.

marcelinhov2 commented 4 years ago

About your Lambda solution: you only get a high volume of IPs if you make parallel requests. For example, if you hit your Lambda 50 times at the same time, it's going to spin up 50 threads for you. If you make 50 more requests after that, it's going to reuse the same 50 IPs as before.

To make what you want work, you need to send batches until the IPs end up on the blacklist. When that happens, you need to wait 15 minutes (for the Lambdas to die) and start again.

If you have a serverless version of your solution, I can run some tests and give you feedback on it.

Once again, thanks for your help.

amaury1093 commented 4 years ago

Thanks for researching into this!

Do you know how I can implement proxy rotation in front of your service?

I haven't looked into this myself. But I think it shouldn't be different from IP rotation in front of other services (e.g. your web crawler).

If not, do you know a safe range of requests per minute (maybe per hour, idk) that I can make without having my IP blocked?

This depends on the email provider you're testing the email against. MS Outlook blocked me after 3 email validations, Gmail seems a bit more permissive but I haven't tested deeply.
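To illustrate what per-provider pacing could look like, here is a minimal sketch that enforces a minimum delay between checks against the same recipient domain. The class name `DomainThrottle` and the interval values in the usage note are hypothetical; nothing in this thread establishes a safe rate for any provider.

```python
import time


class DomainThrottle:
    """Enforce a minimum delay between checks against the same domain."""

    def __init__(self, default_interval, per_domain=None, clock=time.monotonic):
        self.default_interval = default_interval  # seconds between checks
        self.per_domain = per_domain or {}        # domain -> custom interval
        self.clock = clock                        # injectable for testing
        self.last_checked = {}                    # domain -> last check time

    @staticmethod
    def _domain(email):
        # Everything after the last "@", lowercased.
        return email.rsplit("@", 1)[-1].lower()

    def wait_time(self, email):
        """Return how many seconds to wait before checking this email."""
        domain = self._domain(email)
        last = self.last_checked.get(domain)
        if last is None:
            return 0.0
        interval = self.per_domain.get(domain, self.default_interval)
        return max(0.0, last + interval - self.clock())

    def mark_checked(self, email):
        """Record that a check against this email's domain just happened."""
        self.last_checked[self._domain(email)] = self.clock()
```

A caller would `time.sleep(throttle.wait_time(email))`, run the check, then call `throttle.mark_checked(email)`. For example, `DomainThrottle(10, {"outlook.com": 60})` uses placeholder values, not measured limits.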

If you have a serverless version of your solution, I can run some tests and give you feedback on it.

Here's how I set up AWS Lambda: https://github.com/reacherhq/microservices.

I currently don't have much time myself to look into IP rotation, so if you find something, I would really appreciate it if you reported back here 🙏 !

marcelinhov2 commented 4 years ago

The serverless version is giving me a timeout. Do you know if I need to configure something?

Thanks.

marcelinhov2 commented 4 years ago

Even localhost :(

amaury1093 commented 4 years ago

It's often normal to get timeouts on localhost, because your ISP blocks requests on port 25.

When I said above "but empirically it doesn't work so well", that's what I meant on AWS Lambda. Some of the requests do pass. So I guess it's something related to their infrastructure: some serverless functions get port 25 blocked, others don't.
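A quick way to tell whether outbound port 25 is blocked from a given machine is to attempt a plain TCP connection to a mail server. The sketch below is only an illustration; the function name `can_connect` is hypothetical, and the host in the usage comment is a placeholder you would replace with an MX host of the domain you are testing.

```python
import socket


def can_connect(host, port=25, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the host and opens the socket;
        # the "with" block closes it again immediately.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS failures.
        return False


# Usage (placeholder host):
#   can_connect("mx.example.com")
```

A `False` result here does not say who is blocking the port (ISP, cloud provider, or the remote server), only that the connection could not be opened.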

marcelinhov2 commented 4 years ago

And what about building a layer for proxy rotation?

amaury1093 commented 4 years ago

That would be a sweet idea!

However, it's out of scope for this tool. I'm okay to add a --proxy flag to the binary, so that all the SMTP requests/responses go through that proxy first. But I personally will not build the IP rotation proxy itself.

There might already be some other tools available for this, if you find something I'd really like to know!

marcelinhov2 commented 4 years ago

Hey @amaurymartiny, how are you doing with this whole Covid-19 thing? I hope you are doing well...

We're continuing our tests here and found a new problem that maybe you can help us with:

We are getting a "Helo command rejected" error from some of the domains we are testing emails against. I found these 2 links; idk if they can help in any way:

https://unix.stackexchange.com/questions/91749/helo-command-rejected-need-fully-qualified-hostname-error

https://forums.zimbra.org/viewtopic.php?t=18646

Do you think that it is a problem that we can handle?

Thanks.

taewookim commented 4 years ago

Connecting to an SMTP server to validate an address requires that the IP you're connecting from (to the SMTP server):

1) has port 25 open for SMTP
2) is NOT blacklisted by Spamhaus

Is this correct?

If so, wouldn't it be easier to just set up VPSs (with port 25 open) running an SMTP server, and round-robin across those servers when you're trying to verify different variations of an email?
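The round-robin idea can be sketched in a few lines. This only shows the assignment step, under the assumption that each server is identified by some label or address; the function name and the server labels in the usage note are hypothetical.

```python
from itertools import cycle


def assign_round_robin(emails, servers):
    """Pair each email with the next server in a repeating rotation."""
    if not servers:
        raise ValueError("at least one server is required")
    rotation = cycle(servers)
    # zip-like pairing: the rotation restarts once every server was used.
    return [(email, next(rotation)) for email in emails]
```

With three emails and the servers `["vps-1", "vps-2"]`, the third email goes back to `vps-1`. Rotation alone spreads the load; it does not stop an individual IP from being listed.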

marcelinhov2 commented 4 years ago

Hey @taewookim,

We already have port 25 open on our side, but it's nearly impossible not to get blacklisted when trying to validate a batch of emails. I'm still trying to understand how I can do this the way zerobounce and thechecker.co do.

To be honest I didn't try the VPS approach yet, but for sure I will.

Thanks again.

amaury1093 commented 4 years ago

@marcelinhov2 Thanks, I'm all good. I hope you are safe & healthy too.

We are getting a "Helo command rejected" error from some of the domains we are testing emails against. I found these 2 links; idk if they can help in any way:

I just published 0.7.0 on Docker and on the Releases page. The binary takes a --hello-name, and the HTTP server takes a hello_name field in the JSON input. This field is used in the EHLO smtp command. Put something that is a FQDN, and your error should go away.
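As a rough sketch of the JSON input with this field, the snippet below builds a request body and rejects a `hello_name` that does not look like a FQDN. Only `hello_name` is taken from the comment above; the `to_email` field name and both function names are assumptions made for illustration, so check the HTTP server's documentation for the exact input shape.

```python
import json
import re

# One DNS label: letters, digits, hyphens; no leading or trailing hyphen.
_LABEL = re.compile(r"(?!-)[A-Za-z0-9-]{1,63}(?<!-)")


def looks_like_fqdn(name):
    """Loose syntactic check: at least two valid labels, 253 chars max."""
    name = name.rstrip(".")
    labels = name.split(".")
    return (
        len(name) <= 253
        and len(labels) >= 2
        and all(_LABEL.fullmatch(label) is not None for label in labels)
    )


def build_request_body(to_email, hello_name):
    """Serialize the JSON input, refusing a hello_name that is not FQDN-like."""
    if not looks_like_fqdn(hello_name):
        raise ValueError(f"hello_name is not a FQDN: {hello_name!r}")
    # "to_email" is an assumed field name; "hello_name" is from the thread.
    return json.dumps({"to_email": to_email, "hello_name": hello_name})
```

The check is purely syntactic. Some servers also verify that the name resolves, which this sketch does not attempt.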

Note: I just did some quick testing, and published this 10min ago, so there might be bugs (hope not though). I'll do some more thorough testing on my side.

marcelinhov2 commented 4 years ago

Great @amaurymartiny.

We are going to test this today and I'll give you feedback.

Thank you so much

marcelinhov2 commented 4 years ago

Worked perfectly

nikos90 commented 4 years ago

Maybe a combination with this one https://github.com/mattes/rotating-proxy will do the trick for IP blacklist bypass

amaury1093 commented 3 years ago

@nikos90 The problem with Tor is that a lot of SMTP servers block Tor exit nodes. Even if you rotate IPs within Tor, they will still get blacklisted at the exit.

taewookim commented 3 years ago

That's right. Most proxies / Tor exit nodes are already blacklisted. Don't bother.

@amaurymartiny

I've been running distributed servers on low-end VPSs for this kinda thing. It's a pain in the arse to maintain, but it might be possible to create a service that takes care of this type of stuff for anyone interested in distributed IPs for checking tons of emails. Let me know if you wanna collab.

zoid007 commented 3 years ago

I have a subscription with a hosting provider that allows me to send unlimited email, and they have multiple SMTP servers with different IPs and good reputation. Can I use this script with that external SMTP? I guess that would fix the issue for me.

amaury1093 commented 3 years ago

You would need to proxy the requests through the external smtp, see the --proxy-* flags on the binary.

BTW, would you mind sharing which hosting provider you use that has good-reputation SMTP servers?

Lusitaniae commented 3 years ago

Lambda doesn't solve the issue

Lambda doesn't have a public IP address (it's using NAT)

Each AWS account deploys lambda containers in a group of dedicated EC2 instances.

So basically all your lambdas are running in a few EC2 instances, which all connect to the same NAT instance

See more at https://stackoverflow.com/a/37793338/634577

arimgibson commented 2 years ago

Sorry to revive this dead issue... just want to make sure I'm understanding something right.

To be able to proxy, every single proxy server would have to have port 25 open, correct? Does anyone have any clue how one would go about this? I've had difficulty finding any proxies that have port 25 open, let alone that aren't already blocked.

I had bought a VPS but I'm guessing the IP has been used by someone else before. Got just a couple verifications in before Spamhaus blacklisted it.
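Before putting load on a freshly bought IP, it can be checked against a DNS blocklist. DNSBL lookups work by reversing the IPv4 octets and appending the list's zone, then resolving that name. The sketch below uses `zen.spamhaus.org` as the default zone; the function names are hypothetical, and results depend on the resolver in use.

```python
import ipaddress
import socket


def dnsbl_query_name(ip, zone="zen.spamhaus.org"):
    """Build the DNSBL lookup name: reversed IPv4 octets followed by the zone."""
    addr = ipaddress.IPv4Address(ip)  # raises ValueError on invalid input
    reversed_octets = ".".join(reversed(str(addr).split(".")))
    return f"{reversed_octets}.{zone}"


def dnsbl_answer(ip, zone="zen.spamhaus.org"):
    """Return the DNSBL's answer (e.g. '127.0.0.2'), or None if there is none.

    Caveats: a failed lookup can also mean a DNS problem rather than
    "not listed", and Spamhaus answers in 127.255.255.0/24 signal a
    query error (for example, querying through a public resolver),
    not a listing.
    """
    try:
        return socket.gethostbyname(dnsbl_query_name(ip, zone))
    except socket.gaierror:
        return None
```

For example, `dnsbl_query_name("192.0.2.10")` yields `10.2.0.192.zen.spamhaus.org`. A clean result at purchase time says nothing about whether the IP stays clean once checks start.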

bahout commented 2 years ago

I have a subscription with a hosting provider that allows me to send unlimited email, and they have multiple SMTP servers with different IPs and good reputation. Can I use this script with that external SMTP? I guess that would fix the issue for me.

Hi @zoid007 could you share the hosting provider you used?

marcelinhov2 commented 2 years ago

In my case, I had to ask the AWS support team.

JoshuaAGE commented 1 year ago

@arimgibson

I had bought a VPS but I'm guessing the IP has been used by someone else before. Got just a couple verifications in before Spamhaus blacklisted it.

This is because the data you are checking contains spam traps.

arimgibson commented 1 year ago

@arimgibson

I had bought a VPS but I'm guessing the IP has been used by someone else before. Got just a couple verifications in before Spamhaus blacklisted it.

This is because the data you are checking contains spam traps.

@JoshuaAGE I'd be surprised; it's from an email list collected through website sign ups from a forum-type site. I have user first and last names as well. Not just a list I downloaded/bought

JoshuaAGE commented 1 year ago

@arimgibson The reason why you get listed at Spamhaus (probably CSS and not SBL) is 100% based on your data. Forum signups in particular attract spam trap signups, and Spamhaus spam trap feed providers buy a lot of old domains and convert them into spam traps.

arimgibson commented 1 year ago

That makes sense and sounds right @JoshuaAGE; appreciate the input! Unfortunate because that makes my life a lot harder haha. A good number of the emails are from smaller email providers or individual/small company's domains. I suppose that's what I get for using data from literal decades ago :stuck_out_tongue_winking_eye:

JoshuaAGE commented 1 year ago

@arimgibson Just filter them out... not easy, I know.

arimgibson commented 1 year ago

I ended up going with a paid third-party verifier; while not ideal cost-wise, it just ended up being easier to not have to worry about the effects of being caught by spam filters :sweat_smile:

0xlinus commented 11 months ago

@amaury1729 Hey! I wonder how the bulk validation process works. Let's say that you have:

"input": [
     "email1@domain1.com",
     "email2@domain1.com",
     "email3@domain1.com",
     "another_email1@domain2.com",
     "another_email2@domain2.com"
]

How many SMTP connections are created? One per domain? Following the example, this validation would create only 2 connections, right? One to domain1's SMTP server and one to domain2's?

I've tried to run Bulk validation myself and listen to outgoing SMTP connections, but I can't establish a connection with my local rabbitmq server for some reason :(

I'm asking this bc if ONE connection is created per email, then it's pretty much guaranteed a blacklist/ban.
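To make the two counts concrete, here is a small sketch that groups the example input by domain. It only illustrates the arithmetic of the question (5 connections if one per email, 2 if one per domain); it says nothing about how the tool itself batches, and the function name is hypothetical.

```python
from collections import defaultdict


def group_by_domain(emails):
    """Map each lowercased domain to the list of emails that use it."""
    groups = defaultdict(list)
    for email in emails:
        # rpartition splits on the last "@", so the domain is the final part.
        _local, _at, domain = email.rpartition("@")
        groups[domain.lower()].append(email)
    return dict(groups)


emails = [
    "email1@domain1.com",
    "email2@domain1.com",
    "email3@domain1.com",
    "another_email1@domain2.com",
    "another_email2@domain2.com",
]
groups = group_by_domain(emails)
connections_per_email = len(emails)   # one connection per address
connections_per_domain = len(groups)  # one connection per distinct domain
```

Grouping by domain is a simplification: different domains can share the same MX hosts, so the number of distinct servers contacted may be lower still.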

Thanks! Awesome project!

amaury1093 commented 11 months ago

I'm asking this bc if ONE connection is created per email, then it's pretty much guaranteed a blacklist/ban.

It's currently like this.

I explored using one connection per domain in https://github.com/reacherhq/check-if-email-exists/issues/65, but got some headaches. I think I got errors like "Cannot have multiple commands per connection", so I gave up. If you're willing to explore a bit more, I'll be happy to assist (e.g. with servers with port 25 open).
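For anyone who wants to explore this, the general shape of connection reuse can be sketched with Python's standard `smtplib`: say EHLO once, then issue a fresh `MAIL FROM` / `RCPT TO` pair per address, with `RSET` in between. This is not how check-if-email-exists is implemented (the tool is written in Rust), the function names are hypothetical, and servers may still cap or reject repeated commands on one connection, which could be the kind of error described above.

```python
import smtplib


def probe_recipients(conn, hello_name, sender, recipients):
    """Issue one RCPT TO per recipient over a single, already-open connection.

    `conn` is expected to behave like smtplib.SMTP.
    Returns a dict mapping each email to the server's RCPT reply code.
    """
    conn.ehlo(hello_name)
    results = {}
    for email in recipients:
        conn.rset()        # reset any previous mail transaction
        conn.mail(sender)  # MAIL FROM
        code, _message = conn.rcpt(email)  # RCPT TO; no DATA is ever sent
        results[email] = code
    return results


def probe_domain(mx_host, hello_name, sender, recipients, timeout=10):
    """Open one connection to mx_host on port 25 and probe every recipient."""
    # The context manager sends QUIT and closes the socket on exit.
    with smtplib.SMTP(mx_host, 25, timeout=timeout) as conn:
        return probe_recipients(conn, hello_name, sender, recipients)
```

Splitting the logic this way keeps the per-connection loop testable without a network: `probe_recipients` can be exercised against a stub object, while `probe_domain` is the part that needs a host with port 25 open.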

0xlinus commented 11 months ago

I'm asking this bc if ONE connection is created per email, then it's pretty much guaranteed a blacklist/ban.

It's currently like this.

I explored using one connection per domain in #65, but got some headaches. I think I got errors like "Cannot have multiple commands per connection", so I gave up. If you're willing to explore a bit more, I'll be happy to assist (e.g. with servers with port 25 open).

More than happy to do that, but first I need to learn Rust lol

reacherhq / check-if-email-exists

Bulk Validation & IP blacklist #234