kichik / email-scraper

Simple Python library to scrape email addresses from HTML
MIT License
21 stars 10 forks

Weird/broken emails #8

Open kajto3 opened 2 years ago

kajto3 commented 2 years ago

Sometimes the scraper returns weird-looking, broken emails that look like this:

Unfortunately, I can't provide the exact links the emails were scraped from, because I'm using tons of links (scraping from Google), but I think they mostly come from Wikipedia.

kichik commented 2 years ago

As far as I can tell, those are technically valid email addresses, though it does seem like we need to limit the local part to 64 bytes. You may want to filter the emails after you get them from the library, or maybe open a PR for "common email addresses" that adds a flag to disallow slashes and other uncommon symbols.
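
For anyone who wants to filter on their side in the meantime, here is a rough sketch of what such a post-filter could look like. It is not part of the library: the names is_common_email and filter_common_emails, and the set of characters treated as "common", are assumptions made up for this example. It takes any iterable of address strings, such as whatever the scraper returned, and keeps only addresses whose local part is at most 64 bytes and contains no slashes or other unusual symbols.

```python
import re

# Assumed "common address" policy (not the library's): the local part may only
# contain ASCII letters, digits, and the characters . _ % + -
_COMMON_LOCAL = re.compile(r"[A-Za-z0-9._%+-]+")

# Maximum local-part length in bytes, as mentioned in the comment above.
MAX_LOCAL_BYTES = 64


def is_common_email(address):
    """Return True if the address passes the assumed 'common' policy."""
    # Split on the last "@" so the domain is whatever follows it.
    local, sep, domain = address.rpartition("@")
    if not sep or not local or not domain:
        return False
    # Measure the limit in bytes rather than characters.
    if len(local.encode("utf-8")) > MAX_LOCAL_BYTES:
        return False
    # Reject slashes and any other symbol outside the allowed set.
    return _COMMON_LOCAL.fullmatch(local) is not None


def filter_common_emails(addresses):
    """Keep only the addresses that pass is_common_email."""
    return {a for a in addresses if is_common_email(a)}
```

Calling filter_common_emails on the scraped results drops the technically valid but odd-looking addresses. The allowed character set is deliberately strict, so loosen the pattern if it rejects addresses you care about.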