yassineaboukir / sublert

Sublert is a security and reconnaissance tool that leverages certificate transparency to automatically monitor new subdomains that are deployed by specific organizations and issued a TLS/SSL certificate.
MIT License
981 stars 172 forks

Deduplicate, sort output from cert_database lookup #8

Closed simpsora closed 5 years ago

simpsora commented 5 years ago

cert_database.lookup() uses a list for storing subdomains, but the response from the service can contain (many) duplicate subdomains. In addition, the subdomain list is not in a deterministic order, so future diffs may be inaccurate.

This PR switches from a list to a set for storing the subdomains, which automatically deduplicates them. It also returns a sorted version, guaranteeing a consistent order for every call.
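For readers skimming the thread, here is a minimal sketch of the idea, not the actual diff from this PR. The function name `dedupe_and_sort` and the sample hostnames are hypothetical, and the input is assumed to be a plain iterable of subdomain strings rather than the real service response.

```python
def dedupe_and_sort(subdomains):
    """Return the unique subdomains in a stable, sorted order.

    Hypothetical helper for illustration; `subdomains` is assumed to be
    any iterable of strings.
    """
    unique = set()
    for name in subdomains:
        # Adding to a set silently ignores names that are already present.
        unique.add(name)
    # sorted() returns a list, so the order is the same on every call,
    # regardless of the order the names arrived in.
    return sorted(unique)


# Made-up sample data with duplicates, in arbitrary order.
raw = [
    "www.example.com",
    "mail.example.com",
    "www.example.com",
    "api.example.com",
    "mail.example.com",
]

result = dedupe_and_sort(raw)
print(result)
```

Because the returned list is sorted, two lookups that yield the same names in a different order produce identical output, which is what keeps later diffs meaningful.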

I tested this using the python.org domain, before and after applying the code in this PR:

$ wc -l python.org.txt.before python.org.txt.after
    1696 python.org.txt.before
      40 python.org.txt.after

You can see there are a lot of duplicates for this domain. paypal.com was worse, at 7550 lines before and 2031 after.

This issue is very evident when using Slack, as the code makes individual requests to the Slack API for each subdomain; with a lot of subdomains this can take hours.

yassineaboukir commented 5 years ago

Hi Ross,

I was going to work on the deduplication, since it improves diff efficiency and accuracy, and some people feed the output files into other tools. Thanks a lot for helping me cross this item off my list; very much appreciated.