Add a Integration Test to scan the top 1k domains and validate that ZDNS's result is correct

phillip-stephens commented 2 months ago

We have an extensive suite of integration tests that validate that ZDNS can correctly pull most DNS record types for zdns-testing.com. This validates ZDNS's ability to parse dns records but doesn't really validate ZDNS being able to scan many domains successfully. ZDNS's use of multiple worker threads each requesting a domain has logic that should be tested with a larger-scale scan.

The challenge with doing any sort of validation that a given IP A actually hosts a domain X is that many domains are hosted with hosting providers that provide anti-DDoS protection. This makes any sort of automated requesting difficult.

However, a solution is to validate ZDNS only against domains which do not have such anti-DDoS measures.

Therefore, this test has 2 phases:

get which of the top-1k domains can be successfully reached with a GET request. -> domains that are request-able in an automated fashion
From these request-able domains, we run ZDNS on it and then attempt to send a GET request to the returned IP. These should all be successful.

Changes

Changed Github CI to only run on pull requests and branches named main. This prevents the previous behavior where an action was started for every branch change and every PR change, effectively meaning every time a PR had a push 2 CI actions would be kicked off for each test, with 1 of each being redundant.
Added a integration test to scan the top domains.

zakird commented 2 months ago

I'm not opposed to this PR, but it does feel like a very roundabout way of testing this too by relying on similarity in HTTP(S) requests, which I suspect could be fragile. Is there any reason that this is necessary, versus, for example, looking up domains and seeing the success rate that we see from the scan?

phillip-stephens commented 2 months ago

In my mind, that rests on the assumption that the A record returned by ZDNS is accurate and ZDNS is able to correctly ascertain whether it was able to get the correct record for a given domain.

I could imagine all sorts of bugs that could mess with that and cause ZDNS to not error but also return inaccurate records for a given domain. (Ofc if the name server returns an invalid IP, that's not a ZDNS bug. I'm imagining a ZDNS bug where either through bugs in the cache or recursive logic it returns a record that is inaccurate with reference to the name server for said domain).

This test validates that a given A record for a domain has an IP that truly hosts said domain without relying on ZDNS for any proxy of success and IMO that's what we really want from an integration test. Presumably, the top 100 domains are going to have more strict quality control on their DNS infrastructure and so I'd imagine we'd rarely run into bugs where the issue is on the DNS name server itself.

As for the fragility of the test, yeah I agree it's a bit fragile since it doesn't differentiate between a bad result returned by a name server or a bug in ZDNS. Also, that requests library can simply error. I think there's immense value in having a sort of real scan sanity check for accuracy like this, but I'm very open to suggestions on how to improve its fragility.

zakird commented 2 months ago

Can we separate this test out in Github so that we can monitor success/failure separately?

phillip-stephens commented 2 months ago

@zakird Looking at the Checks on this PR, the test is run separately (called build-and-test-large-scale)

zmap / zdns

Add a Integration Test to scan the top 1k domains and validate that ZDNS's result is correct #370

Changes