Open laduke opened 1 year ago
So this does happen. The problem is that DNS is being flooded in the tests, and sometimes the machine cannot keep up, and UDP kind of does the rest of the damage. I struggled with this, but I'm also on a 24 thread machine, and I doubt your mac is, so it makes it hard to gauge a threshold that will work for everyone.
I don't really have a good answer for you other than "try again", they usually pass after a couple of tries, but again, on smaller core counts it might be more punishing.
Also: you can make test-integration
to run the tests as they are intended.
hey thanks for replying. I don't think I've ever had a full run of make test-integration
without at least one test failing from this machine.
I was thinking what you were saying re perf/load. But now I'm not 100% sure. I've edited the loops from 0..1000 to 0..1, for example, and they still fail (sometimes)
I'm logging the whole reverse_authority_map now at the end of configure_members
and sometimes the ptrs are there and sometimes they aren't. Sometimes they are there at first, but before the assertion runs, they aren't there.
for (_, authority) in self.reverse_authority_map.clone() {
warn!("REVERSE {:#?}", authority.authority.records().await);
}
and... I'm running make test-integration
again to get good example output and it keeps passing 😑
OK. Got one. Too much output:
Mar 06 15:07:44.353 ERROR zeronsd::authority: tick
Mar 06 15:07:44.611 ERROR zeronsd::authority: REVERSE {
RrKey {
name: LowerName(
Name("3.9.9.9.1.0.7.a.7.d.d.e.f.8.5.8.4.6.b.d.d.f.ip6.arpa."),
),
record_type: NS,
}: RecordSet {
name: Name("3.9.9.9.1.0.7.a.7.d.d.e.f.8.5.8.4.6.b.d.d.f.ip6.arpa."),
record_type: NS,
dns_class: IN,
ttl: 30,
records: [
Record {
name_labels: Name("3.9.9.9.1.0.7.a.7.d.d.e.f.8.5.8.4.6.b.d.d.f.ip6.arpa."),
rr_type: NS,
dns_class: IN,
ttl: 30,
rdata: Some(
NS(
Name("3.9.9.9.1.0.7.a.7.d.d.e.f.8.5.8.4.6.b.d.d.f.ip6.arpa."),
),
),
},
],
rrsigs: [],
serial: 1,
},
RrKey {
name: LowerName(
Name("3.9.9.9.1.0.7.a.7.d.d.e.f.8.5.8.4.6.b.d.d.f.ip6.arpa."),
),
record_type: SOA,
}: RecordSet {
name: Name("3.9.9.9.1.0.7.a.7.d.d.e.f.8.5.8.4.6.b.d.d.f.ip6.arpa."),
record_type: SOA,
dns_class: IN,
ttl: 30,
records: [
Record {
name_labels: Name("3.9.9.9.1.0.7.a.7.d.d.e.f.8.5.8.4.6.b.d.d.f.ip6.arpa."),
rr_type: SOA,
dns_class: IN,
ttl: 30,
rdata: Some(
SOA(
SOA {
mname: Name("3.9.9.9.1.0.7.a.7.d.d.e.f.8.5.8.4.6.b.d.d.f.ip6.arpa."),
rname: Name("administrator.3.9.9.9.1.0.7.a.7.d.d.e.f.8.5.8.4.6.b.d.d.f.ip6.arpa."),
serial: 1,
refresh: 30,
retry: 0,
expire: -1,
minimum: 0,
},
),
),
},
],
rrsigs: [],
serial: 1,
},
RrKey {
name: LowerName(
Name("6.4.6.1.5.c.0.5.7.e.3.9.9.9.1.0.7.a.7.d.d.e.f.8.5.8.4.6.b.d.d.f.ip6.arpa."),
),
record_type: PTR,
}: RecordSet {
name: Name("6.4.6.1.5.c.0.5.7.e.3.9.9.9.1.0.7.a.7.d.d.e.f.8.5.8.4.6.b.d.d.f.ip6.arpa."),
record_type: PTR,
dns_class: IN,
ttl: 60,
records: [
Record {
name_labels: Name("6.4.6.1.5.c.0.5.7.e.3.9.9.9.1.0.7.a.7.d.d.e.f.8.5.8.4.6.b.d.d.f.ip6.arpa."),
rr_type: PTR,
dns_class: IN,
ttl: 60,
rdata: Some(
PTR(
Name("zt-e750c51646.home.arpa."),
),
),
},
],
rrsigs: [],
serial: 1,
},
}
Mar 06 15:07:44.611 ERROR zeronsd::authority: REVERSE {
RrKey {
name: LowerName(
Name("3.9.9.9.1.0.7.a.7.d.d.e.f.8.5.8.4.6.b.d.d.f.ip6.arpa."),
),
record_type: NS,
}: RecordSet {
name: Name("3.9.9.9.1.0.7.a.7.d.d.e.f.8.5.8.4.6.b.d.d.f.ip6.arpa."),
record_type: NS,
dns_class: IN,
ttl: 30,
records: [
Record {
name_labels: Name("3.9.9.9.1.0.7.a.7.d.d.e.f.8.5.8.4.6.b.d.d.f.ip6.arpa."),
rr_type: NS,
dns_class: IN,
ttl: 30,
rdata: Some(
NS(
Name("3.9.9.9.1.0.7.a.7.d.d.e.f.8.5.8.4.6.b.d.d.f.ip6.arpa."),
),
),
},
],
rrsigs: [],
serial: 1,
},
RrKey {
name: LowerName(
Name("3.9.9.9.1.0.7.a.7.d.d.e.f.8.5.8.4.6.b.d.d.f.ip6.arpa."),
),
record_type: SOA,
}: RecordSet {
name: Name("3.9.9.9.1.0.7.a.7.d.d.e.f.8.5.8.4.6.b.d.d.f.ip6.arpa."),
record_type: SOA,
dns_class: IN,
ttl: 30,
records: [
Record {
name_labels: Name("3.9.9.9.1.0.7.a.7.d.d.e.f.8.5.8.4.6.b.d.d.f.ip6.arpa."),
rr_type: SOA,
dns_class: IN,
ttl: 30,
rdata: Some(
SOA(
SOA {
mname: Name("3.9.9.9.1.0.7.a.7.d.d.e.f.8.5.8.4.6.b.d.d.f.ip6.arpa."),
rname: Name("administrator.3.9.9.9.1.0.7.a.7.d.d.e.f.8.5.8.4.6.b.d.d.f.ip6.arpa."),
serial: 1,
refresh: 30,
retry: 0,
expire: -1,
minimum: 0,
},
),
),
},
],
rrsigs: [],
serial: 1,
},
}
Might be getting closer: it seems like there are 2 items in reverse_authority_map, one has the ptrs, one doesn't. If the no ptr item is second in the list, the lookup fails. If they are in the other order, it works.
I haven't looked that deep in a very long time, and I don't really have the time to right now. Maybe in a week or two.
Would love to hear what you come up with though, and I'm happy to review any patches you file.
Also I wanted to point out that the battery counts (the 0..1000) are there for a reason. UDP being a lossy protocol should fail naturally, but the service (and trust-dns's resolver) should retry failing connections. In practice (e.g., in the server you run yourself) this works fine, but in tests it gets overwhelmed, was my best theory.
That said, there could always be a real bug in there. Some of it was slapped together very quickly.
Looking forward to figuring it out. It's motivating me to explore the app.
Thanks for the note about the battery tests.
Cool. I'd love to hear about the solution when you find it, I'm curious now. :)
Haven't been able to look at it much for a few days. But here's what i know
In configure_members we iterate over the passed in reverse_authority_map
self.reverse_authority_map
.iter()
.for_each(|(network, authority)| {
error!("reverse_records.insert {} - {:?}", network, vec![authority.domain_name.clone()]);
reverse_records.insert(network, vec![authority.domain_name.clone()]);
});
which is a hashmap and gives you items in non deterministic order.
The order they pop out makes a difference:
doesn't pass
Mar 10 13:42:02.553 ERROR zeronsd::authority: reverse_records.insert fd23:3cca:ac27:ef83:2499:9300::/88 - [LowerName(Name("3.9.9.9.4.2.3.8.f.e.7.2.c.a.a.c.c.3.3.2.d.f.ip6.arpa."))]
mar 10 13:42:02.553 ERROR zeronsd::authority: reverse_records.insert fd23:3cca:ac27:ef83:2499:93e7:50c5:1646/88 - [LowerName(Name("3.9.9.9.4.2.3.8.f.e.7.2.c.a.a.c.c.3.3.2.d.f.ip6.arpa."))]
pass
Mar 10 14:02:27.632 ERROR zeronsd::authority: reverse_records.insert fdd3:ecf5:726d:95c5:899:93e7:50c5:1646/88 - [LowerName(Name("3.9.9.9.8.0.5.c.5.9.d.6.2.7.5.f.c.e.3.d.d.f.ip6.arpa."))]
Mar 10 14:02:27.632 ERROR zeronsd::authority: reverse_records.insert fdd3:ecf5:726d:95c5:899:9300::/88 - [LowerName(Name("3.9.9.9.8.0.5.c.5.9.d.6.2.7.5.f.c.e.3.d.d.f.ip6.arpa."))]
full address vs subnet order very strange to me at the moment
the ptr records always end up on the fd23:3cca:ac27:ef83:2499:9300::/88
"short/subnet" network, and never on the fdd3:ecf5:726d:95c5:899:93e7:50c5:1646/88
"long/ full address" network, as far as i've seen
I've been trying to figure this out. Thought I'd post note in public instead of not. I'm on a mac. Some small random subset of the tests will fail if I run all of them. I'm focusing now on the rfc4193::test_battery_single_domain_named. The reverse ptf will fail, sometimes. I think I've seen other parts fail, but if it does fail, 99% of the time is the reverse ptr. I'm on a mac. (There's a mac bug that is fixed in
dev
that makes the tests not work at all)If I sprinkle some longer delays around, they get more reliable, but not 100%
How I've been starting the test, for example:
RUST_BACKTRACE=1 RUST_LOG=debug TOKEN=$(cat test-token.txt) sudo -E cargo test -- rfc4193::test_battery_single_domain_named --nocapture --test-threads 1 | grep "zeronsd\|trust\|integration"
Here are some tracing output. I can't tell why it works in some cases and not in others from the traces.