hickory-dns / hickory-dns

A Rust-based DNS client, server, and resolver

Proposal: More generic server API #1983

Open nmittler opened 1 year ago

nmittler commented 1 year ago

I have a very specific use case that is not supported well by the current server APIs. I've recently built a DNS proxy for the Istio project. The proxy runs on each node in a Kubernetes cluster, and all outbound DNS traffic from the pods running on that node is redirected to it. The job of this proxy is to serve DNS records for "known" workloads within the service mesh (e.g. my-service.my-ns.svc.cluster.local) and to forward everything else to the upstream resolver (e.g. kube-dns). Responses are marked authoritative only for "known" services that are served directly by the proxy.

This doesn't fit into the current Authority model within the server, since the Catalog expects each Authority to either be authoritative for a particular domain or not. The set of domains for which the proxy is authoritative is potentially dynamic (e.g. a new cluster is added with a custom domain). For this reason, we want traffic for all domains to go through the proxy and let it decide what to do. Architecturally, this boils down to a chain of authorities rather than a Catalog.

For my initial implementation, I had to ditch the Catalog and Authority APIs entirely and write a raw RequestHandler from scratch. As a result, I ended up duplicating a significant portion of the Catalog logic, which seemed like a general failure of the API surface for this use case.

Describe the solution you'd like

To better support the proxy use case described above, I suggest an API that is composable to allow chaining.

There are a couple of places where this problem could potentially be addressed:

Either of these would be somewhat destructive to our current API surface. To do this without breaking users, we might consider erecting a parallel set of APIs and deprecating the old ones (and eventually removing them entirely in a future release).

Anyway, those are my initial thoughts. I'm definitely open to other suggestions, given that I'm by no means an expert on the codebase.
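
To make the chaining idea a bit more concrete, here is a minimal sketch of the shape I have in mind (every name below is hypothetical; none of this exists in the current API):

use std::sync::Arc;

/// Minimal stand-ins for the real query/response types, just for illustration.
pub struct DnsQuery {
    pub name: String,
}

pub struct DnsResponse {
    pub authoritative: bool,
    pub records: Vec<String>,
}

/// One link in the chain: it may answer, or decline so the next link is tried.
pub trait ChainedAuthority: Send + Sync {
    fn lookup(&self, query: &DnsQuery) -> Option<DnsResponse>;
}

/// Tries each link in order; the last link would typically be an upstream
/// forwarder that always produces a response.
pub struct AuthorityChain {
    links: Vec<Arc<dyn ChainedAuthority>>,
}

impl AuthorityChain {
    pub fn resolve(&self, query: &DnsQuery) -> Option<DnsResponse> {
        self.links.iter().find_map(|link| link.lookup(query))
    }
}

In the proxy case, the first link would serve the "known" mesh services and the last link would forward everything else to the upstream resolver.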

Describe alternatives you've considered

NA

Additional context

NA

nmittler commented 1 year ago

@bluejekyll @djc any thoughts?

djc commented 1 year ago

Destroying the current API surface is not that big a deal IMO, as long as it becomes no less expressive.

Other than that I don't have strong feelings about your proposed ideas because I don't have a deep understanding of the proposed design. I do have the experience of building an authoritative name server which doesn't keep all of the authorities in memory (because that wouldn't scale), and so I've built a custom RequestHandler that so far doesn't use Catalog or Authority at all. Instead it talks to Redis to figure out if it's authoritative for a given domain and fetches the necessary records from Redis as well. I wrote that over a year ago so I don't remember exactly why I ended up going that way, but IIRC the whole Catalog and Authority model didn't really match the shape of the problem I was solving.

I don't fully understand why the current API doesn't allow chaining of RequestHandlers but I do think factoring Catalog and Authority into building blocks that can more easily be used to build RequestHandlers makes sense at a high level.

bluejekyll commented 1 year ago

Yeah, I'm OK with this suggestion in principle. I did plan to add an authority for chaining things together, like a caching authority on top of the resolver or recursor. I can't remember what happened with that; it's probably sitting in an old branch. That said, I have a question about your use case, specifically the need to change whether something is authoritative or not. Off the top of my head, I think the authoritative flag mostly impacts negative responses, in the sense that if there are no records returned, or NXDOMAIN, the resolver can "know" that this is an authoritative response and stop searching. That doesn't seem to be the reason you're suggesting the authoritative flag needs to change. Would you mind explaining that in more detail? Does k8s use the authoritative flag for other things beyond that?
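
(As a concrete reference, the per-answer decision is just a flag on the response header; a minimal sketch, where proxy_owns_name is a made-up stand-in for however the proxy decides it serves a name directly:)

use hickory_proto::op::Header;
use hickory_proto::rr::Name;

// Hypothetical helper: stands in for the proxy's check against its set of
// "known" in-mesh names; not part of hickory-dns.
fn proxy_owns_name(_name: &Name) -> bool {
    false
}

// Build a response header, setting AA only when the proxy answers from its own data.
fn response_header(request_header: &Header, name: &Name) -> Header {
    let mut header = Header::response_from_request(request_header);
    header.set_authoritative(proxy_owns_name(name));
    header
}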

That being said, I have no issue with us making the API simpler in this area. It's an "old" interface at this point, and I haven't revisited it in a while, so I'm open to changes that will make some of these ideas you have simpler.

nmittler commented 1 year ago

@djc if you ever get a few cycles, I'd be curious what you ended up doing. It might help to inform a new server API.

djc commented 1 year ago

This is substantially all of my authoritative DNS server:

pub struct DnsServer {
    pool: Pool<RedisMultiplexedConnectionManager>,
    resolver: TokioAsyncResolver,
}

impl DnsServer {
    pub async fn new(url: &str) -> anyhow::Result<Self> {
        Ok(Self {
            pool: Pool::builder()
                .min_idle(Some(1))
                .connection_timeout(Duration::from_secs(5))
                .build(RedisMultiplexedConnectionManager::new(url)?)
                .await
                .context("failed to connect to Redis")?,
            resolver: utils::resolver(),
        })
    }

    async fn lookup<R: ResponseHandler>(
        &self,
        request: &Request,
        mut response_handle: R,
    ) -> io::Result<ResponseInfo> {
        let info = request.request_info();
        let r#type = info.query.query_type();
        if r#type == RecordType::AXFR {
            warn!("refused to handle {} query", r#type);
            let response = MessageResponseBuilder::from_message_request(request)
                .error_msg(info.header, ResponseCode::Refused);

            return response_handle.send_response(response).await;
        }

        match self.query(Name::from(info.query.name()), r#type).await {
            Ok(answers) => {
                let mut header = Header::response_from_request(info.header);
                header.set_authoritative(true);
                let response = MessageResponseBuilder::from_message_request(request).build(
                    header,
                    answers.iter(),
                    [].iter(),
                    [].iter(),
                    [].iter(),
                );

                response_handle.send_response(response).await
            }
            Err(code) => {
                let response = MessageResponseBuilder::from_message_request(request)
                    .error_msg(info.header, code);
                response_handle.send_response(response).await
            }
        }
    }

    async fn query(&self, mut name: Name, r#type: RecordType) -> Result<Vec<Record>, ResponseCode> {
        let mut conn = match self.pool.get().await {
            Ok(conn) => conn,
            Err(err) => {
                warn!(%name, ?r#type, "failed to get redis connection: {err}");
                return Err(ResponseCode::ServFail);
            }
        };

        let mut authority = name.clone();
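        // Walk up the query name one label at a time until a name with an SOA
        // entry exists in Redis; that name is the zone this server is authoritative for.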
        loop {
            if authority.num_labels() < 2 {
                // For now, we're not using this server for a TLD.
                return Err(ResponseCode::Refused);
            }

            let soa = DnsKey::new(&authority, RecordType::SOA);

            match Cmd::exists(soa).query_async::<_, bool>(&mut *conn).await {
                Ok(true) => break,
                Ok(false) => {
                    authority = authority.base_name();
                    continue;
                }
                Err(err) => {
                    warn!(%name, ?r#type, "redis lookup error (SOA): {err}");
                    return Err(ResponseCode::ServFail);
                }
            }
        }

        let mut rsp = Vec::new();
        let mut done = false;
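        // Fetch the records for `name`, following CNAME chains within the zone found above.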
        loop {
            let key = DnsKey::new(&name, r#type);
            let cmd = Cmd::lrange(key, 0, isize::MAX);
            let future = cmd.query_async::<_, Vec<Vec<u8>>>(&mut *conn);
            let answers = match future.await {
                Ok(answers) => answers,
                Err(err) => {
                    warn!(%name, ?r#type, "redis lookup error: {err}");
                    return Err(ResponseCode::ServFail);
                }
            };

            for value in answers {
                if let Some(record) = DnsValue::to_record(&value, &name, r#type) {
                    rsp.push(record);
                    done = true;
                }
            }

            if done || r#type == RecordType::CNAME {
                return Ok(rsp);
            }

            // If no records were found, we check for CNAME records

            let key = DnsKey::new(&name, RecordType::CNAME);

            let cmd = Cmd::lrange(key, 0, isize::MAX);
            let record = match cmd.query_async::<_, Vec<Vec<u8>>>(&mut *conn).await {
                Ok(mut answers) => answers
                    .pop()
                    .and_then(|value| DnsValue::to_record(&value, &name, RecordType::CNAME)),
                Err(err) => {
                    warn!(%name, ?r#type, "redis lookup error (CNAME): {err}");
                    return Err(ResponseCode::ServFail);
                }
            };

            let record = match record {
                Some(record) => record,
                None => return Ok(rsp),
            };

            // If the CNAME record is in the same zone (as per the SOA found above),
            // we'll return it but also continue to resolve it.

            let target = match record.data() {
                Some(RData::CNAME(target)) => target,
                _ => unreachable!(),
            };

            if target.0 == name {
                // If the CNAME target is the same as the current name, return a server failure.
                error!(%name, ?r#type, "CNAME loop detected");
                return Err(ResponseCode::ServFail);
            } else if name == authority {
                // If the CNAME sits at this zone's apex (name == authority), we have to apply some magic:
                //
                // Technically the RFCs don't allow CNAME records at the root, but like Cloudflare,
                // we resolve any CNAME records at the root by resolving the target recursively.
                // https://blog.cloudflare.com/introducing-cname-flattening-rfc-compliant-cnames-at-a-domains-root/
                match self.resolver.lookup(target.0.clone(), r#type).await {
                    Ok(answers) => {
                        // Add the resolver's answers to our response and return it.
                        rsp.extend(answers.record_iter().cloned());
                        return Ok(rsp);
                    }
                    Err(err) => match err.kind() {
                        // The `rsp` might contain CNAME records we've gathered up to this point,
                        // but if the ultimate target doesn't resolve, we probably shouldn't send these.
                        ResolveErrorKind::NoRecordsFound { .. } => return Ok(Vec::new()),
                        _ => {
                            warn!(%name, ?r#type, "resolver error while flattening: {err}");
                            return Err(ResponseCode::ServFail);
                        }
                    },
                }
            } else if authority.zone_of(target) || target.0 == authority {
                // If the CNAME target is in the same zone, add it to our response and continue.
                name = target.0.clone();
                rsp.push(record);
            } else {
                // If the CNAME target is in another zone, add it to our response and return it.
                rsp.push(record);
                return Ok(rsp);
            }
        }
    }
}

#[async_trait]
impl RequestHandler for DnsServer {
    #[tracing::instrument(skip(self, response_handle), fields(
        otel.kind = "server", otel.name, dns.op_code, dns.question.r#type, dns.question.name,
        net.transport, dns.protocol, net.sock.peer.addr, net.protocol.name
    ))]
    async fn handle_request<R: ResponseHandler>(
        &self,
        request: &Request,
        mut response_handle: R,
    ) -> ResponseInfo {
        let op_code = request.op_code();

        let span = Span::current();
        span.record(
            OTEL_NAME.as_str(),
            format_args!("{} {}", request.message_type(), op_code),
        )
        .record("dns.op_code", display(op_code));
        span.record_kv(NETWORK_TRANSPORT.string(OtelProtocol(request.protocol())));
        span.record_kv(OTEL_DNS_PROTOCOL.string(request.protocol().to_string()));
        // TODO Use const after upgrading to opentelemetry-semantic-conventions 1.11
        span.record("net.sock.peer.addr", request.src().ip().to_string());
        span.record("net.protocol.name", "dns");

        let result = match request.message_type() {
            // TODO think about threading query lookups for multiple lookups, this could be a huge improvement
            //  especially for recursive lookups
            MessageType::Query if op_code == OpCode::Query => {
                span.record("dns.question.type", display(request.query().query_type()))
                    .record("dns.question.name", display(request.query().name()));
                let future = self.lookup(request, response_handle);
                match timeout(QUERY_TIMEOUT, future).await {
                    Ok(Ok(i)) => Ok(i),
                    Ok(Err(e)) => {
                        error!("error sending response: {}", e);
                        Err(ResponseCode::ServFail)
                    }
                    Err(_) => {
                        error!(?request, "request handler timed out");
                        Err(ResponseCode::ServFail)
                    }
                }
            }
            mtype => {
                let code = match mtype {
                    MessageType::Query => {
                        warn!("unimplemented op code: {:?}", op_code);
                        ResponseCode::NotImp
                    }
                    MessageType::Response => {
                        warn!("got a response as a request from id: {}", request.id());
                        ResponseCode::FormErr
                    }
                };

                let response = MessageResponseBuilder::from_message_request(request);
                response_handle
                    .send_response(response.error_msg(request.header(), code))
                    .await
                    .map_err(|e| {
                        error!("error sending response: {}", e);
                        ResponseCode::ServFail
                    })
            }
        };

        let err = match result {
            Ok(info) => return info,
            Err(e) => e,
        };

        error!("request failed: {}", err);
        let mut header = Header::new();
        header.set_response_code(ResponseCode::ServFail);
        header.into()
    }
}

Let me know if you have questions, hope that helps!
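
For completeness, a handler like this gets mounted on the server loop via ServerFuture; roughly something like the following, assuming the current hickory_server crate layout (ports, TLS, and error handling elided):

use std::time::Duration;

use hickory_server::ServerFuture;
use tokio::net::{TcpListener, UdpSocket};

// Sketch only: wire a RequestHandler (like the DnsServer above) into hickory's server.
async fn run(handler: DnsServer) -> anyhow::Result<()> {
    let mut server = ServerFuture::new(handler);
    // The same handler serves both transports.
    server.register_socket(UdpSocket::bind("0.0.0.0:53").await?);
    server.register_listener(
        TcpListener::bind("0.0.0.0:53").await?,
        Duration::from_secs(5),
    );
    server.block_until_done().await?;
    Ok(())
}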

thedeadliestcatch commented 4 weeks ago

@djc Could you push the code to a repo (ready to build)? I'm trying to understand how to best fit a resolver (upstream), since my DNS server handler is only supposed to handle specific queries for testing. The redis parts and unresolved imports make it a tad confusing, even if the snippet already resolved quite a few questions for me (thank you for that!).

Any other upstream resolver examples would be fantastic, alas it seems hickory_server comes with none (unless you want to grok the complete server code).
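
For the upstream-forwarding piece specifically, the core of it is a TokioAsyncResolver lookup whose records get copied into the response; a minimal sketch using the same types as the snippet above (forward_upstream is a made-up helper, not a hickory API):

use hickory_proto::op::ResponseCode;
use hickory_proto::rr::{Name, Record, RecordType};
use hickory_resolver::error::ResolveErrorKind;
use hickory_resolver::TokioAsyncResolver;

// When a query isn't one the handler answers itself, hand the name to the
// upstream resolver and return its records (or an empty answer / SERVFAIL).
async fn forward_upstream(
    resolver: &TokioAsyncResolver,
    name: Name,
    r#type: RecordType,
) -> Result<Vec<Record>, ResponseCode> {
    match resolver.lookup(name, r#type).await {
        Ok(lookup) => Ok(lookup.record_iter().cloned().collect()),
        Err(err) => match err.kind() {
            // No records upstream: return an empty, successful answer.
            ResolveErrorKind::NoRecordsFound { .. } => Ok(Vec::new()),
            _ => Err(ResponseCode::ServFail),
        },
    }
}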