Closed: janik-cloudflare closed this issue 6 months ago.
I know where you are coming from and feel your pain. I tried doing something similar for CoreDNS's caching back in the day. Before I read your proposal in much more detail, note that `NewRR` is expensive in itself; you might also just want to create an RR without parsing some stringified rdata, i.e. without `NewRR`. I never found a good Go API for that, although now with generics there might be a clean way?

OK, read it.
I see. Let me be clear that the RFC 1035 zone format is a horrendous format. Even if you remove most of it, like `$TTL` and `$ORIGIN` and `$INCLUDE` and `$GENERATE`, you still have `(` and `)`, which could potentially show up, and newlines should be ignored there. Or do you also want to ban those from parsing?
If we go back to the original task, it's: I have some bytes and I want those bytes to be the rdata of this `dns.RR` I have. So the question then becomes: how do I parse those bytes, and how do I attach the parsed rdata to the `dns.RR` (here you prolly do need some support in this lib)? FWIW, as alluded to in my first comment, I never found an elegant way of doing the latter in Go. Do we need a `dns.RData` interface thingy? This was an attempt to work directly with rdata fields: https://pkg.go.dev/github.com/miekg/dns#Field (obviously that only allows reading them).

So dunno. Open to ideas, but I think a customized parser that works on the text representation isn't the best thing, nor the speediest.
Thank you @miekg! You raise some very good points.
Initially I would've said `(` and `)` should be allowed, but the more I think about it, the more they seem like zone file syntax rather than content syntax, if that makes sense. Really, maybe all we need is the ability to parse quotes and backslash escapes (and skip whitespace). That still leaves us with a lot of type-specific behavior, though: `net.ParseIP`, `toAbsoluteName`, the occasional Base64 decoding, and the nightmare that is `LOC` parsing.

It seems the only lexer function called by each type in `scan_rr.go` is `Next()`, so I wonder if we could somehow turn that into an interface and find an elegant way to inject a different "lexer" behind the scenes (whether an actual lexer or something simpler that just produces the right data at the right time). I'll definitely need to think about that some more though.
Thanks for all the feedback so far!
Coming back to this... still on the dunno. Had kept it in the back of my mind, and my question is: why is this data in text format in the first place? It makes more sense to have this in (uncompressed!) wire format, parse that back in, put the RR metadata on it, and voila -> RR.
In our case it's coming from the API or a database column that needs to be searchable text. In most other cases, I would agree that storing wire format data is better.
Maybe ideally we'd have something similar to #1507 that allows parsing to work without a hard dependency on the lexer, but that is obviously a larger change. I'm still hoping to get a PR up at some point for you to review but if you don't object, I'm going to close this issue in the meantime since it's not really a bug and more of a "request for your thoughts".
Thank you for your input, Miek!
The only take-away I currently have is that the next version of miekg/dns should have something for this.
What we can do (and want to do) with the current code, I still don't know.
/Miek
We often need to construct an RR from its name, type, TTL and content (for example, when reading records from the database or from user API requests).
We currently use something similar to `NewRR(fmt.Sprintf("%s %d IN %s %s", name, ttl, type, content))`, but that requires a fair amount of hacky extra work:

1. We have to prevent `1.2.3.4\n5.6.7.8` from parsing without errors (as `1.2.3.4`).
2. The content could trigger `$INCLUDE` somehow, so we need to disable that as a precaution.
3. `fe80::1\t` is considered valid AAAA record content.

It's also currently impossible to parse generic (RFC 3597) records into a `*dns.RFC3597` when the underlying type is known to this library, because `NewRR` always returns the known type's struct in that case, even when the content started with a `\#` token. `ToRFC3597` can be used to convert it back to a `*dns.RFC3597`, but that calls `pack()`, which changes behavior depending on the underlying type. (For example, calling `NewRR` with `. 1 IN TYPE9999 \# 3 abcdef` produces Rdata `\# 3 abcdef` as expected, whereas changing TYPE9999 to TYPE48 results in `\# 4 abcdef00` instead.)

This is far from ideal for our use case, so I've taken a stab at trying to build something more reasonable, which would require additional public interface. In short, I've been trying to build something that parses only the record's content, without any special zone file syntax.
My first thought was that this would be fairly simple to do with a single function (this one isn't super clean; consider it an example or first draft):
This gets us most of the way there, but I realized the lexer is also context-aware and also parses comments. The former is easily worked around by initializing the `zlexer` with `owner: false`, which makes it expect content instead of an owner directive. The latter is more problematic and seems to require an entirely new, purpose-built lexer. Since `RR.parse()` expects a `*zlexer`, that would either involve making `zlexer` an interface, or adding the new logic to the existing `zlexer` struct. In addition, I noticed all `RR.parse()` functions call `slurpRemainder()`, which would then possibly have to be moved out.

I'd be happy to work on this, but before I spend too much time I wanted to ask whether this is a change you'd consider merging at all (hopefully), and, if so, whether someone with a deeper understanding of the parser and lexer has any design thoughts for me. I know you all tend to be conservative when it comes to adding new public interface (which is good), but I think this could be a useful addition that would enable use cases that are otherwise very difficult to achieve correctly.