rthalley / dnspython

a powerful DNS toolkit for python
http://www.dnspython.org

to_text() on txt record adds whitespace #1121

Closed Alge closed 3 weeks ago

Alge commented 3 weeks ago

Describe the bug

When fetching a TXT record consisting of multiple character strings, the to_text() method on the answer object returns the strings with a " " between them instead of concatenating them together directly. This makes the to_text() method unusable in these cases, and you have to manually add the rdata.strings values together instead.

From RFC 4871: "Strings in a TXT RR MUST be concatenated together before use with no intervening whitespace."

RFC 1035 section 3.3.14, RFC 4871 section 3.6.2.2

To Reproduce

_test.alge.se contains a TXT record with: "abcdefgh" "ijklmno", which should be parsed as "abcdefghijklmno". Instead it is parsed as "abcdefgh ijklmno".

import dns.resolver

answers = dns.resolver.resolve('_test.alge.se', 'TXT')
for rdata in answers:
    assert rdata.to_text() == "abcdefghijklmno"

rthalley commented 3 weeks ago

This is not a bug. DNS text records are lists of strings. If you do not quote and specify multiple strings by separating the text with whitespace, then you get a list with that many strings. If you want one string, then you have to either use no spaces or quote the text. RFC 4871 is reminding you to concatenate multiple strings together as the DNS does not do this for you; this is also needed if you want strings longer than 255 bytes, which is the max that a single RFC 1035 TXT record character-string can hold.
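
To illustrate the quoting rule, here is a rough sketch. It uses Python's shlex module as a stand-in for the real zone-file tokenizer, which is an assumption: the two differ on escapes and single quotes. split_txt_presentation is a hypothetical helper, not part of dnspython.

```python
import shlex


def split_txt_presentation(text: str) -> list[str]:
    """Approximate how zone-file text splits into TXT character-strings.

    Assumption: shlex's POSIX rules stand in for the real zone-file
    tokenizer; escapes and single quotes are handled differently there.
    """
    return shlex.split(text)


# Unquoted text separated by whitespace yields one string per word.
print(split_txt_presentation('one two three'))         # ['one', 'two', 'three']
# Quoting keeps the whitespace inside a single string.
print(split_txt_presentation('"one two three"'))       # ['one two three']
# Two quoted tokens are two separate strings.
print(split_txt_presentation('"abcdefgh" "ijklmno"'))  # ['abcdefgh', 'ijklmno']
```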

rthalley commented 3 weeks ago

You can also do this to do the concatenation easily:

>>> import dns.rdata
>>> r = dns.rdata.from_text("IN", "TXT", "one two three")
>>> b''.join(r.strings)
b'onetwothree'

Alge commented 3 weeks ago

What is the use of the to_text function if it doesn't properly concatenate the strings?

A real-world example where this causes problems is an SPF record. They can be quite long, and adding a space will cause errors while parsing, as the different mechanisms are separated by whitespace.

Imagine this situation:

An SPF record is broken into multiple parts due to being too long. The parts end up like this:

"[... long record ...] ip4:127.0" ".0.1"

If to_text returns a space in the middle of the last IP, it will parse as 2 invalid mechanisms. This is why the RFC specifies that the parts need to be concatenated without any whitespace between them.
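
A minimal sketch of the difference, in plain Python with no DNS lookup. The record content and the txt_value helper are made up for illustration; parts is only assumed to have the same shape as rdata.strings (a tuple of bytes).

```python
def txt_value(strings) -> str:
    """Join TXT character-strings with no separator in between."""
    return b"".join(strings).decode("ascii")


# Hypothetical split SPF record, shaped like rdata.strings.
parts = (b"v=spf1 ip4:192.0.2.1 ip4:127.0", b".0.1 -all")

joined = txt_value(parts)                  # no whitespace added
spaced = b" ".join(parts).decode("ascii")  # what a space-joined view gives

# SPF mechanisms are whitespace-separated, so split() mimics the parser's view.
print(joined.split())  # ['v=spf1', 'ip4:192.0.2.1', 'ip4:127.0.0.1', '-all']
print(spaced.split())  # ['v=spf1', 'ip4:192.0.2.1', 'ip4:127.0', '.0.1', '-all']
```

With the space, the last address turns into the two tokens 'ip4:127.0' and '.0.1', neither of which is a valid mechanism.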

rthalley commented 3 weeks ago

"to_text" is designed to emit output in the form required by the DNS zonefile (RFC 1035). It MUST NOT concatenate strings as that would alter the meaning of the DNS wire format data from (e.g.) "array of 3 strings 'one', 'two', and 'three'" to "one string 'onetwothree'". Always quoting ensures that we do not have trouble if the data needs quoting, and creates a handy canonical form.

The RFC is telling the application to do the concatenation, for the reasons you said, not the DNS.
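
As a sketch of why the always-quoted form keeps the two cases apart, here is a simplified renderer. render_txt is a hypothetical function written for this illustration, not dnspython's implementation; it assumes ASCII data and escapes only backslashes and double quotes.

```python
def render_txt(strings) -> str:
    """Render character-strings as quoted, space-separated tokens.

    Simplified sketch: only backslash and double quote are escaped;
    non-printable and non-ASCII bytes are not handled.
    """
    out = []
    for s in strings:
        text = s.decode("ascii").replace("\\", "\\\\").replace('"', '\\"')
        out.append(f'"{text}"')
    return " ".join(out)


three = (b"one", b"two", b"three")  # array of 3 strings
one = (b"onetwothree",)             # one string

print(render_txt(three))  # "one" "two" "three"
print(render_txt(one))    # "onetwothree"
```

The two records render differently, so the text form can be read back into the same array of strings. Joining before rendering would make them indistinguishable.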

rthalley commented 3 weeks ago

(And I grant that the DNS would have been simpler if the TXT record were just a single string, because that's what people use it for, but that isn't what the standard says. So if an application wants to have a long string, it must break it up into parts less than 256 bytes and then put it back together itself, because the wire protocol is about sending a non-empty array of strings each of which can be between 0 and 255 bytes long.)
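
A rough sketch of that round trip in plain Python. The function names are made up for illustration; the encoding follows the RFC 1035 character-string layout of one length byte followed by that many bytes of data.

```python
def split_for_txt(value: bytes, size: int = 255) -> list[bytes]:
    """Break a long value into character-strings of at most `size` bytes."""
    if not value:
        return [b""]  # the array is non-empty, but a string may be empty
    return [value[i:i + size] for i in range(0, len(value), size)]


def to_wire(strings) -> bytes:
    """Encode each string as one length byte followed by its data."""
    return b"".join(bytes([len(s)]) + s for s in strings)


def from_wire(rdata: bytes) -> list[bytes]:
    """Decode length-prefixed strings back into a list."""
    strings, pos = [], 0
    while pos < len(rdata):
        length = rdata[pos]
        strings.append(rdata[pos + 1:pos + 1 + length])
        pos += 1 + length
    return strings


long_value = b"x" * 600
chunks = split_for_txt(long_value)
print([len(c) for c in chunks])  # [255, 255, 90]

# The receiving application puts the pieces back together itself.
received = from_wire(to_wire(chunks))
print(b"".join(received) == long_value)  # True
```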