paulc / dnslib

A Python library to encode/decode DNS wire-format packets
https://github.com/paulc/dnslib
BSD 2-Clause "Simplified" License

TXT records with Unicode control chars in them #32

Closed sbv-csis closed 1 year ago

sbv-csis commented 2 years ago

I'm not sure if it's a problem or not, but I've recently been comparing dig output to dnslib output for TXT records, and I'm not sure what to make of TXT records containing bytes that decode to Unicode control characters. For example:

$ dig -t TXT @8.8.8.8 smartjailmail.com
....
smartjailmail.com.  3600    IN  TXT "google-site-verification=7Avm2jKuluvrgko_FgTUqYqlYpvYu6hMf\005\000\000\000\000\000\000\000DQ"
....

And via dnslib.client:

$ python -m dnslib.client --server 8.8.8.8:53 smartjailmail.com TXT
...
smartjailmail.com.      3600    IN      TXT     "google-site-verification=7Avm2jKuluvrgko_FgTUqYqlYpvYu6hMfDQ"
...

and via repr on the RD.data property:

b'google-site-verification=7Avm2jKuluvrgko_FgTUqYqlYpvYu6hMf\x05\x00\x00\x00\x00\x00\x00\x00DQ'

As I read the code, dnslib takes the bytes, tries to decode them as UTF-8, and discards anything that isn't valid UTF-8, while dig perhaps escapes anything outside printable ASCII?

When I read the RFC, \DDD is indeed allowed:

\DDD where each D is a digit is the octet corresponding to the decimal number described by DDD. The resulting octet is assumed to be text and is not checked for special meaning.

However, I'm not quite sure what to expect from dnslib here. As I read the RFC, the encoding of TXT records is not prescribed :shrug: I would worry that a TXT record with some kind of esoteric encoding would break the way dnslib turns TXT records back into text.
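To make the \DDD rule quoted above concrete, here is a minimal sketch of escaping raw TXT bytes into a dig-like presentation string. The function name `escape_txt` is hypothetical, and the choice to leave printable ASCII untouched while escaping quotes, backslashes and everything else is an assumption for illustration. It is not dnslib's implementation and may not match dig in every case.

```python
def escape_txt(data: bytes) -> str:
    """Render TXT bytes as a quoted presentation string (hypothetical helper)."""
    out = []
    for byte in data:
        ch = chr(byte)
        if ch in '"\\':
            # Quote and backslash get a backslash prefix.
            out.append("\\" + ch)
        elif 0x20 <= byte <= 0x7E:
            # Printable ASCII is emitted unchanged.
            out.append(ch)
        else:
            # Any other octet becomes \DDD, a three-digit decimal value.
            out.append(f"\\{byte:03d}")
    return '"' + "".join(out) + '"'


raw = b"google-site-verification=7Avm2jKuluvrgko_FgTUqYqlYpvYu6hMf\x05\x00\x00\x00\x00\x00\x00\x00DQ"
print(escape_txt(raw))
# "google-site-verification=7Avm2jKuluvrgko_FgTUqYqlYpvYu6hMf\005\000\000\000\000\000\000\000DQ"
```

With the bytes from this record, the sketch produces the same string that dig printed above.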

paulc commented 1 year ago

Thanks for spotting this. dnslib does actually parse the data correctly into bytes (as the repr shows); the problem is in how the data is printed. The bytes are converted to a Unicode string, including the \x00 characters, but these are not visible when the string is printed. The RFC is a bit vague. In theory we should only accept ASCII characters in the text representation of TXT records and escape everything else, but I think people now expect to be able to use UTF-8, so we have to be careful about how we encode non-printable characters. I have made some changes in the latest version (0.9.20) which should fix this. It appears to work with your example, though I suspect the behaviour will still differ from dig in some cases. If this matters to you, it is usually safer to work with the raw bytes of the TXT record. Let me know if this fixes your problem.

% python -m dnslib.client --server 8.8.8.8:53 smartjailmail.com TXT
...
smartjailmail.com.      3600    IN      TXT     "google-site-verification=7Avm2jKuluvrgko_FgTUqYqlYpvYu6hMf\005\000\000\000\000\000\000\000DQ"
...
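The explanation above (the bytes are parsed correctly and only the printing hides them) can be checked in plain Python without dnslib. This is a small sketch using the bytes from the repr earlier in the thread. It only illustrates standard `bytes`/`str` behaviour and says nothing about dnslib's internal code path.

```python
raw = b"google-site-verification=7Avm2jKuluvrgko_FgTUqYqlYpvYu6hMf\x05\x00\x00\x00\x00\x00\x00\x00DQ"

# \x05 and \x00 are valid UTF-8, so strict decoding succeeds and drops nothing.
text = raw.decode("utf-8")
assert text.encode("utf-8") == raw

# The control characters are still in the decoded string...
hidden = [ch for ch in text if not ch.isprintable()]
print(len(hidden))  # 8

# ...but a plain print(text) would not show them; repr() does.
print(repr(text))
```

Comparing or storing `raw` directly avoids any ambiguity about how a terminal renders the control characters.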