pi-hole / FTL

The Pi-hole FTL engine
https://pi-hole.net

Report the hex-code of the found invalid character #1917

Closed DL6ER closed 6 months ago

DL6ER commented 6 months ago

What does this implement/fix?

Make the Pi-hole diagnosis message a bit clearer.

Before:

Host name of client "192.168.1.2" => "äbc" contains (at least) one invalid character at position 0

Now:

Host name of client "192.168.1.2" => "äbc" contains (at least) one invalid character (hex e4) at position 0

Knowing the hex code may be useful when trying to debug this, e.g., by looking it up in a character table such as https://www.rapidtables.com/code/text/ascii-table.html

This may not be the best example, but consider the case where the invalid character is something unprintable like <EOF>.
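To illustrate the idea of reporting the hex code, here is a minimal sketch in C. It is not FTL's actual code: the function names `first_invalid_char` and `describe_invalid_char` are hypothetical, and the set of accepted characters (ASCII letters, digits, `-`, `_`, `.`) is an assumption made only for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: returns the index of the first byte that is not an
 * ASCII letter, digit, '-', '_' or '.', or -1 if every byte passes.
 * The accepted set is an assumption for illustration only. */
int first_invalid_char(const char *name)
{
    assert(name != NULL);
    for (int i = 0; name[i] != '\0'; i++) {
        const unsigned char c = (unsigned char)name[i];
        const int ok = (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') ||
                       (c >= '0' && c <= '9') || strchr("-_.", c) != NULL;
        if (!ok)
            return i;
    }
    return -1;
}

/* Hypothetical helper: formats a diagnosis-style message into buf.
 * snprintf() never writes more than buflen bytes, so the message is
 * truncated rather than overflowing the buffer.
 * Returns 1 if an invalid byte was found, 0 otherwise. */
int describe_invalid_char(const char *ip, const char *name,
                          char *buf, size_t buflen)
{
    const int pos = first_invalid_char(name);
    if (pos < 0) {
        if (buflen > 0)
            buf[0] = '\0';
        return 0;
    }
    /* Cast to unsigned char so bytes >= 0x80 print as two hex digits */
    snprintf(buf, buflen,
             "Host name of client \"%s\" => \"%s\" contains (at least) one "
             "invalid character (hex %02x) at position %d",
             ip, name, (unsigned int)(unsigned char)name[pos], pos);
    return 1;
}
```

Calling `describe_invalid_char("192.168.1.2", "\xe4" "bc", buf, sizeof(buf))` fills `buf` with a message in the style of the "Now:" example. The cast to `unsigned char` matters: on platforms where `char` is signed, passing a byte such as 0xe4 directly to `%02x` would be sign-extended and print more than two digits.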

We also add a safety measure so that malicious modifications to the Pi-hole diagnosis entries in the database cannot trigger a buffer overflow.


Related issue or feature (if applicable): See discourse thread linked below.

Pull request in docs with documentation (if applicable): N/A


By submitting this pull request, I confirm the following:

  1. I have read and understood the contributors guide, as well as this entire template. I understand which branch to base my commits and Pull Requests against.
  2. I have commented my proposed changes within the code.
  3. I am willing to help maintain this change if there are issues with it later.
  4. It is compatible with the EUPL 1.2 license
  5. I have squashed any insignificant commits. (git rebase)


pralor-bot commented 6 months ago

This pull request has been mentioned on Pi-hole Userspace. There might be relevant details there:

https://discourse.pi-hole.net/t/host-name-of-client-xxx-contains-at-least-one-invalid-character-at-position-0/69132/11

DL6ER commented 6 months ago

079c66c adds control character escaping to the output. To reduce code duplication, we simply reuse the already existing JSON string encoder. Since the log file is UTF-8, we only really need to escape control characters.

Another example containing a control character:

WARNING: Host name of client "127.0.0.1" => "xyz\nabc" contains (at least) one invalid character (hex 0a) at position 3

This is what is logged if a client advertises a host name containing a newline.

The "one-character sequences" are escaped in the well-known way (e.g., \n for a newline, as in the example above). All other control characters are escaped following the JSON syntax, e.g., \u001B for "escape" (also known, but not standardized, as "\^[").
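The escaping scheme described above can be sketched as follows. This is an illustrative stand-in, not the JSON string encoder that FTL reuses: the function name `escape_control_chars` is hypothetical, and the sketch assumes that only bytes below 0x20 need escaping while all other bytes (including UTF-8 multi-byte sequences) pass through unchanged.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: copies `in` to `out`, replacing control characters
 * (bytes below 0x20) with JSON-style escapes. Never writes more than
 * `outlen` bytes; the output is truncated at an escape boundary if it
 * does not fit. Returns the number of bytes written, excluding the NUL. */
size_t escape_control_chars(const char *in, char *out, size_t outlen)
{
    assert(in != NULL && out != NULL);
    if (outlen == 0)
        return 0;

    size_t o = 0;
    for (size_t i = 0; in[i] != '\0'; i++) {
        const unsigned char c = (unsigned char)in[i];
        char tmp[7];            /* longest escape: 6 characters + NUL */
        const char *esc = NULL;

        switch (c) {
        /* "One-character sequences" with a short escape */
        case '\b': esc = "\\b"; break;
        case '\f': esc = "\\f"; break;
        case '\n': esc = "\\n"; break;
        case '\r': esc = "\\r"; break;
        case '\t': esc = "\\t"; break;
        default:
            if (c < 0x20) {
                /* Remaining control characters: \u00XX form */
                snprintf(tmp, sizeof(tmp), "\\u%04X", (unsigned int)c);
                esc = tmp;
            }
            break;
        }
        if (esc == NULL) {
            /* Ordinary byte: copy as-is */
            tmp[0] = (char)c;
            tmp[1] = '\0';
            esc = tmp;
        }

        const size_t n = strlen(esc);
        if (o + n >= outlen)
            break;              /* keep room for the terminating NUL */
        memcpy(out + o, esc, n);
        o += n;
    }
    out[o] = '\0';
    return o;
}
```

With this sketch, the host name from the example above (`xyz`, a newline, `abc`) becomes the eight visible characters `xyz\nabc`, and an escape byte (0x1b) becomes `\u001B`. Checking the remaining space before every copy keeps the output within `outlen` regardless of the input.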