Closed Slange-Mhath closed 2 years ago
When I read this, I thought ... of course this can't be possible, I must be using golang's json package to write safe json output, and surely that library is thoroughly battle-tested. Well... I just checked the code and, embarrassingly, the json output is hand-crafted & indeed this is very definitely possible! There is some "sanitisation" of file names in the output (https://github.com/richardlehane/siegfried/blob/master/pkg/writer/writer.go#L227) but evidently not enough. I'll aim to fix this for the next release
Ha! Classic (and not embarrassing at all!) Thank you so much for this. Really appreciate it!
I can't recreate this with the folder structure below and command sf -json test_data | jsonlint
or sf -json test_data | python -m json.tool
:
ross-spencer:/tmp$ ls -la test_folder
total 92
drwxrwxr-x 2 ross-spencer ross-spencer 4096 Jun 30 17:09 .
drwxrwxrwt 26 root root 73728 Jun 30 17:10 ..
-rw-rw-r-- 1 ross-spencer ross-spencer 0 Jun 30 17:02 '\afile'
-rw-rw-r-- 1 ross-spencer ross-spencer 0 Jun 30 17:05 '\atest'
-rw-rw-r-- 1 ross-spencer ross-spencer 5 Jun 30 13:17 '?file'
-rw-rw-r-- 1 ross-spencer ross-spencer 5 Jun 30 13:17 '^file'
-rw-rw-r-- 1 ross-spencer ross-spencer 0 Jun 30 17:09 '^M'
-rw-rw-r-- 1 ross-spencer ross-spencer 0 Jun 30 13:18 @test
-rw-rw-r-- 1 ross-spencer ross-spencer 5 Jun 30 13:17 '^?test_file'
-rw-rw-r-- 1 ross-spencer ross-spencer 0 Jun 30 17:07 '^Test_file'
-rw-rw-r-- 1 ross-spencer ross-spencer 0 Jun 30 17:08 '?Test.log'
Siegfried then outputs the filenames:
"filename": "test_folder/?Test.log",
"filename": "test_folder/?file",
"filename": "test_folder/@test",
"filename": "test_folder/\\afile",
"filename": "test_folder/\\atest",
"filename": "test_folder/^?test_file",
"filename": "test_folder/^M",
"filename": "test_folder/^Test_file",
"filename": "test_folder/^file",
That aside though I wanted to point y'all at the test here I wrote recently which might be useful to copy of modify to simulate the error during the fix/refactor:
I suspect how control characters are shown on the filesystem differs with operating system.
On RedHat 7, when creating test files with control characters using Control-v then Control-a (Control-m, Control-o) they appear like this:
-rw-r--r-- 1 root root 0 Jun 30 18:06 ?testing_control_character_a -rw-r--r-- 1 root root 0 Jun 30 18:09 ?testing_control_character_m -rw-r--r-- 1 root root 0 Jun 30 18:05 ?testing_control_character_o
Attached json output from siegfried which shows the included control characters.
However on Ubuntu, a control character named file looked like this: -rw-rw-r-- 1 root root 0 Jun 30 18:08 ''$'\017''testing'
siegfried_json_output_ubuntu.txt Attached ubuntu json output, also includes control character.
for reference, here is how the standard library escapes json strings: https://golang.org/src/encoding/json/encode.go?s=8400:8459#L1028 Key thing missing is escaping of bytes < 0x20
With some time today I used those tests mentioned above to (I believe) create a test that confirms the issue generically (i.e. not contingent on OS differences) that can be used to develop from here: https://github.com/richardlehane/siegfried/pull/167 (it tests and passes for true-negatives and passes, and true-positives and passes, so it will fail when the issue is fixed - I figured a "pass" would generate less noise although I appreciate it might be less intuitive).
Hey @richardlehane,
Is it possible that control characters in the JSON output are not escaped correctly? Running Siegfried with a file that has for instance the name
"/tmp/test_folder/^Test_file 1"
(linux output might look like"/tmp/test_folder/?Test.log"
) seems to output an invalid JSON.A way to replicate that error would be to create a new text file with a control character inside and run a scan with sf -json. The JSON output will contain the control character, which might result in an invalid JSON format.
Not sure if this might affect other output formats (like YAML) as well.