mcmilk / 7-Zip-zstd

7-Zip with support for Brotli, Fast-LZMA2, Lizard, LZ4, LZ5 and Zstandard
https://mcmilk.de/projects/7-Zip-zstd/
Other
5.06k stars 300 forks source link

BF: vulnerability by command line parsing (considers valid windows escape sequences) #307

Closed sebres closed 1 year ago

sebres commented 1 year ago

This PR provides a fix for certain vulnerability by command line parsing (considers valid windows escape sequences), like backslashes followed by quote-char as well as quote triples).

Windows command line processor (as well as programs and standard libraries constructing command line from arguments) use special escape and quoting sequences, so...

the proper scenario to parse command line looks like that... - arguments are separated by spaces or tabs - quotes serve as optional argument delimiters `"a b"` --> `a b` - escaped quotes must be converted back to `"` `\"` --> `"` - consecutive backslashes preceding a quote (inclusive closing quote for a parameter with spaces) see their number halved with the remainder escaping the quote: 2n backslashes + quote --> n backslashes + quote as an argument delimiter 2n+1 backslashes + quote --> n backslashes + literal quote - backslashes that are not followed by a quote are copied literally: `a\b` --> `a\b` `a\\b` --> `a\\b` - in quoted strings, consecutive quotes see their number divided by three with the remainder modulo 3 deciding whether to close the string or not. Note that the opening quote must be counted in the consecutive quotes, that's the (1+) below: (1+) 3n quotes -> n quotes (1+) 3n+1 quotes -> n quotes plus closes the quoted string (1+) 3n+2 quotes -> n+1 quotes plus closes the quoted string - in unquoted strings, the first quote opens the quoted string and the remaining consecutive quotes follow the above rule.
this causes following behavior by parsing of command line... cmd line|arg1|arg2 ---|---|--- abc" "def|abc def| abc\\" \\"def|abc"|"def "abc\\" \\"def"|abc" "def| "abc"" ""def"|abc" "def| abc"" ""def|abc|def abc""" """def|abc" "def| abc\\""" \\"""def|abc"|"def abc\\\\""" \\\\"""def|abc\\" \\"def "abc"""def" ghi"|abc"def ghi| "abc"""def" ghi
* missing close qoute at end|abc"def ghi| "abc""def" ghi|abc"def|ghi "abc\\"""def" ghi|abc""def|ghi "abc\\\\"""def" ghi"|abc\\"def ghi| "abc\\\\\\"""def" ghi|abc\\""def|ghi "abc\\\\\\\\"""def" ghi"|abc\\\\"def ghi|

I know Windows doesn't allow quote-char in file or folder names, but command line could be built by some automation with a foreign user input and if it becomes validated using nominal condition of Windows parser, but hereafter processed in 7zip, as a consequence it could cause:

Alternatively one could use windows API CommandLineToArgvW, but this would imply a modifying of SplitCommandLine interface as well as new dependency to shell32.dll.

With this fix SplitCommandLine works now very similar to CommandLineToArgvW (but doesn't require dependency to shell32 library).

mcmilk commented 1 year ago

Thank you, pull request rebased and integrated.

sebres commented 1 year ago

Thx for review and merge. May I ask about the reason of rebase? (was something wrong in the commit comments?)

mcmilk commented 1 year ago

The comments were okay, but the line lengths of them are 65 ... I also add signed-off ... I should maybe put some skeleton things into .github

sebres commented 1 year ago

OK, understand.

By the way, by fixing of that here I thought have been finding similar vulni on ReactOS and wine, just...

It looks like in case of ReactOS and wine (for the function CommandLineToArgvW), one can say there is nothing and it works as expected, but in case of 7-zip, I'm unsure a bit. Script langs like python, tcl and co follow the same mechanisms like it is currently implemented here, but other programs using CommandLineToArgvW or main would do that differently.

Here is small table illustrating the issue...
command line
(arguments after exe)
CommandLineToArgvW / mainpython, tcl, etc
CA1A2A3CA1A2A3
three"""quotes next2three"quotesnext 1three"quotes next
four"""" quotes" next 4%3=13four" quotesnext4%3=1 2four"quotes next 4%3=1
five"""""quotes next2five"quotesnext 1five""quotes next
seven""""""" quotes" next 7%3=13seven"" quotesnext7%3=1 3seven""" quotesnext7%3=1
twelve""""""""""""quotes next2twelve""""quotesnext 2twelve"""""quotesnext
thirteen""""""""""""" quotes" next 13%3=13thirteen"""" quotesnext13%3=1 3thirteen"""""" quotesnext13%3=1
"two""quotes next2two"quotesnext 1two"quotes next
"two"" next2two"next 1two" next
"three""" quotes" next 4%3=13three" quotesnext4%3=1 2three"quotes next 4%3=1
"four""""quotes next2four"quotesnext 1four""quotes next
"six"""""" quotes" next 7%3=13six"" quotesnext7%3=1 3six""" quotesnext7%3=1
"eleven"""""""""""quotes next2eleven""""quotesnext 2eleven"""""quotesnext |
"twelve"""""""""""" quotes" next 13%3=13twelve"""" quotesnext13%3=1 3twelve"""""" quotesnext13%3=1
"the crazy \\"""\\" quotes2the crazy \"\quotes 1the crazy \"\ quotes

The way how the parsing is implemented in this PR and merged showing the right part of table (like script langs doing it). If it should be rather like a left part of table (like native windows API / std libs doing it) one'd need to apply this patch:

 CPP/Common/CommandLineParser.cpp | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/CPP/Common/CommandLineParser.cpp b/CPP/Common/CommandLineParser.cpp
index d07f0f1..6bea75b 100644
--- a/CPP/Common/CommandLineParser.cpp
+++ b/CPP/Common/CommandLineParser.cpp
@@ -51,7 +51,7 @@ static const wchar_t * _SplitCommandLine(const wchar_t* s, UString &dest)
           if (++qcount == 3)
           {
             dest += L'"';
-            qcount = 1;
+            qcount = 0;
           }
         }
         f = s;

Just I'm still unsure which strategy is better/safer, probably must investigate in source code history of python/tcl (at least this two) or interview those community (or code originators).

mcmilk commented 1 year ago

@sebres - should I release a new version with that fix? What do you think about this?

sebres commented 1 year ago

I'm still unsure about the further amend (with qcount = 0): Although it is very improbable that someone would use triple-quotes as escape sequence for whatever invocation (it is rather \" sequence, which would work correct now), it could be used anyway (for example to avoid insertion of unclosed quote-char by unpaired quotes in the argument). And this remains a certain inconsistence between default windows notation (if used for validation) and current implementation... Anyway it's important to have proper solution from scratch (first release implementing new parsing rules). I had no time yet to inspect all the langs (source code history, tests, etc), why those parsing is implemented differently to windows, but at the moment it looks rather like this were historic inconsistence, not an intention. But I'd like to fulfill my research. Can you wait few days with the release?

sebres commented 1 year ago

OK, regarding different behavior of script-langs, it looks indeed like a historic mistake. In tcl it was already fixed in newest versions. For python I'd provide a PR later (as well as check whether some other langs are also affected in trunk/mainline). Thus the correct decision would be to amend the fix as described above (provided in #310).