Closed elboulangero closed 2 months ago
So the changes look fine in general.
Compatibility, also with Python 2, should not be an issue since the syntax stayed the same but Python 3.12+ is stricter and emits a warning when it comes to backslash sequences. That means if a string contains a backslash it must be a valid escape sequence like \n
or, if a literal backslash is intended, then it must either be escaped with another backslash \\
or the string should be a r"raw\ string\ literal"
. But this works the same with Python 2.
There are only two things I would like to double-check.
adxcsouf2john.py
the byte-string literal has been changed to a raw-string literal. But now string and bytes are probably incompatible and shouldn't match in the regex. I would expect that you need a raw-byte-string literal: rb'...'
ejabberd2john.py
it uses the Grammer
class of parsimonious
. I don't know this package and can't tell whether it still works with the preceding newline this change now introduced.Unfortunately, we have no test files for any of the formats in john-samples. But since @elboulangero found these warnings, you probably have some test files? Can you please retest especially the two mentioned scripts or at least justify that you are sure it works this way?
Thank you very much @exploide for such a thorough review!
@elboulangero This reminds me, if you have any sample/test inputs you could publish, please submit those to our john-samples repo via a pull request. Thank you!
Thanks for the quick feedback.
First, I can share the warnings that I get with Python 3.12, and that are fixed with this PR:
/usr/share/john/adxcsouf2john.py:98: SyntaxWarning: invalid escape sequence '\ '
matches = re.findall(b'([a-zA-Z0-9_-]{3,9})\ (\d{8})', data)
/usr/share/john/aix2john.py:36: SyntaxWarning: invalid escape sequence '\s'
if re.match('^\s*\S+\s*:\s*$',line):
/usr/share/john/dashlane2john.py:75: SyntaxWarning: invalid escape sequence '\D'
sys.stderr.write("\nThe required .aes files can be found inside %AppData%\Dashlane\profiles directory tree on Windows.\n")
/usr/share/john/dashlane2john.py:76: SyntaxWarning: invalid escape sequence '\ '
sys.stderr.write("\nThe required .aes files can be found inside ~/Library/Application\ Support/Dashlane/profiles/ directory tree on macOS.\n")
/usr/share/john/dns/rdata.py:277: SyntaxWarning: invalid escape sequence '\#'
if not token.is_identifier() or token.value != '\#':
/usr/share/john/ejabberd2john.py:80: SyntaxWarning: invalid escape sequence '\s'
grammar = Grammar("""\
/usr/share/john/vmx2john.py:26: SyntaxWarning: invalid escape sequence '\)'
ks_re = '.+phrase/(.*?)/pass2key=(.*?):cipher=(.*?):rounds=(.*?):salt=(.*?),(.*?),(.*?)\)'
/usr/share/legion/app/auxiliary.py:162: SyntaxWarning: invalid escape sequence '\['
string = '\[[0-9]+\]\[[a-z-]+\].+' # when a password is found, the line contains [port#][plugin-name]
In
ejabberd2john.py
it uses the Grammer class of parsimonious. I don't know this package and can't tell whether it still works with the preceding newline this change now introduced.
For this one, I think that just printing the two strings (before and after turning it in a raw string), is enough to show that there was no change:
$ python3.12
Python 3.12.6 (main, Sep 7 2024, 14:20:15) [GCC 14.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> print("""\
... entry = (term _ "." _)* _
... term = boolean / atom / list / tuple / map / string / binary / number
... atom = ~"[a-z][0-9a-zA-Z_]*" / ("'" ~"[^']*" "'")
... _ = ~"\s*"
... list = ( _ "[" _ term (_ "," _ term)* _ "]" ) / ( _ "[" _ "]")
... tuple = ( _ "{" _ term (_ "," _ term)* _ "}" ) / ( _ "{" _ "}")
... map = ( _ "#{" _ keyvalue (_ "," _ keyvalue)* _ "}" ) / ( _ "#{" _ "}")
... keyvalue = term _ "=>" _ term _
... string = '"' ~r'(\\\\.|[^"])*' '"'
... binary = "<<" string ">>"
... boolean = "true" / "false"
... number = ~"\-?[0-9]+\#[0-9a-zA-Z]+" / ~"\-?[0-9]+(\.[0-9]+)?((e|E)(\-|\+)?[0-9]+)?"
... """)
<stdin>:1: SyntaxWarning: invalid escape sequence '\s'
entry = (term _ "." _)* _
term = boolean / atom / list / tuple / map / string / binary / number
atom = ~"[a-z][0-9a-zA-Z_]*" / ("'" ~"[^']*" "'")
_ = ~"\s*"
list = ( _ "[" _ term (_ "," _ term)* _ "]" ) / ( _ "[" _ "]")
tuple = ( _ "{" _ term (_ "," _ term)* _ "}" ) / ( _ "{" _ "}")
map = ( _ "#{" _ keyvalue (_ "," _ keyvalue)* _ "}" ) / ( _ "#{" _ "}")
keyvalue = term _ "=>" _ term _
string = '"' ~r'(\\.|[^"])*' '"'
binary = "<<" string ">>"
boolean = "true" / "false"
number = ~"\-?[0-9]+\#[0-9a-zA-Z]+" / ~"\-?[0-9]+(\.[0-9]+)?((e|E)(\-|\+)?[0-9]+)?"
>>> print(r"""
... entry = (term _ "." _)* _
... term = boolean / atom / list / tuple / map / string / binary / number
... atom = ~"[a-z][0-9a-zA-Z_]*" / ("'" ~"[^']*" "'")
... _ = ~"\s*"
... list = ( _ "[" _ term (_ "," _ term)* _ "]" ) / ( _ "[" _ "]")
... tuple = ( _ "{" _ term (_ "," _ term)* _ "}" ) / ( _ "{" _ "}")
... map = ( _ "#{" _ keyvalue (_ "," _ keyvalue)* _ "}" ) / ( _ "#{" _ "}")
... keyvalue = term _ "=>" _ term _
... string = '"' ~r'(\\.|[^"])*' '"'
... binary = "<<" string ">>"
... boolean = "true" / "false"
... number = ~"\-?[0-9]+\#[0-9a-zA-Z]+" / ~"\-?[0-9]+(\.[0-9]+)?((e|E)(\-|\+)?[0-9]+)?"
... """)
entry = (term _ "." _)* _
term = boolean / atom / list / tuple / map / string / binary / number
atom = ~"[a-z][0-9a-zA-Z_]*" / ("'" ~"[^']*" "'")
_ = ~"\s*"
list = ( _ "[" _ term (_ "," _ term)* _ "]" ) / ( _ "[" _ "]")
tuple = ( _ "{" _ term (_ "," _ term)* _ "}" ) / ( _ "{" _ "}")
map = ( _ "#{" _ keyvalue (_ "," _ keyvalue)* _ "}" ) / ( _ "#{" _ "}")
keyvalue = term _ "=>" _ term _
string = '"' ~r'(\\.|[^"])*' '"'
binary = "<<" string ">>"
boolean = "true" / "false"
number = ~"\-?[0-9]+\#[0-9a-zA-Z]+" / ~"\-?[0-9]+(\.[0-9]+)?((e|E)(\-|\+)?[0-9]+)?"
In fact, I saved the two outputs in two files and then compared them, otherwise it's not possible to really say if there's a difference.
In
adxcsouf2john.py
the byte-string literal has been changed to a raw-string literal. But now string and bytes are probably incompatible and shouldn't match in the regex. I would expect that you need a raw-byte-string literal: rb'...'
You are surely right on that, I updated the PR to use rb'...'
. Thanks for the feedback!
you probably have some test files? Can you please retest especially the two mentioned scripts or at least justify that you are sure it works this way?
No, I don't have tests unfortunately.
I'm part of Kali Linux team, and Python 3.12 is the new default in Kali, it's about to hit Kali Rolling later today if all goes well. We see those warnings when the john
package is installed, so I fix it, mainly for cosmetics purpose.
But no, I don't have specific tests, sorry. It's Ok if you don't want to merge this PR, if you think the changes can't be included without more extended testing.
Also, we do run all the tests as part of our CI pipelines, and at the moment the tests run only against Python 3.12. We don't test Python 2.
I think it's fine and ready to merge as soon as you push the rb'...'
change. (You wrote you changed it but the commit has not been updated.)
I think it's fine and ready to merge as soon as you push the
rb'...'
change. (You wrote you changed it but the commit has not been updated.)
Yes, I'd like to merge this based on the discussion so far. For the one pending change above, please amend and force-push the one existing commit, not add an extra commit. Thanks!
I think it's fine and ready to merge as soon as you push the rb'...' change. (You wrote you changed it but the commit has not been updated.)
Ah indeed, I forgot to push, thanks for the nudge.
Now it's done and ready to merge I think.
Thanks! I wonder what these changes mean for Python 2? Most of these scripts use just
python
, which is our current convention for scripts written in a way that they're meant to work with both 2 and 3.@exploide Can you please take a look?