Closed kholia closed 6 years ago
PR #3108 handles the busted hashes. It should also keep handling them once LibreOffice (and StarOffice?) ship fixed code.
I believe that StarOffice is no longer developed. I am very interested in seeing how LibreOffice tackles this problem.
I believe they will fix it, but will still open busted files properly.
Their initial patch simply fixed the bug (and uncommented a test case that had been commented out because it was failing DUE to the bug), but a reviewer rejected it because it could no longer load 1/16th of the legacy files saved (which would be a big boondoggle for them).
NOTE, once they fix this, if someone trades files with a StarOffice user, then there is a possibility that the tool will not be able to handle the file. Yup, UGLY bug, when it impacts existing customer data like this one does!
This is why I wrote the PR like I did. It will handle all files with proper SHA1 hashes, OR files created earlier, with the buggy version.
@jfoug I believe that the LibreOffice format, libreoffice_fmt_plug.c, and the corresponding OpenCL format, will also need to be patched.
Update:
It should be possible to combine the StarOffice and LibreOffice formats, but doing so may require some hacking around with the involved hash formats. The StarOffice format was written in a hasty fashion during a password cracking competition a long time ago.
I just verified that MS Office 2016 cannot create password protected OpenDocument format files. This is one less thing for the LibreOffice project to worry about.
It seems that MS Office 2016 also cannot open OpenDocument format files which have been password protected.
We'd need test vectors for LibreOffice (or rather ODF) before implementing the bug workaround.
LibreOffice uses the same encryption/decryption code for all StarOffice and native LibreOffice file formats (.odt is one such LibreOffice format).
Generating affected test vectors doesn't sound straightforward. A LibreOffice user doesn't have direct control over the META_INF/* entries, I think.
I just got "lucky" when generating some sample files using StarOffice earlier.
Also, modern LibreOffice versions are using AES and unaffected SHA-256 primitives instead of Blowfish and affected SHA-1.
Update:
I had mixed up .odt and ODF in this post earlier.
If someone wants to try generating such test vectors, use OpenOffice 3.4.1 from the following link,
http://archive.apache.org/dist/incubator/ooo/files/stable/3.4.1/
Reading over the patches going on within the Libre group, it does sound like they are calling this SHA1, 'Star SHA1'. It was probably inherited code, so the bug has been there forever. They do have another library with SHA1, which they have started to switch over to using. I believe that their plan is to replace all code with that other library, leave the 'special' StarSHA1 the way it is, and then use it as a fallback prior to rejecting opening a document. BUT only using that special StarSHA1 in that manner (fallback file import).
Note, I have not looked at the libre_fmt_plug.c code yet, but it is almost certain to need the starsha1. Now, starsha1 was placed into the staroffice_common.c code. We will probably want to rip it out of there, and make it into its own .c file, or into a header that is included in the staroffice_common.c and libreoffice_common.c files.
But as for joining the 2 (and dropping star), shouldn't this be able to be done in the prepare format method?
But as for joining the 2 (and dropping star), shouldn't this be able to be done in the prepare format method?
Yes, I was thinking about the same approach.
Ok, in StarOffice, there is length and original length. In Libre there is no length, but length is computed from the length of the hash (easy conversion). BUT that original length is not there.
The original length is used for the sha1 comparison hash. All of the values in the $odf$ format use full 1024 byte buffers (I think), but there are many in the $sxc$ format which have shorter hashes, AND all use the same length for the SHA computation.
I can easily convert in prepare, but the odf format loses required data..... In the odf format, there is an 'unused' field. I could take over that field, and put the original length value in there (if it is different). But this is not as easy of a transition as I hoped.
Also, are we SURE that the ODF format is complete? Can there be hashed data shorter than 1024 bytes ??
ODF format (LibreOffice) should be reasonably complete. Here is an ODF hash (password is 1) with data less than 1024 bytes:
$odf$*0*0*1024*16*bc3b602abf272baf0d8c93b062f8caba9df2b84c*8*c7bb802b6524545c*16*9b5693c1de3695544973cba5345f0ab0*0*368756e6b05e00a20b5a642269c4e3153126dcba855d0cb44d49c5a8dc1a45a89d84f43f2e74c7fb07f11c623d94a7bff2ff47ca996bc9b5f6bdd058f5ead12edce61c170b4b4d61ebb3abc7ecafec8cbc842b7247c5c468703a4644a68c0896e0bd593ae322c39a21e0f98f19468373dfe833722e057e7d0070af284f9f06f16888bd0a7e9e5273accb83b2d5d4ef104fb67f4f0a6fbf7619a5744e6ae4583920f35d94db888852eee37600011b9e1fbaba3b569d6dbce53e64e40a7efac2d6fa17e3eac4457ee3294195d93f82162930cbe20e8a2dbbf81eb5dbe2f378d43242271228c72fa0d7e83fc8266150c3faabfcb561fd59b60753882b9823e64c79b9c5dc90814928bdf2b93a35ca8e17c359bae3694d66357e692cb23d78252690e5bfadedd1e0036393de2334a9e81a9fcbb7819ae8fecc9bd3297f3f900251168cf31926f4e4731820a42ddb88d79446920289167928fccc30553e37066544c7d9bf8df7104ba0c1d26c5cfad89ca959157de0b9da667af24ae72c8c60aa25ac69c6155be975742509d519c45a9672c93c99f12f002d742e70f66b758a4bda0cc684bf7195790b987c661b13c076d3da2d7a31c6fb0a696dc0b3fafc1340733f2751200bb85edccc6775c5f0eb6a393a4b470ed1004c7f2de529e732c5dacf5ce83891ab4717544f81fc0614d841608b39831ba65be5e05275a1eee6810e541371b6527046da729aeff4665bc3701e8bac4cd5cd07d0255bdbfa72541c9af316772077d2f8ea4a576a5f37e00229bb56bb82cacc4a984d24db5dedea8b6b57e1e9252416edd3b75976cbc469cd4a1aa5b17c35f7e8a9b17cf6da57c6efc29ec1b4327401bdb1fa009e39d3badebaf9aaddef172172779790ef03c4fe5a114adcf643e4ef2bb36fde4281b253068d13ca747b44884ac9cdd30c7a4a749f35880457b773b863349b81839fabaaeba9ced64d6d7a35f3e25f374d5f9765504828a0f2a8ab946000e37cb1deba640208b072e29aa4a1e00c33d5c53268c52bf977f5045bf204f8789a359b1f7e1de258200dc6827fcb45cdf7231bbaf490f496ec99
Update: I didn't have much luck in generating LibreOffice files with the special "Star SHA1" touch in them ;(
But this is not as easy of a transition as I hoped.
Right. My bad for creating this messy situation.
Original length and padded length are not needed for LibreOffice hashes. The original data length can always be found based on the length of the last field (encrypted data).
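To make the point about the last field concrete, here is a minimal Python sketch that splits an $odf$ hash and reads the data length from the encrypted content. The field names are my own labels, inferred from the sample hash in this thread (11 fields separated by `*`), not something taken from the format source, so treat them as assumptions.

```python
def parse_odf_hash(hash_line):
    """Split an $odf$ hash into named fields.

    The field names are assumptions based on the sample hash in this
    thread; only the field count and order are taken from that sample.
    """
    prefix = "$odf$*"
    if not hash_line.startswith(prefix):
        raise ValueError("not an $odf$ hash")
    fields = hash_line[len(prefix):].split("*")
    if len(fields) != 11:
        raise ValueError("expected 11 fields, got %d" % len(fields))
    (cipher_type, checksum_type, iterations, key_size, checksum,
     iv_length, iv, salt_length, salt, unused, content) = fields
    return {
        "cipher_type": int(cipher_type),
        "checksum_type": int(checksum_type),
        "iterations": int(iterations),
        "key_size": int(key_size),
        "checksum": bytes.fromhex(checksum),
        "iv": bytes.fromhex(iv),
        "salt": bytes.fromhex(salt),
        "unused": unused,
        "content": bytes.fromhex(content),
    }


def content_length(hash_line):
    """Return the encrypted data length, taken from the last field."""
    return len(parse_odf_hash(hash_line)["content"])
```

With this, no separate length field is needed: `content_length()` is simply half the number of hex characters in the last field.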
In the odf format, there is an 'unused' field. I could take over that field, and put the original length value in there (if it is different).
This sounds like a good hack.
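A rough Python sketch of that hack, for illustration only. It assumes the $sxc$ layout is the $odf$ layout with the single 'unused' field replaced by two fields, original length followed by (padded) length; that layout, the function name, and the use of "0" to mean "not set" are all assumptions on my part, and the real conversion would live in the C prepare() method.

```python
def sxc_to_odf(sxc_hash):
    """Rewrite a $sxc$ hash as $odf$, reusing the 'unused' field.

    Assumed $sxc$ layout (not verified against the format source):
    9 leading fields shared with $odf$, then original_length, length,
    and the hex-encoded content.
    """
    prefix = "$sxc$*"
    if not sxc_hash.startswith(prefix):
        raise ValueError("not a $sxc$ hash")
    fields = sxc_hash[len(prefix):].split("*")
    if len(fields) != 12:
        raise ValueError("expected 12 fields, got %d" % len(fields))
    head = fields[:9]  # cipher type through salt, copied unchanged
    original_length, length, content = fields[9:]
    # Only carry the original length over when it differs from the
    # stored length; "0" is assumed to mean "use the content length".
    if int(original_length) != int(length):
        unused = original_length
    else:
        unused = "0"
    return "$odf$*" + "*".join(head + [unused, content])
```

The padded length itself is dropped, since it can be recomputed from the content field.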
So odf (libre) will always have proper content buffer length?? Why was star office optionally padded?? Can we simply remove the padding instead?
If the padding is something within the file, and that value has to be maintained, then we will certainly need a staroffice2john process kept, since it looks like the libre odf converter does not have to deal with that problem
So odf (libre) will always have proper content buffer length?
Yes.
Why was star office optionally padded?
I must have been smoking something weird when I added that crappy padding stuff in staroffice2john.py. It is not required, I think. I just can't remember why I added it. Does the OpenSSL's Blowfish API accept odd data lengths? I am not sure at the moment.
Can we simply remove the padding instead?
Yes, sure, but can we somehow make sure that the older padded hashes are also cracked?
If the padding is something within the file, and that value has to be maintained, then we will certainly need a staroffice2john process kept.
The padding is not something which is intrinsic to the file. It was artificially added by me for some (now) unknown reason.
I think I will be able to handle the merging of staroffice2john.py and libreoffice2john.py, once the formats are merged.
the merging of staroffice2john.py and libreoffice2john.py
Could this be called odf2john or would that somehow not be correct? Same goes with the format BTW.
I am not sure. I think odf2john would be technically more correct (with ODF standing for Open Document Format) but staroffice2john.py and libreoffice2john.py are friendlier names. Naming is hard, right? :-)
I think we should call it odf2john.py but possibly have symlinks for staroffice2john.py and libreoffice2john.py pointing to it as well.
Hmm or we could have office2john handle these as well. But that may be logical to some and completely unguessable to others.
Either rename office2john to msoffice2john, or use office2john for all office formats and create additional symlinks msoffice2john etc.
I like these ideas. The only question I have is how will this symlink approach work for Windows users?
Symlinks do not work. You have to copy the files.
Why not simply add a -? command switch to all 2john processes, and within that dump out a tidbit of help, along with what data this *2john is created to extract?
The 'ideal' would be to create a data2john tool, that would look at the file in question, and call the appropriate 2john to do its dirty work ;) But with so many 2john programs, that may not be trivial, especially if some data blobs may be pure random looking binary, without known signatures.
The 'ideal' would be to create a data2john tool, that would look at the file in question, and call the appropriate *2john to do its dirty work ;)
I remember that such a thing was discussed on john-users (john-dev?) a while ago, but the idea did not pan out for some reason.
It might be possible to do a document2john.py and place all the appropriate *2john files into a subdir under run. Then the document2john simply looks at the data of the file, and calls the right 2john file in the ./run/document2john folder. I am pretty sure all known document files will contain some form of magic signature to them.
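A minimal Python sketch of the signature lookup such a document2john.py could start from. The two magic values are the standard ZIP local-file-header and OLE2 compound-file signatures; the mapping to script names is purely hypothetical (odf2john.py does not exist yet), and a ZIP signature alone does not prove the file is ODF, so a real tool would have to look inside the archive as well.

```python
ZIP_MAGIC = b"PK\x03\x04"                           # ZIP local file header
OLE2_MAGIC = bytes.fromhex("d0cf11e0a1b11ae1")      # OLE2 compound file

# Hypothetical mapping from container signature to a 2john script.
HANDLERS = [
    (ZIP_MAGIC, "odf2john.py"),
    (OLE2_MAGIC, "office2john.py"),
]


def pick_handler(header):
    """Return the script name for the first matching signature, or None."""
    for magic, script in HANDLERS:
        if header.startswith(magic):
            return script
    return None
```

Usage would be along the lines of reading the first 8 bytes of the file and passing them to `pick_handler()`; a `None` result covers the signature-less cases.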
I also wonder, if we are simply overthinking a non-problem here?
I am pretty sure all known document files will contain some form of magic signature to them.
I believe that TrueCrypt containers are an exception. Perhaps OpenSSL (openssl enc) encrypted data is another exception.
I also wonder, if we are simply overthinking a non-problem here?
Yes, maybe. I would just solve the LibreOffice / StarOffice unification problem for now.
Looks like this will be in 6.1.0
https://cgit.freedesktop.org/libreoffice/core/commit/?id=9188ea83c346fdc2f668178ae7538665a1b09c02
Also, they are removing the configuration that let the v1.2 ODF handling use sha1/bf. The v1.2 ODF format will only save using sha256/aes. sha1/bf will still be there if someone saves in the v1.1 format, BUT I would hope it uses the correct sha1 (I have not seen the code changes yet).
Ok, it looks like 1.1 will save using correct SHA1 values: https://cgit.freedesktop.org/libreoffice/core/commit/?id=50382b9e9256d7361e3770daa654fb8d09448635 So the change I am making (binary is the first 4 bytes of the good sha1 and the first 4 bytes of the bad sha1, then cmp_*() checks both of these) is the proper solution. It will handle both good and bad data.
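For readers who want to see the "check both" idea spelled out, here is a small Python sketch. It is my reading of the comment above, not the code from the PR: `legacy_sha1` is a hypothetical stand-in for the buggy "Star SHA1" routine (not reimplemented here), and the function names are made up for illustration.

```python
import hashlib


def candidate_prefixes(plaintext, legacy_sha1):
    """Return 8 bytes: 4 from the standard SHA-1, 4 from the legacy one.

    legacy_sha1 is a hypothetical callable standing in for the buggy
    implementation; it must return a 20-byte digest.
    """
    good = hashlib.sha1(plaintext).digest()
    bad = legacy_sha1(plaintext)
    return good[:4] + bad[:4]


def cmp_either(stored_checksum, prefixes):
    """Accept the candidate if the stored checksum matches either half."""
    want = stored_checksum[:4]
    return want == prefixes[:4] or want == prefixes[4:8]
```

A 4-byte match is only a quick filter; a full 20-byte comparison would still be needed afterwards to rule out false positives.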
hehe, they also chose to warn users for 52-55 byte length passwords ;)
https://cgit.freedesktop.org/libreoffice/core/commit/?id=9ef1734f03a008545a01fd394dd0e979bb230a0f
PR #3113 merges the formats. It does NOT do anything about the *2john scripts.
Thank you Jim for figuring this out and reporting it to LibreOffice. Should we also report it to OpenOffice.org?
That part I do not know. I assume you are referring to having OO add the shoddy SHA1 code INTO their product, to be able to open these files.
No, they already have the broken version, but they need to switch to the correct version (and fallback to broken) if they want to be able to open future LibreOffice files.
This has now been reported to AOO https://bz.apache.org/ooo/show_bug.cgi?id=127661
It is still in the UNCONFIRMED status, but they will soon find it is a real bug.
It looks like this is the original code base. Whoever wrote this is the one that Fukd things up in the code.
So what's remaining here is merging staroffice2john.py and libreoffice2john.py to odf2john.py if I get it right. But is that even necessary? Doesn't either of them work fine already, for any odf file? In that case just keep the "best" one and rename to odf2john.py and drop the other one.
@magnumripper It is not possible to simply drop one of them and keep the other. I had some ideas on merging the code into a single file, but doing so may not be worth the effort.
I am OK with closing this ticket as done.
See https://github.com/magnumripper/JohnTheRipper/pull/3087 for details.
This could be related to https://github.com/kuschuermann/rltodfjlib/issues/1.