Handle StarOffice buggy SHA1 implementation

kholia commented 6 years ago

See https://github.com/magnumripper/JohnTheRipper/pull/3087 for details.

This could be related to https://github.com/kuschuermann/rltodfjlib/issues/1.

jfoug commented 6 years ago

PR #3108 handles the busted hashes. It should also keep handling them once LibreOffice (and StarOffice?) fix their code.

kholia commented 6 years ago

I believe that StarOffice is no longer developed. I am very interested in seeing how LibreOffice tackles this problem.

jfoug commented 6 years ago

I believe they will fix it, but will still open busted files properly.

Their initial patch simply fixed the bug (and uncommented a test case that had been commented out because it was failing DUE to the bug), but a reviewer rejected it because the fixed code could not load 1/16th of the legacy files saved (which would be a big boondoggle for them).

NOTE, once they fix this, if someone trades files with a StarOffice user, there is a possibility that the tool will not be able to handle the file. Yup, UGLY bug, when it impacts existing customer data like this one does!
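
The 1/16th figure is consistent with a length-dependent padding bug. As a rough check, assuming the faulty path is taken whenever the input length modulo SHA-1's 64-byte block size falls in 52..55 (the range that comes up later in this thread), four of every 64 residues are affected. The helper name is made up for illustration.

```python
SHA1_BLOCK = 64  # SHA-1 processes its input in 64-byte blocks


def hits_buggy_path(length: int) -> bool:
    # Assumption: the bug triggers for lengths 52..55 modulo the block size.
    return 52 <= length % SHA1_BLOCK <= 55


# Count the affected residues across one full block.
affected = sum(hits_buggy_path(n) for n in range(SHA1_BLOCK))
fraction = affected / SHA1_BLOCK  # 4 / 64
```

Under that assumption, uniformly distributed input lengths would hit the bug one time in sixteen.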

jfoug commented 6 years ago

This is why I wrote the PR like I did. It will handle files with proper SHA1 hashes, OR files created earlier with the buggy version.

kholia commented 6 years ago

@jfoug I believe that LibreOffice formats, libreoffice_fmt_plug.c and the corresponding OpenCL format, will also need to be patched.

Update:

It should be possible to combine the StarOffice and LibreOffice formats but doing so may require some hacking around with the involved hash formats. The StarOffice format was written hastily during a password cracking competition a long time ago.

I just verified that MS Office 2016 cannot create password protected OpenDocument format files. This is one less thing for the LibreOffice project to worry about.

It seems that MS Office 2016 also cannot open OpenDocument format files which have been password protected.

magnumripper commented 6 years ago

We'd need test vectors for LibreOffice (or rather ODF) before implementing the bug workaround.

kholia commented 6 years ago

LibreOffice uses the same encryption/decryption code for all StarOffice and native LibreOffice file formats (.odt is one such LibreOffice format).

Generating affected test vectors doesn't sound straightforward. A LibreOffice user doesn't have direct control over the META-INF/* entries I think.

I just got "lucky" when generating some sample files using StarOffice earlier.

Also, modern LibreOffice versions are using AES and unaffected SHA-256 primitives instead of Blowfish and affected SHA-1.

Update:

I had mixed up .odt and ODF in this post earlier.

kholia commented 6 years ago

If someone wants to try generating such test vectors, use OpenOffice 3.4.1 from the following link,

http://archive.apache.org/dist/incubator/ooo/files/stable/3.4.1/

jfoug commented 6 years ago

Reading over the patches going on within the Libre group, it does sound like they are calling this SHA1, 'Star SHA1'. It was probably inherited code, so the bug has been there forever. They do have another library with SHA1, which they have started to switch over to using. I believe that their plan is to replace all code with that other library, leave the 'special' StarSHA1 the way it is, and then use it as a fallback prior to rejecting opening a document. BUT only using that special StarSHA1 in that manner (fallback file import).

Note, I have not looked at the libre_fmt_plug.c code yet, but it is almost certain to need the starsha1. Now, starsha1 was placed into the staroffice_common.c code. We will probably want to rip it out of there, and make it into its own .c file, or into a header that is included in the staroffice_common.c and libreoffice_common.c files.

But as for joining the 2 (and dropping star), shouldn't this be able to be done in the prepare format method?

kholia commented 6 years ago

But as for joining the 2 (and dropping star), shouldn't this be able to be done in the prepare format method?

Yes, I was thinking about the same approach.

jfoug commented 6 years ago

OK, in StarOffice there is a length and an original length. In Libre there is no length field, but the length can be computed from the length of the hash data (easy conversion). BUT the original length is not there.

The original length is used for the sha1 comparison hash. All of the values in $odf$ format use full 1024 byte buffers (I think), but there are many in the $sxc$ format which have shorter hashes, AND all use the same length for the SHA computation.

I can easily convert in prepare, but the odf format loses required data..... In the odf format, there is an 'unused' field. I could take over that field, and put the original length value in there (if it is different). But this is not as easy of a transition as I hoped.

Also, are we SURE that the ODF format is complete? Can there be hashed data shorter than 1024 bytes?

kholia commented 6 years ago

The ODF format (LibreOffice) should be reasonably complete. Here is an ODF hash (password is 1) with data less than 1024 bytes:

$odf$*0*0*1024*16*bc3b602abf272baf0d8c93b062f8caba9df2b84c*8*c7bb802b6524545c*16*9b5693c1de3695544973cba5345f0ab0*0*368756e6b05e00a20b5a642269c4e3153126dcba855d0cb44d49c5a8dc1a45a89d84f43f2e74c7fb07f11c623d94a7bff2ff47ca996bc9b5f6bdd058f5ead12edce61c170b4b4d61ebb3abc7ecafec8cbc842b7247c5c468703a4644a68c0896e0bd593ae322c39a21e0f98f19468373dfe833722e057e7d0070af284f9f06f16888bd0a7e9e5273accb83b2d5d4ef104fb67f4f0a6fbf7619a5744e6ae4583920f35d94db888852eee37600011b9e1fbaba3b569d6dbce53e64e40a7efac2d6fa17e3eac4457ee3294195d93f82162930cbe20e8a2dbbf81eb5dbe2f378d43242271228c72fa0d7e83fc8266150c3faabfcb561fd59b60753882b9823e64c79b9c5dc90814928bdf2b93a35ca8e17c359bae3694d66357e692cb23d78252690e5bfadedd1e0036393de2334a9e81a9fcbb7819ae8fecc9bd3297f3f900251168cf31926f4e4731820a42ddb88d79446920289167928fccc30553e37066544c7d9bf8df7104ba0c1d26c5cfad89ca959157de0b9da667af24ae72c8c60aa25ac69c6155be975742509d519c45a9672c93c99f12f002d742e70f66b758a4bda0cc684bf7195790b987c661b13c076d3da2d7a31c6fb0a696dc0b3fafc1340733f2751200bb85edccc6775c5f0eb6a393a4b470ed1004c7f2de529e732c5dacf5ce83891ab4717544f81fc0614d841608b39831ba65be5e05275a1eee6810e541371b6527046da729aeff4665bc3701e8bac4cd5cd07d0255bdbfa72541c9af316772077d2f8ea4a576a5f37e00229bb56bb82cacc4a984d24db5dedea8b6b57e1e9252416edd3b75976cbc469cd4a1aa5b17c35f7e8a9b17cf6da57c6efc29ec1b4327401bdb1fa009e39d3badebaf9aaddef172172779790ef03c4fe5a114adcf643e4ef2bb36fde4281b253068d13ca747b44884ac9cdd30c7a4a749f35880457b773b863349b81839fabaaeba9ced64d6d7a35f3e25f374d5f9765504828a0f2a8ab946000e37cb1deba640208b072e29aa4a1e00c33d5c53268c52bf977f5045bf204f8789a359b1f7e1de258200dc6827fcb45cdf7231bbaf490f496ec99

Update: I didn't have much luck in generating LibreOffice files with the special "Star SHA1" touch in them ;(

kholia commented 6 years ago

But this is not as easy of a transition as I hoped.

Right. My bad for creating this messy situation.

The original length and the padded length are not needed for LibreOffice hashes. The original data length can always be found from the length of the last field (encrypted data).
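
A minimal sketch of that derivation, assuming the $odf$ fields are, in order: cipher type, checksum type, iterations, key size, checksum, IV length, IV, salt length, salt, an unused field, and the hex-encoded encrypted data. The field names are my reading of the sample hash above, not something taken from the format's source, and the demo hash is synthetic.

```python
from typing import NamedTuple


class OdfHash(NamedTuple):
    cipher_type: int
    checksum_type: int
    iterations: int
    key_size: int
    checksum: bytes
    iv: bytes
    salt: bytes
    unused: int
    content: bytes


def parse_odf(line: str) -> OdfHash:
    # Assumed layout: "$odf$" followed by 11 '*'-separated fields.
    tag, *f = line.strip().split("*")
    if tag != "$odf$" or len(f) != 11:
        raise ValueError("not an $odf$ hash")
    iv, salt = bytes.fromhex(f[6]), bytes.fromhex(f[8])
    if len(iv) != int(f[5]) or len(salt) != int(f[7]):
        raise ValueError("IV or salt length mismatch")
    return OdfHash(int(f[0]), int(f[1]), int(f[2]), int(f[3]),
                   bytes.fromhex(f[4]), iv, salt, int(f[9]),
                   bytes.fromhex(f[10]))


# Synthetic hash for illustration only: 100 bytes of fake encrypted data.
demo = ("$odf$*0*0*1024*16*" + "00" * 20 + "*8*" + "11" * 8 +
        "*16*" + "22" * 16 + "*0*" + "ab" * 100)
parsed = parse_odf(demo)
data_length = len(parsed.content)  # derived from the last field alone
```

No separate length field is needed: two hex digits per byte give the data length directly.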

kholia commented 6 years ago

In the odf format, there is an 'unused' field. I could take over that field, and put the original length value in there (if it is different).

This sounds like a good hack.
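
For illustration, the hack could look roughly like this. It is a sketch under assumptions, not the actual prepare() code: it assumes $sxc$ carries the same leading fields as $odf$, followed by original length, length, and hex content, and the function name is hypothetical.

```python
def sxc_to_odf(line: str) -> str:
    # Assumed $sxc$ layout: 9 shared fields, then original_length,
    # length, and the hex-encoded content (12 fields in total).
    tag, *f = line.strip().split("*")
    if tag != "$sxc$" or len(f) != 12:
        raise ValueError("not a $sxc$ hash")
    original_length, content = int(f[9]), f[11]
    stored_length = len(content) // 2  # two hex digits per byte
    # Reuse the 'unused' slot only when the original length differs
    # from the stored (possibly padded) length; otherwise keep it 0.
    unused = original_length if original_length != stored_length else 0
    return "*".join(["$odf$", *f[:9], str(unused), content])


# Synthetic padded example: 860 original bytes stored as 864.
head = "*0*0*1024*16*" + "00" * 20 + "*8*" + "11" * 8 + "*16*" + "22" * 16
padded = "$sxc$" + head + "*860*864*" + "ab" * 864
converted = sxc_to_odf(padded)
```

The explicit length field is dropped because it can be recomputed from the content, as noted above.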

jfoug commented 6 years ago

So odf (libre) will always have proper content buffer length?? Why was star office optionally padded?? Can we simply remove the padding instead?

If the padding is something within the file, and that value has to be maintained, then we will certainly need to keep a staroffice2john process, since it looks like the libre odf converter does not have to deal with that problem.

kholia commented 6 years ago

So odf (libre) will always have proper content buffer length?

Yes.

Why was star office optionally padded?

I must have been smoking something weird when I added that crappy padding stuff in staroffice2john.py. It is not required I think. I just can't remember why I added it. Does OpenSSL's Blowfish API accept odd data lengths? I am not sure at the moment.

Can we simply remove the padding instead?

Yes, sure, but can we somehow make sure that the older padded hashes are also cracked?

kholia commented 6 years ago

If the padding is something within the file, and that value has to be maintained, then we will certainly need a staroffice2john process kept.

The padding is not something which is intrinsic to the file. It was artificially added by me for some (now) unknown reason.

I think I will be able to handle the merging of staroffice2john.py and libreoffice2john.py, once the formats are merged.

magnumripper commented 6 years ago

the merging of staroffice2john.py and libreoffice2john.py

Could this be called odf2john or would that somehow not be correct? Same goes with the format BTW.

kholia commented 6 years ago

I am not sure. I think odf2john would be technically more correct (with ODF standing for Open Document Format ) but staroffice2john.py and libreoffice2john.py are friendlier names. Naming is hard, right? :-)

magnumripper commented 6 years ago

I think we should call it odf2john.py but possibly have symlinks for staroffice2john.py and libreoffice2john.py pointing to it as well.

magnumripper commented 6 years ago

Hmm or we could have office2john handle these as well. But that may be logical to some and completely unguessable to others.

frank-dittrich commented 6 years ago

Either rename office2john to msoffice2john, or use office2john for all office formats and create additional symlinks msoffice2john etc.

kholia commented 6 years ago

I like these ideas. The only question I have is how will this symlink approach work for Windows users?

jfoug commented 6 years ago

Symlinks do not work. You have to copy the files.

Why not simply add a -? command switch to all 2john processes, and within that dump out a tidbit of help, along with what data this *2john is created to extract?

The 'ideal' would be to create a data2john tool, that would look at the file in question, and call the appropriate 2john to do its dirty work ;) But with so many 2john programs, that may not be trivial, especially if some data blobs may be pure random looking binary, without known signatures.

kholia commented 6 years ago

The 'ideal' would be to create a data2john tool, that would look at the file in question, and call the appropriate *2john to do its dirty work ;)

I remember that such a thing was discussed on john-users (john-dev?) a while ago, but the idea did not pan out for some reason.

jfoug commented 6 years ago

It might be possible to do a document2john.py and place all the appropriate *2john files into a subdir under run. Then the document2john simply looks at the data of the file, and calls the right 2john file in the ./run/document2john folder. I am pretty sure all known document files will contain some form of magic signature to them.
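
A rough sketch of what such a dispatcher could look like. The ZIP and OLE2 magic bytes are standard; the mapping from container type or mimetype to a particular *2john script is an assumption made for illustration, and `pick_converter` is a hypothetical name.

```python
import io
import zipfile
from typing import Optional

OLE2_MAGIC = bytes.fromhex("d0cf11e0a1b11ae1")  # OLE2 compound file header
ZIP_MAGIC = b"PK\x03\x04"                        # ZIP local file header


def pick_converter(data: bytes) -> Optional[str]:
    """Return the name of a (hypothetical) *2john script for this file."""
    if data.startswith(OLE2_MAGIC):
        return "office2john.py"
    if data.startswith(ZIP_MAGIC):
        try:
            with zipfile.ZipFile(io.BytesIO(data)) as zf:
                mimetype = zf.read("mimetype").decode("ascii", "replace")
        except (KeyError, zipfile.BadZipFile):
            return None  # a ZIP, but not an ODF-style package
        # Assumed mapping of package mimetypes to converter scripts.
        if mimetype.startswith("application/vnd.oasis.opendocument."):
            return "libreoffice2john.py"
        if mimetype.startswith("application/vnd.sun.xml."):
            return "staroffice2john.py"
    return None  # no known signature
```

Files without any signature fall through to `None`, so a real tool would still need a way for the user to name the converter explicitly.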

jfoug commented 6 years ago

I also wonder, if we are simply overthinking a non-problem here?

kholia commented 6 years ago

I am pretty sure all known document files will contain some form of magic signature to them.

I believe that TrueCrypt containers are an exception. Perhaps data encrypted with OpenSSL's openssl enc is another exception.

I also wonder, if we are simply overthinking a non-problem here?

Yes, maybe. I would just solve the LibreOffice / StarOffice unification problem for now.

jfoug commented 6 years ago

Looks like this will be in 6.1.0

https://cgit.freedesktop.org/libreoffice/core/commit/?id=9188ea83c346fdc2f668178ae7538665a1b09c02

Also, they are removing the configuration option that let v1.2 ODF handling use sha1/bf. The v1.2 ODF format will only save using sha256/aes. sha1/bf will still be there if someone saves in the v1.1 format, BUT I would hope it uses the correct sha1 (I have not seen the code changes yet).

OK, it looks like 1.1 will save using correct SHA1 values: https://cgit.freedesktop.org/libreoffice/core/commit/?id=50382b9e9256d7361e3770daa654fb8d09448635 So the change I am making (binary holds the first 4 bytes of the good SHA1 and the first 4 bytes of the bad SHA1, and cmp_*() then checks both) is the proper solution. It will handle both good and bad data.
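
The idea of accepting either digest can be sketched as follows. This is one reading of the approach, not the code in the PR: `legacy_sha1` is a hypothetical callable standing in for the buggy StarOffice SHA1, which is not reimplemented here, and the function names are made up.

```python
import hashlib
from typing import Callable

PREFIX = 4  # bytes of each digest used for the quick comparison


def sha1(data: bytes) -> bytes:
    return hashlib.sha1(data).digest()


def checksum_matches(stored: bytes, plaintext: bytes,
                     legacy_sha1: Callable[[bytes], bytes]) -> bool:
    """True if `stored` equals either the correct or the legacy digest."""
    good = sha1(plaintext)
    bad = legacy_sha1(plaintext)
    # Cheap 4-byte prefix test first, full comparison only on a prefix hit.
    for digest in (good, bad):
        if stored[:PREFIX] == digest[:PREFIX] and stored == digest:
            return True
    return False


# Placeholder only: NOT the real StarOffice algorithm, just a function
# that differs from standard SHA-1 so the two paths can be told apart.
def fake_legacy(data: bytes) -> bytes:
    return hashlib.sha1(data + b"\x00").digest()
```

A checksum produced by either variant is accepted, so hashes from both old and fixed writers can be handled by the same comparison.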

hehe, they also chose to warn users for 52-55 byte length passwords ;)

https://cgit.freedesktop.org/libreoffice/core/commit/?id=9ef1734f03a008545a01fd394dd0e979bb230a0f

jfoug commented 6 years ago

PR #3113 merges the formats. It does NOT do anything about the *2john scripts.

solardiz commented 6 years ago

Thank you Jim for figuring this out and reporting it to LibreOffice. Should we also report it to OpenOffice.org?

jfoug commented 6 years ago

That part I do not know. I assume you are referring to having OO add the shoddy SHA1 code INTO their product, to be able to open these files.

frank-dittrich commented 6 years ago

No, they already have the broken version, but they need to switch to the correct version (and fallback to broken) if they want to be able to open future LibreOffice files.

jfoug commented 6 years ago

This has now been reported to AOO https://bz.apache.org/ooo/show_bug.cgi?id=127661

It is still in the UNCONFIRMED status, but they will soon find it is a real bug.

It looks like this is the original code base. Whoever wrote this is the one that Fukd things up in the code.

magnumripper commented 6 years ago

So what's remaining here is merging staroffice2john.py and libreoffice2john.py to odf2john.py if I get it right. But is that even necessary? Doesn't either of them work fine already, for any odf file? In that case just keep the "best" one and rename to odf2john.py and drop the other one.

kholia commented 6 years ago

@magnumripper It is not possible to simply drop one of them, and keep the other. I had some ideas on merging the code into a single file but doing so may not be worth the effort.

kholia commented 6 years ago

I am OK with closing this ticket as done.
