openwall / john

John the Ripper jumbo - advanced offline password cracker, which supports hundreds of hash and cipher types, and runs on many operating systems, CPUs, GPUs, and even some FPGAs
https://www.openwall.com/john/

Handle StarOffice buggy SHA1 implementation #3089

Closed: kholia closed this issue 6 years ago

kholia commented 6 years ago

See https://github.com/magnumripper/JohnTheRipper/pull/3087 for details.

This could be related to https://github.com/kuschuermann/rltodfjlib/issues/1.

kholia commented 6 years ago

I ran out of time to debug this issue today. I am compiling LibreOffice but it is taking hours.

kholia commented 6 years ago

After modifying the file slightly, I saved the affected file in StarOffice again. The resultant file is crackable by JtR Jumbo just fine!

sample-5-openwall-two-versions.zip

LibreOffice somehow manages to open the original affected file with the correct password, and it also detects wrong passwords for the same file. Hmm, what are we missing?

I suspect that LibreOffice is ignoring the checksum failures and relying on some further "data decryption" operations.

kholia commented 6 years ago

Debugging this further led me to a weird conclusion: the SHA1 implementation in LibreOffice is somewhat "special".

Hash the following data after hex-decoding it,

a5565d6fda30147ddfaff0fcb0b76018525518506da5952ad1b51254da1e8d63126b8e9dd90e1ffbf5f34712c24ac2245e12eebdc7c7c7c7d73193bb7dc6c1962acda498c241af0f011544c64c2453f8b67a8c6ee1ddecc3e4e3fce57ef5f3f501c8cd86113a8e2529322a4c44a430f60d5edfbe2d9eee018c107ac9a978f1b09e540942f3d51c84785e8e02761e841ebe4300035f2f36319c4ddac8ad46a1c7a13a85a931f9182169a791c7693ef7fb7d1462580ed0e6c0bbf11e51c10ddd9b4eb403d460bcbec0ed11153c5678d7897600eb7985dfc81abddbed7abba1470e46a311fab15ca047a9325c6bd973267eb5e27db5828a225b53d5ad041b7ce28bde26e7c88381db5a3249b1eaf6cf238e8e0ce30b8e0ce30a6c179bb62cf0163ddba27f3c2f8ef6a9ac93dc01eaf511c5f26ee50102abee271c6b3d851caf29af932787a8eee53012d5f1c6b6741453c2f56ce25d3e66408805ce6c63ad702a333c80602303628333c60f5501a2eee15f458c39054b2c34787b7acff209e7527f3945851c040dde9c19628ddf62c57c375f98b5d4d6a6f91ae6542a2ff60cf9b1d46008d528a1822a46a6505901e29202746693ca142e8c65308c449ea2de3dff3c91ba51837a9e5262a2709e32a2ab7c8e95fbbcf9200aa316ae9734ac687365db501946753964eb22827964f7680a8dcce1bf0545f914aeb1a6f6bc87d53414b6cb5d5262de09d636697bf98c9c2a43242f32db367637c24ffbb12e843d457dd8c82538b7192648da10d4243bd5885acd2e0b6b191f6613f71d1e6bfabbb03715adf6e97d12f854cc74cef121928571c6449c6e9d5156a62f07139e382fb451d82f1a5d49b6aadae93a1677d95c4b322f6f15677ebb6b7918d26cc7a5c1f64c297729bb9bc9df8dd15aee818f9a48dfee3e1be247655f835205b63b2f55640eb9ad601db91bc0f6da3619efabbef0d18ec5ee033fec8d863737c7744a59929ad3bc9fea4fc4444c1d07ac1df21d7b6625a1bf1ba63cdbeb8d2a7fe0e07f38d0f0ce4f61c3134baa727eecdfd0a6a8e5efccec2f

Our code and Python say e14947fff4823def2a51e206e1231235d969bd4d.

LibreOffice consistently says 00c20dee73d4990cfe1c8faf92ab5b0f517e0e19 and this is indeed the expected checksum.

I wonder what the trick is.
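
For reference, here is a minimal standalone check (my addition, not part of the thread), assuming OpenSSL's one-shot SHA1() helper is available: it hex-decodes a line read from stdin and prints the plain, standards-compliant SHA-1, which reproduces the e14947fff4823def2a51e206e1231235d969bd4d value above.

/* sha1_check.c - hypothetical helper: print the standard SHA-1 of hex input.
 * Build with: gcc sha1_check.c -lcrypto */
#include <stdio.h>
#include <string.h>
#include <openssl/sha.h>

int main(void)
{
    char hex[4096];
    unsigned char data[2048], md[SHA_DIGEST_LENGTH];
    size_t i, n;

    if (!fgets(hex, sizeof(hex), stdin))
        return 1;
    n = strcspn(hex, " \r\n");              /* strip trailing whitespace */
    if (n % 2 || n / 2 > sizeof(data))
        return 1;                           /* need an even number of hex digits */
    for (i = 0; i < n / 2; i++)
        sscanf(hex + 2 * i, "%2hhx", &data[i]);
    SHA1(data, n / 2, md);                  /* plain, standards-compliant SHA-1 */
    for (i = 0; i < SHA_DIGEST_LENGTH; i++)
        printf("%02x", md[i]);
    printf("\n");
    return 0;
}

Piping the hex blob above into this program should print the e14947ff... value, i.e. what "our code and Python" produce, not what LibreOffice produces.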

jfoug commented 6 years ago

Could this be a buffer clean bug, like we see in QNX's buggy implementation of SHA512?

Is LibreOffice closed source?

I will play around with the above data in pass_gen.pl and see if I can figure this out. That was how I proved that QNX sha512 was crappy :)

kholia commented 6 years ago

LibreOffice is FOSS.

Here are the relevant files,

Look at SHA1DigestContext::updateDigest and ZipFile::StaticHasValidPassword methods.

jfoug commented 6 years ago

The worst problem will come when they 'fix' their code. Then they will have hashes that do not match. Now, they may not be able to do that, and will simply have to live with the buggy crap, or implement a different hash (along with a signature) so that the new hash would not miss data encrypted with the buggy hash function. AFAIK, QNX has never fixed its sha512 stuff, at least as far as the data which john uses goes.

kholia commented 6 years ago

The weirdest thing is that our checksum results match those of LibreOffice (and StarOffice) for almost all files. Well, this is what enabled us to write a cracker in the first place.

For some reason, sample-5-openwall-two-versions.zip seems to be special.

kholia commented 6 years ago

The worst problem will come when they 'fix' their code.

I am hoping that I am running into some weird PEBKAC scenario, and not an actual crypto bug in LibreOffice. Let's see!

jfoug commented 6 years ago

If this is an overflow bug, it may only impact a few. That was what we saw in QNX. There were 4 bytes not cleaned up on multi-buffer hashes, BUT they would be 0 most of the time. Only a couple of byte lengths out of 128 would cause the bug. So for that format to 'match', we would save the data from the prior limb, and if we detected this length bug, we would copy those bytes into our buffer during the sha512_final call.

jfoug commented 6 years ago

This really does not show the bug. I would need code for the rtl_digest_updateSHA1 set of functions (all the real sha1 functions).

kholia commented 6 years ago

Do the following links help?

jfoug commented 6 years ago

This is the real file (I think). I am looking into it now to see if there are bugs in the sha1 code.

https://github.com/LibreOffice/core/blob/master/sal/rtl/digest.cxx

kholia commented 6 years ago

Please note that this bug may have something to do with the length of the data being hashed. Lengths 860, 1024, 417, 866, 664, and 536 are known to work, but length 756 somehow misbehaves.

jfoug commented 6 years ago

It does, almost certainly. I am looking at this code right now,


    len = ctx->m_nL + (nDatLen << 3);
    if (len < ctx->m_nL)
        ctx->m_nH += 1;

    ctx->m_nH += (nDatLen >> 29);
    ctx->m_nL = len;

This in particular bothers me: ctx->m_nH += 1; there is a shift left by 3, but the +1 can only account for a single carry bit.

I guess that code is OK, as long as nDatLen < 2^29 (which is huge).

I will keep looking.

jfoug commented 6 years ago

756 leads me to think this is a buffer-not-cleaned-up type of bug. 756 is 0x2F4, which mod 64 is 52: VERY close to, but still under, a block size.
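
To make the mod-64 pattern concrete, here is a quick residue check (my addition, not from the thread) over the lengths kholia listed; 756 is the only one that leaves a 52-byte tail in its final 64-byte block.

#include <stdio.h>

/* Print each length modulo the 64-byte SHA-1 block size. */
int main(void)
{
    const unsigned lengths[] = { 860, 1024, 417, 866, 664, 536, 756 };
    size_t i;

    for (i = 0; i < sizeof(lengths) / sizeof(lengths[0]); i++)
        printf("%4u mod 64 = %2u\n", lengths[i], lengths[i] % 64);
    return 0;
}

/* Output: 28, 0, 33, 34, 24, 24, and 52 respectively. */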

jfoug commented 6 years ago

I do not see the bug :(

kholia commented 6 years ago

I will try extracting this code from LibreOffice, and then reshaping it into a standalone easily debuggable program.

jfoug commented 6 years ago

There may be a bug if the actual length is 52 mod(64) (i.e. if the last block has 52 bytes in it). The code may errantly hash an extra limb. The affected range may be >= 52 and <= 55; I do not know yet. The code is still a bit opaque.

jfoug commented 6 years ago

If this simply hashes another limb, it will be easy to add to john, IF we deem it important enough ;) We did this for QNX.

But if you can do a stand alone, then we can hash strings 1 byte larger each time, find out which match proper SHA1 and which do not, and then make bug workarounds in our code to obtain the same results.

kholia commented 6 years ago

But if you can do a stand alone, then we can...

I gave it a quick try but failed. It will probably require a few hours of work. I will try to get it done this coming Thursday or Friday.

jfoug commented 6 years ago

./a -f libre-check.dat
52
00c20dee73d4990cfe1c8faf92ab5b0f517e0e19

:)

I made these changes to a normal sha1 final from here (https://github.com/clibs/sha1)

void SHA1Final(
    unsigned char digest[20],
    SHA1_CTX * context
)
{
    unsigned i;

    unsigned char finalcount[8];

    unsigned char c;
+   int bug = 0;
+   if ( ((context->count[0]>>3) & 63) >= 52 && ((context->count[0]>>3) & 63) <= 55)
+       bug = 1;

    for (i = 0; i < 8; i++)
    {
        finalcount[i] = (unsigned char) ((context->count[(i >= 4 ? 0 : 1)] >> ((3 - (i & 3)) * 8)) & 255);      /* Endian independent */
    }
    c = 0200;
    SHA1Update(context, &c, 1);
    while ((context->count[0] & 504) != 448)
    {
        c = 0000;
        SHA1Update(context, &c, 1);
    }
+   if (bug) {
+       c = 0000;
+       for (i = 0; i < 64; ++i)
+           SHA1Update(context, &c, 1);
+   }
    SHA1Update(context, finalcount, 8); /* Should cause a SHA1Transform() */
    for (i = 0; i < 20; i++)
    {
        digest[i] = (unsigned char)
            ((context->state[i >> 2] >> ((3 - (i & 3)) * 8)) & 255);
    }
}

NOTE, I do not know if >= 52 and <= 55 is correct. I am just guessing based on an outward look at the code in https://github.com/LibreOffice/core/blob/master/sal/rtl/digest.cxx

jfoug commented 6 years ago

I changed the title, to implicate the guilty ;)

kholia commented 6 years ago

Neat hack @jfoug.

What would be required next to emulate this faulty implementation with 100% accuracy?

I think that it might be easier to just port the buggy function (including the length calculations) to a regular SHA1 implementation. Porting the whole LibreOffice SHA1 code is perhaps not a trivial task.

jfoug commented 6 years ago

To find the real problem range, we really do need to port sal/rtl/digest.cxx to a simple standalone program, push strings of length 1 to 256 through it, and determine exactly which modular length values ARE problems.

jfoug commented 6 years ago

Just the needed functions in digest.cxx (the sha1 stuff). Then build a little main which crypts argv[1], test it with a script, and compare the results to sha1sum values.

kholia commented 6 years ago

Got it, thanks!

It is time for bed here. I am gonna continue working on this tomorrow morning!

jfoug commented 6 years ago

Once we know, we can add this buggy sha1 to the sha2.c code (or build it standalone). We already have logic in sha2.c to handle the buggy QNX sha512 stuff; that is why I was thinking there.

It looks like the only place this is needed is in the checksum computation after the decrypt, and possibly initially in the conversion of the passwords. I think all the pbkdf2 code should be fine, since those lengths are likely not in the buggy range (short salt).
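
As a rough illustration of that dispatch (my sketch, with a hypothetical helper name; this is not actual JtR code), the format would only need to detect whether a message's final 64-byte block holds 52-55 bytes and route those cases to the bug-compatible SHA-1:

#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: nonzero when a 'len'-byte message ends in a 52-55 byte
 * final block, i.e. when StarOffice/LibreOffice SHA-1 output diverges from a
 * standards-compliant SHA-1. */
static int needs_buggy_sha1(unsigned long len)
{
    unsigned long tail = len % 64;   /* data bytes left over for the final block */
    return tail >= 52 && tail <= 55;
}

int main(int argc, char **argv)
{
    unsigned long len = argc > 1 ? strtoul(argv[1], NULL, 10) : 756;

    printf("%lu bytes -> %s\n", len,
           needs_buggy_sha1(len) ? "bug-compatible SHA-1" : "standard SHA-1");
    return 0;
}

For example, './a 756' would report the bug-compatible path, while 860, 1024, 417, 866, 664, and 536 would all use standard SHA-1.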

jfoug commented 6 years ago

We might want to add this buggy sha1 to the staroffice_common code (does the OpenCL format perform this logic at the CPU level?). If the OpenCL format does it in GPU code, then we will have to put the buggy sha1 there as well.

The WORST thing would be if they fix it, BUT give us no indication of whether the hash used is correct or buggy :(

jfoug commented 6 years ago

grr, the initial port of the code gives me 5baa61e4c9b93f3f0682250b6cf8331b7ee68fd8 for "password" (correct), but gives 6c790ac52bc1a7236fabe26b486d31acece67742 for the known data that should produce 00c20dee73d4990cfe1c8faf92ab5b0f517e0e19

jfoug commented 6 years ago

It may be that I am not clearing out buffers in exactly the same way, but it works for most lengths and fails where I expected it to. Even if I am not getting the exact same 'buggy' results, I AM getting buggy results at the same lengths. I should be able to wrap this up.

jfoug commented 6 years ago

$ perl -e '{use Digest::SHA1; $s = "1"; for ($i = 0; $i < 256; ++$i) { $a=Digest::SHA1::sha1_hex($s)."\n"; $b=`./a $s`; if ($a ne $b) { print $i+1 , "\n"; } $s .= "1"; } }'
52
53
54
55
116
117
118
119
180
181
182
183
244
245
246
247

Exactly like I expected: mod(64) >= 52 and <= 55.

Here is my port code. I would be happier if the bad data gave us the 00c20dee73d4990cfe1c8faf92ab5b0f517e0e19 expected results, but I still feel this port shows which lengths ARE buggy.

CODE FIXED

#include <stdio.h>
#include <string.h>
#include <stdint.h>

/* Uncomment to build the standards-compliant (bug-free) version. */
//#define WITHOUT_BUG

#define DIGEST_CBLOCK_SHA 64
#define DIGEST_LBLOCK_SHA 16

#define RTL_DIGEST_ROTL(a,n) (((a) << (n)) | ((a) >> (32 - (n))))

static uint32_t updateSHA_1(uint32_t x);

typedef struct DigestContextSHA
{
    uint32_t          m_nDatLen;
    uint32_t          m_pData[DIGEST_LBLOCK_SHA];
    uint32_t          m_nA, m_nB, m_nC, m_nD, m_nE;
    uint32_t          m_nL, m_nH;
}SHA_CTX;

static void initSHA(SHA_CTX *ctx);
static void updateSHA(SHA_CTX *ctx);
static void endSHA(SHA_CTX *ctx);

static void swapLong(uint32_t *pData, uint32_t nDatLen)
{
    uint32_t *X;
    int i, n;

    X = pData;
    n = nDatLen;

    for (i = 0; i < n; i++)
    {
        X[i] = __builtin_bswap32(X[i]);
    }
}

#define K_00_19 (uint32_t)0x5a827999L
#define K_20_39 (uint32_t)0x6ed9eba1L
#define K_40_59 (uint32_t)0x8f1bbcdcL
#define K_60_79 (uint32_t)0xca62c1d6L

#define F_00_19(b,c,d) ((((c) ^ (d)) & (b)) ^ (d))
#define F_20_39(b,c,d) ((b) ^ (c) ^ (d))
#define F_40_59(b,c,d) (((b) & (c)) | ((b) & (d)) | ((c) & (d)))
#define F_60_79(b,c,d) F_20_39(b,c,d)

#define BODY_X(i) \
    (X[(i)&0x0f] ^ X[((i)+2)&0x0f] ^ X[((i)+8)&0x0f] ^ X[((i)+13)&0x0f])

#define BODY_00_15(u,i,a,b,c,d,e,f) \
    (f)  = X[i]; \
    (f) += (e) + K_00_19 + RTL_DIGEST_ROTL((a), 5) + F_00_19((b), (c), (d)); \
    (b)  = RTL_DIGEST_ROTL((b), 30);

#define BODY_16_19(u,i,a,b,c,d,e,f) \
    (f)  = BODY_X((i)); \
    (f)  = X[(i)&0x0f] = RTL_DIGEST_ROTL((f),1); \
    (f) += (e) + K_00_19 + RTL_DIGEST_ROTL((a), 5) + F_00_19((b), (c), (d)); \
    (b)  = RTL_DIGEST_ROTL((b), 30);

#define BODY_20_39(u,i,a,b,c,d,e,f) \
    (f)  = BODY_X((i)); \
    (f)  = X[(i)&0x0f] = RTL_DIGEST_ROTL((f),1); \
    (f) += (e) + K_20_39 + RTL_DIGEST_ROTL((a), 5) + F_20_39((b), (c), (d)); \
    (b)  = RTL_DIGEST_ROTL((b), 30);

#define BODY_40_59(u,i,a,b,c,d,e,f) \
    (f)  = BODY_X((i)); \
    (f)  = X[(i)&0x0f] = RTL_DIGEST_ROTL((f),1); \
    (f) += (e) + K_40_59 + RTL_DIGEST_ROTL((a), 5) + F_40_59((b), (c), (d)); \
    (b)  = RTL_DIGEST_ROTL((b), 30);

#define BODY_60_79(u,i,a,b,c,d,e,f) \
    (f)  = BODY_X((i)); \
    (f)  = X[(i)&0x0f] = RTL_DIGEST_ROTL((f),1); \
    (f) += (e) + K_60_79 + RTL_DIGEST_ROTL((a), 5) + F_60_79((b), (c), (d)); \
    (b)  = RTL_DIGEST_ROTL((b), 30);

static void initSHA(SHA_CTX *ctx)
{
    memset(ctx, 0, sizeof(SHA_CTX));
    ctx->m_nA = (uint32_t)0x67452301L;
    ctx->m_nB = (uint32_t)0xefcdab89L;
    ctx->m_nC = (uint32_t)0x98badcfeL;
    ctx->m_nD = (uint32_t)0x10325476L;
    ctx->m_nE = (uint32_t)0xc3d2e1f0L;
}

static void updateSHA(SHA_CTX *ctx)
{
    uint32_t  A, B, C, D, E, T;
    uint32_t *X;

    A = ctx->m_nA;
    B = ctx->m_nB;
    C = ctx->m_nC;
    D = ctx->m_nD;
    E = ctx->m_nE;
    X = ctx->m_pData;

    BODY_00_15 (U,  0, A, B, C, D, E, T);
    BODY_00_15 (U,  1, T, A, B, C, D, E);
    BODY_00_15 (U,  2, E, T, A, B, C, D);
    BODY_00_15 (U,  3, D, E, T, A, B, C);
    BODY_00_15 (U,  4, C, D, E, T, A, B);
    BODY_00_15 (U,  5, B, C, D, E, T, A);
    BODY_00_15 (U,  6, A, B, C, D, E, T);
    BODY_00_15 (U,  7, T, A, B, C, D, E);
    BODY_00_15 (U,  8, E, T, A, B, C, D);
    BODY_00_15 (U,  9, D, E, T, A, B, C);
    BODY_00_15 (U, 10, C, D, E, T, A, B);
    BODY_00_15 (U, 11, B, C, D, E, T, A);
    BODY_00_15 (U, 12, A, B, C, D, E, T);
    BODY_00_15 (U, 13, T, A, B, C, D, E);
    BODY_00_15 (U, 14, E, T, A, B, C, D);
    BODY_00_15 (U, 15, D, E, T, A, B, C);
    BODY_16_19 (U, 16, C, D, E, T, A, B);
    BODY_16_19 (U, 17, B, C, D, E, T, A);
    BODY_16_19 (U, 18, A, B, C, D, E, T);
    BODY_16_19 (U, 19, T, A, B, C, D, E);

    BODY_20_39 (U, 20, E, T, A, B, C, D);
    BODY_20_39 (U, 21, D, E, T, A, B, C);
    BODY_20_39 (U, 22, C, D, E, T, A, B);
    BODY_20_39 (U, 23, B, C, D, E, T, A);
    BODY_20_39 (U, 24, A, B, C, D, E, T);
    BODY_20_39 (U, 25, T, A, B, C, D, E);
    BODY_20_39 (U, 26, E, T, A, B, C, D);
    BODY_20_39 (U, 27, D, E, T, A, B, C);
    BODY_20_39 (U, 28, C, D, E, T, A, B);
    BODY_20_39 (U, 29, B, C, D, E, T, A);
    BODY_20_39 (U, 30, A, B, C, D, E, T);
    BODY_20_39 (U, 31, T, A, B, C, D, E);
    BODY_20_39 (U, 32, E, T, A, B, C, D);
    BODY_20_39 (U, 33, D, E, T, A, B, C);
    BODY_20_39 (U, 34, C, D, E, T, A, B);
    BODY_20_39 (U, 35, B, C, D, E, T, A);
    BODY_20_39 (U, 36, A, B, C, D, E, T);
    BODY_20_39 (U, 37, T, A, B, C, D, E);
    BODY_20_39 (U, 38, E, T, A, B, C, D);
    BODY_20_39 (U, 39, D, E, T, A, B, C);

    BODY_40_59 (U, 40, C, D, E, T, A, B);
    BODY_40_59 (U, 41, B, C, D, E, T, A);
    BODY_40_59 (U, 42, A, B, C, D, E, T);
    BODY_40_59 (U, 43, T, A, B, C, D, E);
    BODY_40_59 (U, 44, E, T, A, B, C, D);
    BODY_40_59 (U, 45, D, E, T, A, B, C);
    BODY_40_59 (U, 46, C, D, E, T, A, B);
    BODY_40_59 (U, 47, B, C, D, E, T, A);
    BODY_40_59 (U, 48, A, B, C, D, E, T);
    BODY_40_59 (U, 49, T, A, B, C, D, E);
    BODY_40_59 (U, 50, E, T, A, B, C, D);
    BODY_40_59 (U, 51, D, E, T, A, B, C);
    BODY_40_59 (U, 52, C, D, E, T, A, B);
    BODY_40_59 (U, 53, B, C, D, E, T, A);
    BODY_40_59 (U, 54, A, B, C, D, E, T);
    BODY_40_59 (U, 55, T, A, B, C, D, E);
    BODY_40_59 (U, 56, E, T, A, B, C, D);
    BODY_40_59 (U, 57, D, E, T, A, B, C);
    BODY_40_59 (U, 58, C, D, E, T, A, B);
    BODY_40_59 (U, 59, B, C, D, E, T, A);

    BODY_60_79 (U, 60, A, B, C, D, E, T);
    BODY_60_79 (U, 61, T, A, B, C, D, E);
    BODY_60_79 (U, 62, E, T, A, B, C, D);
    BODY_60_79 (U, 63, D, E, T, A, B, C);
    BODY_60_79 (U, 64, C, D, E, T, A, B);
    BODY_60_79 (U, 65, B, C, D, E, T, A);
    BODY_60_79 (U, 66, A, B, C, D, E, T);
    BODY_60_79 (U, 67, T, A, B, C, D, E);
    BODY_60_79 (U, 68, E, T, A, B, C, D);
    BODY_60_79 (U, 69, D, E, T, A, B, C);
    BODY_60_79 (U, 70, C, D, E, T, A, B);
    BODY_60_79 (U, 71, B, C, D, E, T, A);
    BODY_60_79 (U, 72, A, B, C, D, E, T);
    BODY_60_79 (U, 73, T, A, B, C, D, E);
    BODY_60_79 (U, 74, E, T, A, B, C, D);
    BODY_60_79 (U, 75, D, E, T, A, B, C);
    BODY_60_79 (U, 76, C, D, E, T, A, B);
    BODY_60_79 (U, 77, B, C, D, E, T, A);
    BODY_60_79 (U, 78, A, B, C, D, E, T);
    BODY_60_79 (U, 79, T, A, B, C, D, E);

    ctx->m_nA += E;
    ctx->m_nB += T;
    ctx->m_nC += A;
    ctx->m_nD += B;
    ctx->m_nE += C;
}

static void endSHA(SHA_CTX *ctx)
{
    static const uint8_t end[4] =
    {
        0x80, 0x00, 0x00, 0x00
    };
    const uint8_t *p = end;

    uint32_t *X;
    int i;

    X = ctx->m_pData;
    i = (ctx->m_nDatLen >> 2);

    switch (ctx->m_nDatLen & 0x03)
    {
        case 1: X[i] &= 0x000000ff; break;
        case 2: X[i] &= 0x0000ffff; break;
        case 3: X[i] &= 0x00ffffff; break;
    }

    switch (ctx->m_nDatLen & 0x03)
    {
        case 0: X[i]  = ((uint32_t)(*(p++))) <<  0;
            //SAL_FALLTHROUGH;
        case 1: X[i] |= ((uint32_t)(*(p++))) <<  8;
            //SAL_FALLTHROUGH;
        case 2: X[i] |= ((uint32_t)(*(p++))) << 16;
            //SAL_FALLTHROUGH;
        case 3: X[i] |= ((uint32_t)(*(p++))) << 24;
    }

    swapLong(X, i + 1);

    i += 1;

#ifdef WITHOUT_BUG
    /* correct padding: spill into an extra block only when the two length
       words no longer fit (final block holds 56 or more data bytes) */
    if (i > (DIGEST_LBLOCK_SHA - 2))
#else
    /* LibreOffice/StarOffice bug: '>=' also takes the extra-block path when
       i == 14, i.e. when the final block holds 52-55 data bytes */
    if (i >= (DIGEST_LBLOCK_SHA - 2))
#endif
    {
        for (; i < DIGEST_LBLOCK_SHA; i++)
        {
            X[i] = 0;
        }

        updateSHA(ctx);
        i = 0;
    }

    for (; i < (DIGEST_LBLOCK_SHA - 2); i++)
    {
        X[i] = 0;
    }

    X[DIGEST_LBLOCK_SHA - 2] = ctx->m_nH;
    X[DIGEST_LBLOCK_SHA - 1] = ctx->m_nL;

    updateSHA(ctx);
}

int rtl_digest_updateSHA1(SHA_CTX *ctx, const void *pData, uint32_t nDatLen)
{
    const uint8_t *d = (const uint8_t *)pData;

    uint32_t len;

    len = ctx->m_nL + (nDatLen << 3);
    if (len < ctx->m_nL)
        ctx->m_nH += 1;

    ctx->m_nH += (nDatLen >> 29);
    ctx->m_nL = len;

    if (ctx->m_nDatLen)
    {
        uint8_t *p = ((uint8_t *)ctx->m_pData) + ctx->m_nDatLen;
        uint32_t n = DIGEST_CBLOCK_SHA - ctx->m_nDatLen;

        if (nDatLen < n)
        {
            memcpy(p, d, nDatLen);
            ctx->m_nDatLen += nDatLen;

            return 0;
        }

        memcpy(p, d, n);
        d += n;
        nDatLen -= n;

        updateSHA(ctx);
        ctx->m_nDatLen = 0;
    }

    while (nDatLen >= DIGEST_CBLOCK_SHA)
    {
        memcpy(ctx->m_pData, d, DIGEST_CBLOCK_SHA);
        d += DIGEST_CBLOCK_SHA;
        nDatLen -= DIGEST_CBLOCK_SHA;

        swapLong(ctx->m_pData, 16);
        updateSHA(ctx);
    }

    memcpy(ctx->m_pData, d, nDatLen);
    ctx->m_nDatLen = nDatLen;

    return 0;
}

#define RTL_DIGEST_HTONL(l,c) \
    (*((c)++) = (uint8_t)(((l) >> 24) & 0xff), \
     *((c)++) = (uint8_t)(((l) >> 16) & 0xff), \
     *((c)++) = (uint8_t)(((l) >>  8) & 0xff), \
     *((c)++) = (uint8_t)(((l)       ) & 0xff))

int rtl_digest_getSHA1 (SHA_CTX *ctx, uint8_t *pBuffer)
{
    uint8_t *p = pBuffer;

    endSHA(ctx);
    RTL_DIGEST_HTONL(ctx->m_nA, p);
    RTL_DIGEST_HTONL(ctx->m_nB, p);
    RTL_DIGEST_HTONL(ctx->m_nC, p);
    RTL_DIGEST_HTONL(ctx->m_nD, p);
    RTL_DIGEST_HTONL(ctx->m_nE, p);

    return 0;
}
int main(int argc, char **argv) {
    SHA_CTX c;
    unsigned char hash[20];
    int i;

    initSHA(&c);
    if (!strcmp(argv[1], "-f")) {
        FILE *in = fopen(argv[2], "rb");
        char buf[512];
        size_t len = fread(buf, 1, 512, in);
        while (!feof(in)) {
            rtl_digest_updateSHA1(&c, buf, len);
            len = fread(buf, 1, 512, in);
        }
        if (len>0)
            rtl_digest_updateSHA1(&c, buf, len);
        fclose(in);
    }
    else {
        rtl_digest_updateSHA1(&c, argv[1], strlen(argv[1]));
    }
    rtl_digest_getSHA1(&c, hash);
    for (i = 0; i < 20; ++i)
        printf("%02x", hash[i]);
    printf ("\n");

}

// test script:
//  perl -e '{use Digest::SHA1; $s = "1"; for ($i = 0; $i < 256; ++$i) { $a=Digest::SHA1::sha1_hex($s)."\n"; $b=`./a $s`; if ($a ne $b) { print $i+1,"\n"; } $s .= "1"; } }'

jfoug commented 6 years ago

The 'fix' in their code is this 1 line in the endSHA function. With the buggy >=, a final block holding 52-55 data bytes (i == 14 once the 0x80 terminator word is counted) takes the "does not fit" path and pushes the two 32-bit length words into an extra, otherwise all-zero block, even though they would still have fit.

     swapLong(X, i + 1);

     i += 1;

-    if (i >= (DIGEST_LBLOCK_SHA - 2))
+    if (i > (DIGEST_LBLOCK_SHA - 2))
     {
         for (; i < DIGEST_LBLOCK_SHA; i++)
         {

jfoug commented 6 years ago

Found my 'bug' in the port. I needed to byteswap the array in this part of rtl_digest_updateSHA1:

    while (nDatLen >= DIGEST_CBLOCK_SHA)
    {
+      uint32_t *X=ctx->m_pData;
        memcpy(ctx->m_pData, d, DIGEST_CBLOCK_SHA);
        d += DIGEST_CBLOCK_SHA;
        nDatLen -= DIGEST_CBLOCK_SHA;
+      swapLong(X, 16);
        updateSHA(ctx);
    }

After that change, I get proper hashes (bad and correct), depending on whether I leave in that >= bug I pointed out above. The reason it was passing before was that I was using data that did not need to be swapped ;) As soon as I changed my script to this, I was getting failures from 52 to 55 and from 64 and above.

Now, with the swapping added, I am seeing the same broken results, AND this is happening only from 52 to 55 byte count mod 64.

jfoug commented 6 years ago

Yup, my bad. I had CUT this code out (thinking it was ifdef, not ifndef)

#ifndef OSL_BIGENDIAN
        swapLong(ctx->m_pData, DIGEST_LBLOCK_SHA);
#endif /* OSL_BIGENDIAN */

jfoug commented 6 years ago

Here are 2 builds, one with and one without the bug, along with the sha1sum of that test data:

$ sha1sum libre-check.dat
e14947fff4823def2a51e206e1231235d969bd4d *libre-check.dat

$ gcc sha1_star.c

$ ./a -f libre-check.dat
00c20dee73d4990cfe1c8faf92ab5b0f517e0e19

$ gcc sha1_star.c

$ ./a -f libre-check.dat
e14947fff4823def2a51e206e1231235d969bd4d

jfoug commented 6 years ago

NOTE, this https://github.com/magnumripper/JohnTheRipper/issues/3089#issuecomment-356324407 is the correct way to deal with these hashes. For the final SHA1 limb, if there are 52-55 bytes in that limb, then add the 0x80 byte, null out the rest of that limb and hash it, and then do another limb of nulls, placing the proper bit count at the end of that 'extra' limb.

magnumripper commented 6 years ago

Good stuff!

Once we know, we can add this buggy sha1 to the sha2.c code (or build it standalone). We already have logic in sha2.c to handle the buggy QNX sha512 stuff; that is why I was thinking there.

Please don't add it to any shared file with proper SHA* code, and DEFINITELY don't add a buggy SHA-1 to sha2.c (LOL)!

This is bug-compatible code specific to StarOffice, so it should be in staroffice_common_plug.c or something like that. The QNX bug code should also not be in the shared file with the proper stuff; it should be in some QNX-specific file IMHO.

jfoug commented 6 years ago

I think here we should make a special sha1 function, but ONLY call it when needed, i.e. if the length is one of the buggy ones. That way we do not need to be 'as' concerned about the overall performance of this sha1 code, and we can simply use reference-level code.
The big bugaboo is how to handle this on GPU, if it is done on GPU.
An even bigger bugaboo is if they FIX the sha1 and then, within LibreOffice, handle both the buggy and non-buggy variants ;) Since there is no less security (arguably there is more, due to obscurity), there is no real reason they would 'have' to fix it.

jfoug commented 6 years ago

The reason the QNX bug workaround was there is that the sha2 code is 100% contained in john (the fallback code). It was simply an extra, normally unused field in the ctx, and then a spot in sha512_final where it dirtied the buffer with the prior buffer data before calling the transform function (after setting the bit count).

jfoug commented 6 years ago

Most of the time, we use the oSSL CTX version of SHA512. We only fall back to the generic code when working on QNX hashes, OR when the oSSL build does not contain them.

Also, this QNX bug workaround is in the SIMD logic for sha512. (I may be wrong on that one.)

jfoug commented 6 years ago

It does not look like QNX does any SIMD logic. It looks like the code was 'started' to be added, but not completed.

So the generic sha2 code in sha2.c is the only place it lives at, I think.

jfoug commented 6 years ago

I did report this to the LibreOffice folks (https://bugs.documentfoundation.org/show_bug.cgi?id=114939)

We may have to make changes so that we use 'standard' sha1 for all hash lengths, but then also do a busted-sha1 check if the last block length is in the known buggy range. Since this does not affect the pbkdf2 step, it should not slow this format down at all (or not noticeably). But if they fix this, and on file import use the good code first and then try the legacy buggy code if the hashes fail, and then write files using only FIPS-compliant hashing, we will have to use both methods to check hash data of one of these buggy lengths. Fortunately, only 4 out of 64 lengths are buggy.
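
A sketch of that dual check (my addition; the two digest routines are assumed to exist elsewhere, e.g. a normal SHA-1 plus the bug-compatible port shown earlier, and the names here are hypothetical, not actual JtR code):

#include <string.h>
#include <stdint.h>

/* Assumed to be provided elsewhere: a standards-compliant SHA-1 and a
 * StarOffice/LibreOffice bug-compatible SHA-1. */
void sha1_standard(const uint8_t *data, size_t len, uint8_t out[20]);
void sha1_staroffice_buggy(const uint8_t *data, size_t len, uint8_t out[20]);

/* Accept the checksum if either digest matches; the buggy variant is only
 * tried for the 4 out of 64 tail lengths where the outputs can differ. */
static int staroffice_checksum_ok(const uint8_t *data, size_t len,
                                  const uint8_t expected[20])
{
    uint8_t md[20];

    sha1_standard(data, len, md);
    if (!memcmp(md, expected, 20))
        return 1;

    if ((len % 64) >= 52 && (len % 64) <= 55) {
        sha1_staroffice_buggy(data, len, md);
        if (!memcmp(md, expected, 20))
            return 1;
    }
    return 0;
}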

kholia commented 6 years ago

I believe that @kuschuermann encountered this bug way back in 2012.

jfoug commented 6 years ago

Probably so. It looks like the code has been busted since the beginning. They even had test cases which they simply commented out long ago. Their MD5 has the same bug; untested by me, but the code has the same off-by-one error.

magnumripper commented 6 years ago

they even had test cases which they simply commented out long ago.

Wow, that's... not quite the best solution 🤣

jfoug commented 6 years ago

It is sort of interesting watching another coding group realize that a core bug like this is UGLY as hell. Since it impacts stored customer files, simply fixing the bug would likely invalidate 1 out of every 16 encrypted ODF files. Why developers writing core library functions do not test their code, even at the simplest level, is beyond me. Ugh.

Also a prime example of why it is often NOT a good idea to roll your own, when it comes to security/encryption! Just because the hash gives correct results for 'password', '123456' and 'abcdefghij..yz' does not mean it works ;)

frank-dittrich commented 6 years ago

s/interesting/scary/

jfoug commented 6 years ago

Well, I enjoy scary movies, so to me they are interesting ;) But yes, this is not a good situation for them. There are no security issues (their buggy code is no less secure), BUT they are generating (and have been forever) non-compliant files. Oh well, they will have to muddle through the best way to recover from that sort of crap. It is likely they will have to keep the buggy sha1 code forever, use it when needed, and possibly provide a tool to convert 'broken' files. Who knows.

However, if the encrypted blob lengths are uniformly distributed, then at least there is a silver lining, in that 15/16 of these ODF files were written correctly. But there are likely millions of documents out ITRW that are in the busted format, AND it is very possible that people will need a tool like john to help 'open' these beasts!

frank-dittrich commented 6 years ago

Not sure how old the bug is. It could be inherited from forking OpenOffice (or even from StarOffice). So, you also have compatibility problems across different products.

I also enjoy watching scary movies, because I know it is just a movie and I am not affected.

kholia commented 6 years ago

Yes, this bug exists in StarOffice too. That is what I used to generate these files.