kokke / tiny-bignum-c

Small portable multiple-precision unsigned integer arithmetic in C
The Unlicense
419 stars 86 forks source link

Fast multiplication #11

Open mangofromage opened 5 years ago

mangofromage commented 5 years ago

This is alternative multiplication method based on column operations. https://www.youtube.com/watch?v=zm3tQ_BPgm8 - idea is from this video. I also added test that compares methods. The current multiplication method should be replaced with new multiplication method.

mangofromage commented 5 years ago

For me it is ok to merge it and edit in way you like ;)

kokke commented 4 years ago

I got away from this PR @kolkil - sorry about that - I took some time off and forgot everything about software development :)

I will try and get it merged and edit whatever small details I want to change.

If I keep forgetting, please remind me again that I want to merge this PR.

mhaeuser commented 4 years ago

Good day,

First off, I currently use a heavily modified version of this library and hence would ask for someone to try to reproduce this with a "clean" version of the library. However, as the alternative multiplication function does not use any of the other functions, I am fairly certain this is a rare bug in the logic.

I'm not yet sure what pattern triggers it, but I do know this works very well in most cases. However, when trying to integrate this with a custom crypto stack, I started to get weird results and decided to compare this mul algo against the original. The first result is the incorrect result of this faster multiplication algorithm (locally modded, but unmodded seems to cause the same issue), the second result is the correct (verified against OpenSSL BN) solution of the original.

If I find out what causes this, I'll report back, but I'll need to familiarise myself with the algorithmic idea for a bit.

14e8839b4900fcfb5fc4d6dc64753740f59ae0afbd0fe9379bdbee18bd606c57d014e2d44e079dc3ffaabd4226c5d64792c1df2c24d3260567156b6caf278c7355fd299ae6f993952011488cd39c1129862edead758d1f31e6a11221babffab183f0056aaf735d8616972a07a128e79bdd64ef96e8de71bab4a968e40148f017a52350cd1296347c93d5c82f2b36bb72bf1f185622bdf1763a230c064d8a878c4d0058102ff68676d0a5a4f553de75230d4e727d51dd8c1c35d65f9b6feba90dc87a5d8f1f6ec49e156137430647679897c779601b00f97b6075029a32f81c4478f365a7e22482d3db69d6d5d3ede9ef2f648b3f56eb8d9ba57e838ee8f2e457f * 14e8839b4900fcfb5fc4d6dc64753740f59ae0afbd0fe9379bdbee18bd606c57d014e2d44e079dc3ffaabd4226c5d64792c1df2c24d3260567156b6caf278c7355fd299ae6f993952011488cd39c1129862edead758d1f31e6a11221babffab183f0056aaf735d8616972a07a128e79bdd64ef96e8de71bab4a968e40148f017a52350cd1296347c93d5c82f2b36bb72bf1f185622bdf1763a230c064d8a878c4d0058102ff68676d0a5a4f553de75230d4e727d51dd8c1c35d65f9b6feba90dc87a5d8f1f6ec49e156137430647679897c779601b00f97b6075029a32f81c4478f365a7e22482d3db69d6d5d3ede9ef2f648b3f56eb8d9ba57e838ee8f2e457f = fffffffefffffffefffffffcfffffffcfffffffafffffffafffffffafffffff9fffffff8fffffff5fffffff6fffffff3fffffff3fffffff1fffffff2fffffff2fffffff3fffffff3fffffff1fffffff0ffffffedffffffedffffffebffffffeeffffffecffffffeaffffffe9ffffffe7ffffffe5ffffffe6ffffffe6ffffffe3ffffffe5ffffffe1ffffffe6ffffffe2ffffffdfffffffdfffffffddffffffe0ffffffe1ffffffddffffffddffffffdeffffffdaffffffdbffffffdbffffffd5ffffffd6ffffffd5ffffffd9ffffffd4ffffffd2ffffffd4ffffffd2ffffffd2ffffffd5ffffffcfffffffd3ffffffcfffffffd1ffffffceffffffcdffffffd4fb843a8a639af780f2991b71933c8cee17bef9b931f4040732f606756c9525158cd2e892094dde117a6630df12114bec3e082b91f6f7990c4dce50536572b8f5f273e6dd1b1ec6f5daebc86ef8f88ae4b6ad81659254148cc69818bbe89af2942d2acd48da8ae6599a0821c5c7b3eaacc35b1ab4b39e923045527dfe810265d01ac1471bd344b7736d5b65dcb9e6e5c0ecb0613f269bd8906c2e49d59303adce635ad27e3010f4a335c9da561ad0154445275896affc3f882cb2a0f87f4e83c6e9b79a055389d8bf7e1ef1a38e5ca794e568d1a47cb057af41c1a139e83ed83f06cdbdbc7d92a57dc6526bd9d30157bc5449833123f0b0cbb0565ea96adab4b1 vs fffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffb843ab9639af7b1f2991ba0933c8d1e17bef9e431f4043432f6069f6c9525418cd2e8ba094dde397a66310a12114c113e082bb1f6f799304dce50766572b918f273e7001b1ec716daebc88df8f88b06b6ad8185925414aac69818d6e89af2b22d2acd67da8ae6749a0821dfc7b3eac9c35b1acab39e924945527e15810265e91ac14732d344b78a6d5b65f4b9e6e5d2ecb06155269bd8a36c2e49e89303ade1635ad2903010f4b135c9da631ad01552452758a2affc3f942cb2a1037f4e83cfe9b79a0d5389d8c77e1ef1ac8e5ca79be568d1a97cb057b641c1a13de83ed84406cdbdbf7d92a580c6526bdbd30157be5449833223f0b0ccb0565ea96adab4b1

mangofromage commented 4 years ago

Could you show the code that generated this? Algorithm is explaned in this video https://www.youtube.com/watch?v=zm3tQ_BPgm8

mhaeuser commented 4 years ago

@kolkil Thanks, I will check it out asap. I'm a bit low on time right now and cannot provide you a sample caller, but you should be able to reproduce it by just parsing above's operands and calling the alternative mul.

Frankly, I still do not understand the algo, but I found it odd there was no overflow check around here: https://github.com/kolkil/tiny-bignum-c/blob/fast_multiplication/bn.c#L309 I added code to increment tmp_to_add on overflow, and I confirmed the new result is closer to the correct solution than before, but it's not quite there yet. If you find another issue (assuming this is correct), please let me know.

EDIT: Yes, what I found was correct, and this can overflow too: https://github.com/kolkil/tiny-bignum-c/blob/fast_multiplication/bn.c#L308 With both fixed, I get the correct result, so the algo seems fine.

I cannot give you a diff as, as I said, my local copy is modded, but for the first overflow it is sufficient to store the additive result separately and compare for being smaller than one of the previous operands (i.e. tmp2 = c->array[i] + (DTYPE)tmp, tmp2 < (DTYPE)tmp). This works because, when you think of the negative two's complement, -1 is MAX_UINTx (x being the width), so a wrapped around result must be smaller than both original operands.

By the way, with Clang and an X64 target, this uses CF, so it should be pretty good performing. This can be applied to the add and sub operations as well to get rid of the 64-bit int dependency. If someone manages that for mul too, the array int size could be raised to 64-bit to increase performance (given the mul solution is not slow).

For the second mentioned overflow, I just raised tmp_to_add to DTYPE_TMP and instead of initialising back to zero, I left-shifted by 32 bits to not discard the high bits. I am not convinced it cannot overflow again given unlucky input, but I will think of something later, or hope you will come up with a new solution (or an argument for why it cannot overflow again).

EDIT2: Looking at the amount of iterations where tmp_to_add can be increased and the high results multiplications can yield, I don't feel like tmp_to_add is a feasible option. It might need a function that propagates additions upwards on the BIGNUM itself

Thanks for your work

mangofromage commented 4 years ago

I understand the problem, gonna fix it soon, thank you ;)

kokke commented 3 years ago

@kolkil I don't mean to be rude at all; have you had a chance to look at this?