mmozeiko / pkgi

pkg download & installation directly on Vita
The Unlicense
248 stars 162 forks

aes128 and sha256 #13

Closed RandyGaul closed 7 years ago

RandyGaul commented 7 years ago

Just a quick question -- did you write these implementations yourself? They look really useful :) Were you planning on maybe using those implementations in future projects?

mmozeiko commented 7 years ago

Yes, I wrote most of them myself. sha256 is completely mine. aes128 is partially ported from a public domain implementation written for Intel architecture with qhasm (see the links inside the file). It has a lot of modifications to make it work on ARM. The linked PDF paper describes exactly how it works.

In the end it turned out that these optimizations were not really needed. Vanilla C code is fast enough (~11MB/s for sha256 and ~12MB/s for aes128 if I remember correctly) and the bottleneck is wifi. I expected crypto to be slow because of the Vita hardware, which is why I tried writing optimized implementations. But hey, programming is fun :) I enjoy tinkering with code for low-level optimizations.
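As a rough illustration of how MB/s figures like these can be obtained, here is a minimal timing sketch. It is not taken from pkgi: the names `process_fn`, `mb_per_second`, `measure_throughput` and the stand-in workload `xor_bytes` are hypothetical, and `clock()` measures CPU time, which is only an approximation of wall-clock time.

```c
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical callback type: processes `size` bytes of `data`. */
typedef void (*process_fn)(const uint8_t *data, size_t size, void *ctx);

/* Convert a byte count and elapsed seconds into MB/s (1 MB = 1024*1024 bytes). */
static double mb_per_second(size_t bytes, double seconds)
{
    if (seconds <= 0.0)
        return 0.0; /* timer too coarse to measure anything */
    return (double)bytes / (1024.0 * 1024.0) / seconds;
}

/* Run `fn` over the buffer `rounds` times and return the measured throughput. */
static double measure_throughput(process_fn fn, void *ctx,
                                 const uint8_t *data, size_t size, int rounds)
{
    clock_t start = clock();
    for (int i = 0; i < rounds; i++)
        fn(data, size, ctx);
    clock_t end = clock();

    double seconds = (double)(end - start) / CLOCKS_PER_SEC;
    return mb_per_second(size * (size_t)rounds, seconds);
}

/* Stand-in workload (an assumption, not real crypto): XOR all bytes together. */
static void xor_bytes(const uint8_t *data, size_t size, void *ctx)
{
    uint8_t *acc = (uint8_t *)ctx;
    for (size_t i = 0; i < size; i++)
        *acc ^= data[i];
}
```

To benchmark a real hash or cipher, one would wrap it in a function matching `process_fn` and pass a buffer large enough that the run takes a measurable amount of time.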

What do you mean by future projects? I have released this code into the public domain. So if anybody (including me) needs to use it for whatever purpose they want, it's not an issue, just take it and use it. This code should be reasonably portable and does not have any extra dependencies (except utils.h).

If you want to discuss more or have questions about these implementations, feel free to drop me an e-mail or join the discord server.

RandyGaul commented 7 years ago

Cool yeah I might hit you up on email sometime. I'm not familiar with encryption or cryptographic hashes, so it's great you implemented that stuff.

The reason I asked was that maybe I can snatch your functions and implement SSE versions sometime in the distant-ish future for my own projects. But since you mentioned vanilla C was already really fast, maybe just using it as-is would be even easier than doing a SIMD port.

Where can I find your contact info?

mmozeiko commented 7 years ago

Well, the C version is fast enough only on Vita. On PC it is not so fast. You can get higher speed on PC with different implementations.

I could port them to SSE, that's not a big deal. For sha256 it should give some speed boost. But I'm not sure it makes sense for AES, because most CPUs from the last 10 or so years have native AES instructions (AESNI). Only a few CPU models nowadays don't have AESNI, and AESNI is so much faster. For example, a clean C implementation that uses 32-bit operations and lookups into a 1K table will give you ~120MB/s (depends on the CPU, of course). An SSE version will maybe reach 200MB/s? Not sure. And it will require SSSE3 instructions to be optimal. An AESNI implementation will easily give you 1GB/s. This is the reason why pkg2zip is so much faster than pkg_dec. The latter decrypts using a super slow byte-oriented AES that probably doesn't go faster than ~30MB/s.
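To make the "32-bit operations and lookups into a 1K table" idea concrete, here is a minimal sketch of the usual T-table technique. It is an assumption about the general approach, not the pkgi code: the helper names (`gf_mul2`, `aes_sbox`, `aes_build_t0`, `aes_round_column`) are hypothetical, and the byte order follows the common big-endian table convention. One table of 256 32-bit entries is exactly 1K; the other three tables of the classic layout are byte rotations of it.

```c
#include <stdint.h>

/* Multiply by 2 in GF(2^8), reducing by the AES polynomial 0x11b. */
static uint8_t gf_mul2(uint8_t a)
{
    return (uint8_t)((a << 1) ^ ((a & 0x80) ? 0x1b : 0x00));
}

/* General GF(2^8) multiplication (shift-and-add). */
static uint8_t gf_mul(uint8_t a, uint8_t b)
{
    uint8_t r = 0;
    while (b) {
        if (b & 1)
            r ^= a;
        a = gf_mul2(a);
        b >>= 1;
    }
    return r;
}

static uint8_t rotl8(uint8_t v, int n)
{
    return (uint8_t)((v << n) | (v >> (8 - n)));
}

static uint32_t ror32(uint32_t v, int n) /* n must be 8, 16 or 24 here */
{
    return (v >> n) | (v << (32 - n));
}

/* S-box entry: multiplicative inverse (x^254, with 0 -> 0), then affine map. */
static uint8_t aes_sbox(uint8_t x)
{
    uint8_t inv = 1;
    for (int i = 0; i < 254; i++)
        inv = gf_mul(inv, x);
    return (uint8_t)(inv ^ rotl8(inv, 1) ^ rotl8(inv, 2) ^
                     rotl8(inv, 3) ^ rotl8(inv, 4) ^ 0x63);
}

/* Build the 1K table: each entry fuses SubBytes with one MixColumns column. */
static void aes_build_t0(uint32_t t0[256])
{
    for (int x = 0; x < 256; x++) {
        uint8_t s  = aes_sbox((uint8_t)x);
        uint8_t s2 = gf_mul2(s);
        uint8_t s3 = (uint8_t)(s2 ^ s);
        t0[x] = ((uint32_t)s2 << 24) | ((uint32_t)s << 16) |
                ((uint32_t)s << 8) | (uint32_t)s3;
    }
}

/* One output column of a full round: four lookups, rotations and XORs.
   b0..b3 are the state bytes selected by ShiftRows, rk is the round key word. */
static uint32_t aes_round_column(const uint32_t t0[256], uint8_t b0, uint8_t b1,
                                 uint8_t b2, uint8_t b3, uint32_t rk)
{
    return t0[b0] ^ ror32(t0[b1], 8) ^ ror32(t0[b2], 16) ^
           ror32(t0[b3], 24) ^ rk;
}
```

A byte-oriented implementation performs SubBytes, ShiftRows and MixColumns as separate per-byte steps, while this form needs only four lookups and a handful of 32-bit XORs per column, which is where the speed difference comes from.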

You can probably get a few MB/s more if you write assembly directly, but I don't want to go that route. I expect these intrinsic implementations (SSE or NEON) to be within 10-20% of super-optimal asm implementations.

mmozeiko commented 7 years ago

> Where can I find your contact info?

martins.mozeiko@gmail.com