intel / isa-l_crypto


performance of md5_mb_over_4GB_test #67

Open KelvonLi opened 3 years ago

KelvonLi commented 3 years ago

Hi,

I'm testing md5_mb performance to find out whether it also performs much better than OpenSSL when hashing many buffers.

I changed the test code as shown below, built it, and ran it. The multi-buffer performance turned out to be worse than OpenSSL on both CPU platforms I tested. I'm not sure whether you have run a similar test: is this expected? And how can I improve the performance? Thanks a lot!

Test result:
/workspace/isa-l_crypto/tests/extended # ./md5_mb_over_4GB_test
md5_large_test
md5_openssl: runtime = 22236247 usecs, bandwidth 8 MB in 22.2362 sec = 0.38 MB/s
Starting updates
md5_ctx_mgr: runtime = 52901056 usecs, bandwidth 8 MB in 52.9011 sec = 0.16 MB/s

Test code change:
/workspace/isa-l_crypto/tests/extended # git diff md5_mb_over_4GB_test.c
 #include "md5_mb.h"
 #include "endian_helper.h"
 #include <openssl/md5.h>
+#include "test.h"
+
 #define TEST_LEN (1024*1024ull) //1M
 #define TEST_BUFS MD5_MIN_LANES
+//#define TEST_BUFS MD5_MAX_LANES
 #define ROTATION_TIMES 10000 //total length processing = TEST_LEN * ROTATION_TIMES
 #define UPDATE_SIZE (13*MD5_BLOCK_SIZE)
 #define LEN_TOTAL (TEST_LEN * ROTATION_TIMES)
@@ -54,6 +57,7 @@ int main(void)
 	uint32_t i, j, k, fail = 0;
 	unsigned char *bufs[TEST_BUFS];
 	struct user_data udata[TEST_BUFS];

gbtucker commented 3 years ago

Hi @KelvonLi,

This example, md5_mb_over_4GB_test.c, is not meant as a performance test; in fact, the multi-buffer part does a lot more work than the single-buffer check. By running multiple jobs, it processes TEST_BUFS times as much data as the single-buffer part. At the end, you may notice that it checks each of the final digests produced by the multi-buffer part against the single-buffer result. I suggest you start with one of the included performance tests instead.
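Because the multi-buffer loop does TEST_BUFS jobs' worth of work, comparing the two printed MB/s figures directly understates md5_ctx_mgr. Below is a minimal sketch of a normalized comparison, assuming (as the reply above describes) that each of the TEST_BUFS jobs hashes LEN_TOTAL bytes while the OpenSSL loop hashes LEN_TOTAL bytes once. The helper name `aggregate_mb_per_sec` is hypothetical and not part of isa-l_crypto. Its unit is bytes per microsecond, which is how the MB/s figures in the output above work out (for example, 8 × 2^20 bytes / 22236247 usecs ≈ 0.38).

```c
#include <stdint.h>

/* Aggregate throughput of a timed region that hashed n_jobs independent
 * streams of bytes_per_job bytes each. Returns bytes per microsecond,
 * i.e. the same "MB/s" unit as the timing output in this issue. */
double aggregate_mb_per_sec(uint64_t bytes_per_job, uint32_t n_jobs,
                            uint64_t usecs)
{
    if (usecs == 0)
        return 0.0; /* degenerate timing: avoid dividing by zero */
    return (double)bytes_per_job * (double)n_jobs / (double)usecs;
}
```

For the md5_openssl runtime you would pass LEN_TOTAL and 1; for the md5_ctx_mgr runtime, LEN_TOTAL and TEST_BUFS. As a hypothetical illustration, if the multi-buffer run took twice as long while hashing 8 jobs, its aggregate throughput would be 8 / 2 = 4 times that of the single-buffer run.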

KelvonLi commented 3 years ago

Hi @gbtucker,

Thanks a lot for your reply. I'm testing with both a single buffer and multiple buffers. I have a couple of simple questions:

  1. Are the md5_ctx_mgr_flush/md5_ctx_mgr_submit APIs thread-safe?

  2. To get the final MD5 value of multiple buffers treated as one logical buffer, do I have to use a single ctx and submit the buffers one by one? I didn't find a way to use multiple ctxs (lanes) to compute in parallel and produce one final MD5 value.

My current understanding is that each ctx (lane) can only be used for one MD5 calculation at a time and should NOT be reused until that calculation completes. Multiple ctxs (lanes) can run in parallel, each on a different MD5 calculation. Please correct me if I'm wrong. Thanks a lot.
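The lane model described in this question can be illustrated with a small, self-contained toy. Everything in it is hypothetical and for illustration only: `toy_compress` is a 4-byte-block mixer built from the FNV-1a constants, not MD5 and not cryptographic, and `toy_job`, `toy_hash_single` and `toy_hash_lanes` are not isa-l_crypto APIs. The sketch shows the property the question relies on: each lane holds one independent job with its own chaining state, so interleaving jobs block by block yields the same per-job digest as hashing each job on its own, while a single job's blocks are still consumed strictly in order.

```c
#include <stddef.h>
#include <stdint.h>

#define TOY_BLOCK 4        /* toy block size in bytes; MD5 uses 64-byte blocks */
#define TOY_IV 2166136261u /* fixed initial chaining value (FNV offset basis) */

/* Toy compression function: NOT MD5 and not cryptographic. It only mirrors
 * the chaining structure state_i = f(state_{i-1}, block_i). */
static uint32_t toy_compress(uint32_t state, const uint8_t *block)
{
    for (size_t i = 0; i < TOY_BLOCK; i++) {
        state ^= block[i];
        state *= 16777619u; /* FNV prime, used here only as a mixer */
    }
    return state;
}

/* One independent job per lane: its own message and its own chaining state. */
struct toy_job {
    const uint8_t *msg;
    size_t nblocks; /* message length in whole toy blocks */
    size_t next;    /* index of the next block this lane will process */
    uint32_t state; /* chaining value, private to this job */
};

/* Sequential reference: hash one message block by block. */
uint32_t toy_hash_single(const uint8_t *msg, size_t nblocks)
{
    uint32_t state = TOY_IV;
    for (size_t b = 0; b < nblocks; b++)
        state = toy_compress(state, msg + b * TOY_BLOCK);
    return state;
}

/* Multi-buffer style: each pass advances every unfinished lane by one block.
 * Blocks of different jobs are interleaved; blocks of one job are never
 * reordered, because each step needs that job's previous state. */
void toy_hash_lanes(struct toy_job *jobs, size_t nlanes, uint32_t *digests)
{
    int active = 1;

    for (size_t l = 0; l < nlanes; l++) {
        jobs[l].next = 0;
        jobs[l].state = TOY_IV;
    }
    while (active) {
        active = 0;
        for (size_t l = 0; l < nlanes; l++) {
            struct toy_job *j = &jobs[l];
            if (j->next < j->nblocks) {
                j->state = toy_compress(j->state, j->msg + j->next * TOY_BLOCK);
                j->next++;
                active = 1;
            }
        }
    }
    for (size_t l = 0; l < nlanes; l++)
        digests[l] = jobs[l].state;
}
```

In a real multi-buffer implementation the per-pass step over all lanes is carried out with SIMD instructions rather than a C loop, which is roughly where the speedup comes from; the loop here only models the scheduling.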

gbtucker commented 3 years ago

Hi @KelvonLi,

For 1: all the functions are thread-safe and reentrant. I would suggest one ctx per thread; take a look at the examples in examples/saturation_test for how to do this.
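The "one ctx per thread" advice amounts to giving every thread its own hashing state and never sharing it. Below is a minimal sketch of that ownership pattern with POSIX threads, assuming each thread gets a private state object. A toy FNV-1a hash stands in for MD5 so the example builds without isa-l_crypto, and `fnv1a`, `struct worker`, `hash_per_thread` and `MAX_WORKERS` are hypothetical names. With the real library, the private state would be the thread's own ctx (and, assuming the manager is likewise not shared, its own MD5_HASH_CTX_MGR); examples/saturation_test remains the authoritative reference.

```c
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_WORKERS 16 /* arbitrary cap for this sketch */

/* Toy FNV-1a hash standing in for MD5; NOT the isa-l_crypto API. */
static uint32_t fnv1a(const uint8_t *p, size_t n)
{
    uint32_t h = 2166136261u;
    for (size_t i = 0; i < n; i++) {
        h ^= p[i];
        h *= 16777619u;
    }
    return h;
}

/* Everything one thread needs; stands in for a per-thread manager + ctx. */
struct worker {
    const uint8_t *buf;
    size_t len;
    uint32_t digest; /* written only by the owning thread */
};

static void *worker_main(void *arg)
{
    struct worker *w = arg;
    /* No other thread touches *w while this runs, so no lock is needed. */
    w->digest = fnv1a(w->buf, w->len);
    return NULL;
}

/* Hash nbufs buffers with one thread, and one private state, per buffer.
 * Returns 0 on success, -1 on too many buffers or a pthread failure. */
int hash_per_thread(const uint8_t *const *bufs, const size_t *lens,
                    size_t nbufs, uint32_t *out)
{
    pthread_t tids[MAX_WORKERS];
    struct worker workers[MAX_WORKERS];
    size_t started = 0;
    int rc = 0;

    if (nbufs > MAX_WORKERS)
        return -1;
    for (size_t i = 0; i < nbufs; i++) {
        workers[i].buf = bufs[i];
        workers[i].len = lens[i];
        workers[i].digest = 0;
        if (pthread_create(&tids[i], NULL, worker_main, &workers[i]) != 0) {
            rc = -1;
            break;
        }
        started++;
    }
    /* Join every thread that did start, even after a failure. */
    for (size_t i = 0; i < started; i++)
        if (pthread_join(tids[i], NULL) != 0)
            rc = -1;
    if (rc == 0)
        for (size_t i = 0; i < nbufs; i++)
            out[i] = workers[i].digest;
    return rc;
}
```

Build with `-pthread`. Because no state crosses threads, no mutex is needed; sharing one state object between threads would instead require external locking, which is presumably why one per thread is recommended.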

For 2: the lanes must have independent hash jobs to run in parallel. Because these are cryptographic hashes, there is no way to break up one hash job and run the pieces concurrently: a job is processed one fundamental block (64 bytes for MD5) at a time, and each block depends on the result of the previous one.
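The block-size limit follows from how MD5 chains its compression function (the Merkle-Damgård construction). A short derivation, writing $f$ for MD5's compression function, $IV$ for its fixed initial value, and $M_1, \dots, M_n$ for the 64-byte blocks of the padded message:

```latex
\begin{aligned}
H_0 &= IV, \\
H_i &= f(H_{i-1}, M_i), \qquad i = 1, \dots, n, \\
\mathrm{MD5}(M) &= H_n, \\
H^{(j)}_0 &= IV, \quad H^{(j)}_i = f\bigl(H^{(j)}_{i-1}, M^{(j)}_i\bigr), \qquad j = 1, \dots, L .
\end{aligned}
```

Computing $H_i$ requires $H_{i-1}$, so the blocks of one message form a strictly serial chain and cannot be hashed concurrently. Independent messages $M^{(1)}, \dots, M^{(L)}$ each have their own chain $H^{(j)}$, and those separate chains are what the $L$ lanes can advance side by side.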

KelvonLi commented 3 years ago

Hi @gbtucker, thanks a ton for your replies and for sharing! I'll do some further study and testing.