marktwtn commented 4 years ago

The acceleration is enabled when the compiler option -msse4.2 is used and the input size of trit/tryte for conversion is larger than 128-bit, which is the basic operation unit of the most SSE instructions.

The implementation is complex but it does acclerate the conversion speed as the following experiment result reveals:

trits_to_trytes() Input size Without SIMD SSE4.2(avg nsec) With SIMD SSE4.2(avg nsec) 81, 406.3, 261.5 243, 444.6, 162.7 6561, 53166.93, 27290.99
trytes_to_trits() Input size Without SIMD SSE4.2(avg nsec) With SIMD SSE4.2(avg nsec) 81, 355.6, 167.1 2592, 5752.3, 1751.8 2673, 6273.0, 2098.8

For more detailed experiment result, please reference: https://github.com/DLTcollab/dcurl/issues/92

Test Plan:

$ bazel test --test_output=all --copt=-msse4.2 //common/trinary/tests/... The command is for verifying the correctness.

jserv commented 4 years ago

@semenov-vladyslav, can you help review this pull request?

marktwtn commented 4 years ago

The CI tests failed. It seems like the failed tests is related to the cross compilation since it has the bazel option --crosstool_top and --host_crosstool_top?

oopsmonk commented 4 years ago

The CI tests failed. It seems like the failed tests is related to the cross compilation since it has the bazel option --crosstool_top and --host_crosstool_top?

the test failed on x86_64 toolchain, you can run it by bazel build --build_tests_only --config x86_64 //mam/troika/tests/...

marktwtn commented 4 years ago

The command $ bazel build --build_tests_only --config x86_64 //... ran well on my desktop. However, if I tried the command like the CI runs, it failed.

$ ./tools/ci_test_setup && bazel test --build_tests_only --remote_upload_local_results=false --crosstool_top=@iota_toolchains//tools/x86-64-core-i7--glibc--bleeding-edge-2018.07-1:toolchain --cpu=x86_64 --host_crosstool_top=@bazel_tools//tools/cpp:toolchain -- //... -//mobile/ios/...

Can I ask for more information about this command? I am not really familiar with bazel.

oopsmonk commented 4 years ago

~these two commands should have the same output. This PR has no compilation errors in my environment. it's related to the container will update later.~ @marktwtn I was not fully wake up. The --config x86_64 uses the toolchain provided by bazel and another is the toolchain provide by bootlin for x86-64-core-i7.

marktwtn commented 4 years ago

@marktwtn I was not fully wake up. The --config x86_64 uses the toolchain provided by bazel and another is the toolchain provide by bootlin for x86-64-core-i7.

I wonder what is the main difference of these toolchains.

marktwtn commented 4 years ago

I fixed a bug and refactored some code in the updated branch.

The bug was not detected since the input data size of trits_to_trytes and trytes_to_trits testing are not large enough.

Here comes with 2 questions:

Should I add another test for larger input data size or just modify the input data size of the original test?

The test invoked by buildkite CI still fails on my desktop.

$ bazel test --build_tests_only --remote_upload_local_results=false
 --crosstool_top=@iota_toolchains//tools/x86-64-core-i7--glibc--bleeding-edge-2018.07-1:toolchain
 --cpu=x86_64 --host_crosstool_top=@bazel_tools//tools/cpp:toolchain -- //... -//mobile/ios/...

What is the main difference between these and the other tests on buildkite CI? All the error logs show that

external/bazel_tools/tools/test/test-setup.sh: line 310:    
14 Segmentation fault      (core dumped) "${TEST_PATH}" "$@" 2>&1

and I don't know why.

oopsmonk commented 4 years ago

Here comes with 2 questions:

1. Should I add another test for larger input data size or just modify the input data size of the original test?

yes, please.

2. The test invoked by buildkite CI still fails on my desktop.
   ```
   $ bazel test --build_tests_only --remote_upload_local_results=false
    --crosstool_top=@iota_toolchains//tools/x86-64-core-i7--glibc--bleeding-edge-2018.07-1:toolchain
    --cpu=x86_64 --host_crosstool_top=@bazel_tools//tools/cpp:toolchain -- //... -//mobile/ios/...
   ```

   What is the main difference between these and the other tests on buildkite CI?
   All the error logs show that
   ```
   external/bazel_tools/tools/test/test-setup.sh: line 310:    
   14 Segmentation fault      (core dumped) "${TEST_PATH}" "$@" 2>&1  
   ```

   and I don't know why.

It's caused by mysql/mariadb which may not installed on your system. since you are working on common part, using bazel test -- //common/... -//mobile/ios/... would work on your system.

You can run binaries in the bazel-bin folder, you can see memory mismatch in test cases if using the bootlin toolchain.

$ ./bazel-bin/common/crypto/iss/v1/tests/test_iss
common/crypto/iss/v1/tests/test_iss.c:97:signature_resolves_to_address_test:PASS
common/crypto/iss/v1/tests/test_iss.c:89:address_generation_test:FAIL: Memory Mismatch. Byte 0 Expected 0x44 Was 0x5A

-----------------------
2 Tests 1 Failures 0 Ignored
FAIL

$ ./bazel-bin/common/helpers/tests/test_pow
common/helpers/tests/test_pow.c:62:test_pow:FAIL: Memory Mismatch. Byte 0 Expected 0x57 Was 0x39
common/helpers/tests/test_pow.c:92:test_flex_pow:PASS

-----------------------
2 Tests 1 Failures 0 Ignored 
FAIL

marktwtn commented 4 years ago

I found another bug and fixed it in the updated branch. Sorry for not noticing that there are other error logs showing the memory mismatching problem.

I also increased the input data size of trit tryte conversion for testing SIMD SSE4.2 acceleration.

After the bug is fixed, most of the errors disappear. However, there are 2 tests that do not pass.

common/model/tests:test_transaction
common/model/tests:test_tryte_transaction

with the testing command

bazel test --build_tests_only --remote_upload_local_results=false
--crosstool_top=@iota_toolchains//tools/x86-64-core-i7--glibc--bleeding-edge-2018.07-1:toolchain
--cpu=x86_64 --host_crosstool_top=@bazel_tools//tools/cpp:toolchain -- //common/... -//mobile/ios/..

and error log

external/bazel_tools/tools/test/test-setup.sh: line 310:    
14 Segmentation fault      (core dumped) "${TEST_PATH}" "$@" 2>&1

oopsmonk commented 4 years ago

you can test the test_transaction directly and since it gets Segmentation fault you can enable address sanitizer like: bazel test -c dbg --config asan --config bootlin_x86_64_core_i7 //common/model/tests:test_transaction. You can just replace //common/model/tests:test_transaction for the other case.

here is output on my system:

Executing tests from //common/model/tests:test_transaction
-----------------------------------------------------------------------------
AddressSanitizer:DEADLYSIGNAL
=================================================================
==14==ERROR: AddressSanitizer: SEGV on unknown address 0x000000000000 (pc 0x55bda7e9241f bp 0x7ffd8a2eaa90 sp 0x7ffd8a2ea1c0 T0)
==14==The signal is caused by a READ memory access.
==14==Hint: address points to the zero page.
    #0 0x55bda7e9241e in _mm_store_si128 external/bootlin_x86_64_toolchain/lib/gcc/x86_64-buildroot-linux-gnu/8.1.0/include/emmintrin.h:714
    #1 0x55bda7e9241e in trits_to_trytes_sse42 common/trinary/trit_tryte_sse42.h:109
    #2 0x55bda7e9605e in trits_to_trytes common/trinary/trit_tryte.c:52
    #3 0x55bda7e8e922 in flex_trits_from_trits common/trinary/flex_trit.c:228
    #4 0x55bda7e8c331 in transaction_deserialize_trits common/model/transaction.c:112
    #5 0x55bda7e8d0b2 in transaction_deserialize_from_trits common/model/transaction.c:291
    #6 0x55bda7e8cfdf in transaction_deserialize common/model/transaction.c:251
    #7 0x55bda7e89c4b in test_deserialize_and_serialize common/model/tests/test_transaction.c:20
    #8 0x55bda7e999d8 in UnityDefaultTestRun external/unity/src/unity.c:1339
    #9 0x55bda7e8b296 in main common/model/tests/test_transaction.c:169
    #10 0x7fa548905b96 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x21b96)
    #11 0x55bda7e882d9 in _start (/home/sam/.cache/bazel/_bazel_sam/c8834895b12547221dbe6a364e6a723e/execroot/org_iota_entangled/bazel-out/x86_64-dbg/bin/common/model/tests/test_transaction+0x62d9)

AddressSanitizer can not provide additional info.
SUMMARY: AddressSanitizer: SEGV external/bootlin_x86_64_toolchain/lib/gcc/x86_64-buildroot-linux-gnu/8.1.0/include/emmintrin.h:714 in _mm_store_si128
==14==ABORTING

oopsmonk commented 4 years ago

These two cases are failed on system toolchain as well when I run bazel test -c dbg --copt=-msse4.2 //common/...

//common/model/tests:test_transaction
//common/model/tests:test_tryte_transaction

marktwtn commented 4 years ago

If I change the intrinsic function from __mm_store_si128() to __mm_storeu_si128(), the errors will be fixed. Even with the bootlin toolchain, every test is passed.

The main difference of the intrinsic function is the requirement of address alignment to 16-byte boundary. However, I still need further investigation to give a better explanation of the importance of the address alignment. After the investigation is finished, I will update the conclusion here and update the branch, too.

@oopsmonk thank you for your advise and debugging help.

marktwtn commented 4 years ago

Conclusion:

The load and store intrinsic function are unified to the unalignment type since we can not control the input pointer value. The value is based on 8-byte alignment, but the alignment requirement of _mm_store_si128() and _mm_load_si128() is 16-byte alignment.

These two cases are failed on system toolchain as well when I run bazel test -c dbg --copt=-msse4.2 //common/...
* //common/model/tests:test_transaction

* //common/model/tests:test_tryte_transaction

The input pointer value of these two cases are 8-byte alignment but not 16-byte alignment, which cause the _mm_store_si128() to have general-protection exception.

I believe the unalignment intrinsic function is a better choice.

The pull request can be merged without any problem. Is there anything I need to revise? Such as change the variable naming convention to snake case?

oopsmonk commented 4 years ago

@marktwtn Could you write benchmarks for trits_to_trytes, trytes_to_trits and put it in //common/trinary/benchmark? thanks.

marktwtn commented 4 years ago

@marktwtn Could you write benchmarks for trits_to_trytes, trytes_to_trits and put it in //common/trinary/benchmark? thanks.

It seems that there are no other benchmark in the other directories for me to reference. I guess I will do it in my own way. Or is there any guidance or rule to follow?

oopsmonk commented 4 years ago

@marktwtn Could you write benchmarks for trits_to_trytes, trytes_to_trits and put it in //common/trinary/benchmark? thanks.

It seems that there are no other benchmark in the other directories for me to reference. I guess I will do it in my own way. Or is there any guidance or rule to follow?

yep, printing out the time consumed by functions looks good, like https://github.com/DLTcollab/dcurl/issues/92#issuecomment-492383987

oopsmonk commented 4 years ago

Hey, @marktwtn how are you? do you need any help on Bazel or else?

marktwtn commented 4 years ago

Hey, @marktwtn how are you? do you need any help on Bazel or else?

@oopsmonk I found out that performance might not behave as I expected.

The input size and testing scenario would effect the performance result, hence I am still working on it. If the size do effect the performance, I will modify the code to use the acceleration if the input is greater than a threshold size.

marktwtn commented 4 years ago

@oopsmonk I will forced push later.

The modification includes:

The benchmark.
The threshold for using SSE4.2 acceleration.
The commit message is changed. The experiment result from the https://github.com/DLTcollab/dcurl/issues/92#issuecomment-492383987 is removed since we have benchmark for different input size now.

iotaledger / entangled

common/trinary: Accelerate trit tryte conversion with SIMD SSE4.2 #1443

Test Plan: