Closed noloader closed 7 years ago
We were not able to get XL C/C++ to generate good code for us at -O3. We reverted Commit aa348abd1532 for the moment.
I would prefer to compile tea.cpp at -O2 through use of a pragma, but I cannot find the pragma in the IBM compiler manual. Now open on Stack Overflow: IBM XL C/C++ equivalent to #pragma GCC optimize.
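For reference, this is roughly what the GCC side of that question looks like. It is a minimal sketch, not code from tea.cpp: `half_round` is a hypothetical stand-in for one TEA-style mixing step, and the pragmas are guarded so that other compilers simply ignore them. Whether XL C/C++ offers an equivalent is exactly the open question, so nothing is assumed for it here.

```cpp
#include <cstdint>

// GCC only: compile the functions between push_options and pop_options at -O2,
// regardless of the -O level given on the command line.
#if defined(__GNUC__) && !defined(__clang__)
# pragma GCC push_options
# pragma GCC optimize ("O2")
#endif

// Hypothetical stand-in for one TEA-style mixing step (keys omitted for brevity).
std::uint32_t half_round(std::uint32_t y, std::uint32_t z, std::uint32_t sum)
{
    return y + ((z << 4) ^ (z + sum) ^ (z >> 5));
}

// Restore the command-line optimization options for the rest of the file.
#if defined(__GNUC__) && !defined(__clang__)
# pragma GCC pop_options
#endif
```

The push/pop pair keeps the override local to the functions it surrounds, so the rest of the translation unit still builds at the level requested on the command line.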
The issue was cleared at Commit fc0867827e55 with the following change. The change was made in four places to TEA and XTEA encryption and decryption.
- word32 y, z;
+ word32 y, z, sum = 0;
Block::Get(inBlock)(y)(z);
- word32 sum = 0;
- while (sum != m_limit)
+ // http://github.com/weidai11/cryptopp/issues/503
+ while (*const_cast<volatile word32*>(&sum) != m_limit)
{
sum += DELTA;
y += ((z << 4) + m_k[0]) ^ (z + sum) ^ ((z >> 5) + m_k[1]);
Somewhat ironically, changing sum to volatile did not fix the issue. Because of it, we spent about 4 hours trying to rework the loop body when the problem was in loop control.
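The loop-control change from the diff above can be tried in isolation with a sketch like the following. It is a standalone approximation, not the library code: `tea_encrypt` and `tea_decrypt` are hypothetical free functions, the key is passed as a plain array instead of `m_k`, and the limit is assumed to be 32 rounds times the standard TEA `DELTA`. Only the read of `sum` in the loop condition goes through a volatile lvalue; `sum` itself stays an ordinary variable.

```cpp
#include <cstdint>

typedef std::uint32_t word32;

// Standard TEA parameters (assumed here): 32 rounds, DELTA = 0x9E3779B9.
static const word32 DELTA = 0x9E3779B9u;
static const word32 ROUNDS = 32;

// Encrypt one 64-bit block (y, z) in place with a 128-bit key k[0..3].
void tea_encrypt(const word32 k[4], word32& y, word32& z)
{
    const word32 limit = DELTA * ROUNDS;  // wraps modulo 2^32
    word32 sum = 0;
    // Read sum through a volatile lvalue so the loop test is not optimized away.
    while (*const_cast<volatile word32*>(&sum) != limit)
    {
        sum += DELTA;
        y += ((z << 4) + k[0]) ^ (z + sum) ^ ((z >> 5) + k[1]);
        z += ((y << 4) + k[2]) ^ (y + sum) ^ ((y >> 5) + k[3]);
    }
}

// Inverse of tea_encrypt: same rounds in reverse order, same loop-control idiom.
void tea_decrypt(const word32 k[4], word32& y, word32& z)
{
    word32 sum = DELTA * ROUNDS;
    while (*const_cast<volatile word32*>(&sum) != 0)
    {
        z -= ((y << 4) + k[2]) ^ (y + sum) ^ ((y >> 5) + k[3]);
        y -= ((z << 4) + k[0]) ^ (z + sum) ^ ((z >> 5) + k[1]);
        sum -= DELTA;
    }
}
```

Encrypting a block and then decrypting it with the same key should return the original words, which makes a quick sanity check when comparing optimization levels.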
Changing the code to the following:
word32 sum = 0;
while (sum != m_limit)
{
sum += DELTA;
volatile word32 t1 = ((z << 4) + m_k[0]);
y += t1 ^ (z + sum) ^ ((z >> 5) + m_k[1]);
volatile word32 t2 = ((y << 4) + m_k[2]);
z += t2 ^ (y + sum) ^ ((y >> 5) + m_k[3]);
}
Results in a segmentation fault:
$ ./cryptest.exe tv all
Using seed: 1505560067
...
Testing SymmetricCipher algorithm TEA/ECB.
Segmentation fault
Benchmarks are in for GCC on a modern Skylake I use for testing. Don't ask me how or why, but TEA and XTEA run faster with the volatile accesses. Prior to the change TEA was running at 48.6 cpb. After the change it improved to 41.3 cpb (lower is better).
BEFORE
<TR><TH>TEA/CTR (128-bit key)<TD>62<TD>48.6<TD>0.258<TD>811
<TR><TH>XTEA/CTR (128-bit key)<TD>56<TD>54.0<TD>0.260<TD>816
AFTER
<TR><TH>TEA/CTR (128-bit key)<TD>73<TD>41.3<TD>0.200<TD>630
<TR><TH>XTEA/CTR (128-bit key)<TD>63<TD>47.6<TD>0.201<TD>634
At -O3 it looks like IBM's XL C/C++ is not generating the code we expect. We are hanging after RC6, that is, in the TEA benchmark. And:
Below, Rijndael_Enc_AdvancedProcessBlocks_POWER8 is the AES/OFB random number generator. Don't get distracted by it. The issue lies in TEA::Enc. while (sum != m_limit) is the loop control for TEA::Enc::ProcessAndXorBlock.