weidai11 / cryptopp

free C++ class library of cryptographic schemes
https://cryptopp.com
Other
4.85k stars 1.5k forks source link

SIGILL on older OS X with new Clang due to global ctor instructions #751

Closed noloader closed 5 years ago

noloader commented 5 years ago

I'm testing on an old MacBook Pro from about 2011. It has OS X 10.8 and SSE4 but nothing higher. Using Macports Clang 6.0 for a compile:

$ CXX=/opt/local/bin/clang++-mp-6.0 make -j 5
/opt/local/bin/clang++-mp-6.0 ... -c cryptlib.cpp
/opt/local/bin/clang++-mp-6.0 ... -c cpu.cpp
...

/opt/local/bin/clang++-mp-6.0 ... -c chacha.cpp
/opt/local/bin/clang++-mp-6.0 ... -mavx2 -c chacha_avx.cpp
/opt/local/bin/clang++-mp-6.0 ... -msse2 -c chacha_simd.cpp
...

Notice chacha_avx.cpp is compiled with -mavx2. At runtime we catch a SIGILL on a down level machine:

$ lldb cryptest.exe
(lldb) target create "cryptest.exe"
Current executable set to 'cryptest.exe' (x86_64).

(lldb) r v
...
* thread #1: tid = 0x19fcc6f, 0x0000000100194b29 cryptest.exe`_GLOBAL__sub_I_chacha_avx.cpp at string:1356, queue = 'com.apple.main-thread', stop reason = EXC_BAD_INSTRUCTION (code=EXC_I386_INVOP, subcode=0x0)
    frame #0: 0x0000000100194b29 cryptest.exe`_GLOBAL__sub_I_chacha_avx.cpp at string:1356
   1353         {
   1354             size_type (&__a)[__n_words] = __r_.first().__r.__words;
   1355             for (unsigned __i = 0; __i < __n_words; ++__i)
-> 1356                 __a[__i] = 0;
   1357         }
   1358
   1359     template <size_type __a> static

(lldb) disass
cryptest.exe`_GLOBAL__sub_I_chacha_avx.cpp at chacha_avx.cpp:
   0x100194b10:  pushq  %rbp
   0x100194b11:  movq   %rsp, %rbp
   0x100194b14:  pushq  %r14
   0x100194b16:  pushq  %rbx
   0x100194b17:  movq   0x24a062(%rip), %rax      ; (void *)0x0000000100409ad0: vtable for CryptoPP::NullNameValuePairs
   0x100194b1e:  addq   $0x10, %rax
   0x100194b22:  movq   %rax, 0x2ddc97(%rip)      ; CryptoPP::s_nullNameValuePairs
-> 0x100194b29:  vxorps %xmm0, %xmm0, %xmm0
   0x100194b2d:  vmovups %xmm0, 0x2ddc93(%rip)     ; CryptoPP::DEFAULT_CHANNEL
   0x100194b35:  movq   $0x0, 0x2ddc98(%rip)      ; CryptoPP::DEFAULT_CHANNEL + 12
   0x100194b40:  leaq   0x2ddc81(%rip), %rsi      ; CryptoPP::DEFAULT_CHANNEL
   0x100194b47:  movq   0x248a0a(%rip), %rbx      ; (void *)0x00007fff8f09321e: std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >::~basic_string()
   ...

vxorps is an AVX instruction. It seems Clang is using the enhanced ISA for global constructors instead of the base ISA. And it is worth mentioning we don't provide any global data.

GCC 6 does not suffer the problem. Or is it not surfacing on either the old MacBook and an old Core2 laptop running Debian. Clang is also OK on the old Debian laptop, so it looks like an issue local to OS X and Clang.

Now open on the LLVM-dev mailing list: Restrict global constructors to base ISA.

noloader commented 5 years ago

Ping @denisbider, @mouse07410, @MarcelRaad,

Hey guys. I investigated this more and I see a bad interaction I don't know how to work around. The SIGILL or EXC_BAD_INSTRUCTION is coming from the vxorps, which is zero'ing out the storage for the string DEFAULT_CHANNEL. vxorps is an AVX instruction.

Current code:

// cryptlib.h
const std::string DEFAULT_CHANNEL;

// chacha_avx.cpp, compiled with -mavx2
#include "cryptlib.h"

We changed to the pattern where the header provided the definition to avoid Static Initialization Order Fiasco problems in Crypto++ 6.0 and earlier:

Previous code:

// cryptlib.h
extern const std::string DEFAULT_CHANNEL;

// cryptlib.cpp
const std::string DEFAULT_CHANNEL;

// chacha_avx.cpp, compiled with -mavx2
#include "cryptlib.h"

In the examples above, chacha_avx.cpp actually includes misc.h, which includes cryptlib.h. cryptlib.h is the root of the problem so I cut out the middle header.

I'm not sure how to proceed. We can't take option (1) because it results in a SIGILL or EXC_BAD_INSTRUCTION in certain configurations. We can't take option (2) because it is Static Initialization Order Fiasco. If I try to avoid the header then we land in (3) compile errors as shown below.

./misc.h:451:9: error: use of undeclared identifier 'InvalidArgument'
                throw InvalidArgument("memcpy_s: buffer overflow");
                      ^
./misc.h:494:9: error: use of undeclared identifier 'InvalidArgument'
                throw InvalidArgument("memmove_s: buffer overflow");
                      ^
./misc.h:1039:9: error: use of undeclared identifier 'InvalidArgument'
                throw InvalidArgument("RoundUpToMultipleOf: integer overflow");
                      ^
./misc.h:1100:10: error: unknown type name 'LittleEndian'
        typedef LittleEndian NativeByteOrder;
                ^
./misc.h:1116:8: error: unknown type name 'ByteOrder'
inline ByteOrder GetNativeByteOrder()
       ^

We may be able to change DEFAULT_CHANNEL and AAD_CHANNEL to an enum or struct for trivial construction, but then we still need something for const NullNameValuePairs g_nullNameValuePairs;.

Are we allowed to drop const from std::string DEFAULT_CHANNEL? If we can, doesn't that mean we violate ODR? And what happens when someone changes the vale of DEFAULT_CHANNEL??? It results in duplicate symbols.

Do you guys have any thoughts on how to proceed?

denisbider commented 5 years ago

I don't really consider the static initialization order issue to be a "fiasco".

If you want to completely avoid it, then the library can't have non-trivial global objects (NTGOs); period.

If that's considered unreasonable, then the next sensible policy is as follows:

As far as I can tell, the present solution - where the NTGO is defined, rather than declared, in a header file; causing separate copies in every translation unit - is against the C++ One-Definition Rule and is an invalid solution.

noloader commented 5 years ago

Thanks @denisbider.

If Crypto++ has any NTGOs that depend on initialization of other NTGOs, then Crypto++ must organize definitions of such NTGOs in a single source file to control initialization order.

Yeah, I tried to minimize them when possible.


It's likely that some users are going to define their own NTGOs which depend on Crypto++ NTGOs to already be initialized... There will be users who define their own NTGOs which depend on Crypto++ NTGOs, but which use a compiler which does not provide a way for Crypto++ to force earlier initialization of its NTGOs.

Yeah, this is the case that seems to be causing the most problems nowadays. We catch the bug reports because they are our objects. It is why I try to minimize the objects.


where the NTGO is defined, rather than declared, in a header file; causing separate copies in every translation unit - is against the C++ One-Definition Rule and is an invalid solution.

I believe the const std::string ... in cryptlib.h is ODR-free. It uses internal linkage and provides the same definition in all translation units. But it is heavy weight. Also see Which is the best practice? Defining strings in C++ files or header files?.

( const std::string ... was heavier then I thought and I am surprised the optimizer did not remove it since it was not used).


Do you envision problems with a change like this? It turns the channel into an enum, similar to CipherDir.

enum ChannelId {
    /// \brief Default channel for BufferedTransformation
    DEFAULT_CHANNEL = 1,
    /// \brief Channel for additional authenticated data
    AAD_CHANNEL
};

It should provide the extensibility provided by std::string. We just need to tell users to define their channels with values > 65535. (We keep 16-bits, and we give the other 16-bits away).

What I am less clear on is usability. Will an enum be as easy to use as a std::string?

Related, do you know why Wei used a string instead of an enum? (I've found Wei's decisions are usually spot-on, and if there's an open question like "why did Wei do this" it is because I don't fully appreciate the problem and solution).

denisbider commented 5 years ago

Yeah, I tried to minimize them when possible.

If you want to go that route, "minimize" is not enough. Must eliminate completely.

To put it differently, "minimizing" does nothing and is not an effective solution.

Also see Which is the best practice? Defining strings in C++ files or header files?

It would take an imaginative reading of the linked article to assume string definitions in header files are OK.

First reply: "If those go in the header as is, you'll probably run into ODR problems."

Accepted answer: "A better idea would be to ... place the definitions in one and only one implementation file"

Do you envision problems with a change like this?

For sure. It's a significant reduction in functionality for anyone who uses channels. Dynamic construction of channel names is no longer possible or made much harder, space is limited to 32 bits or 64 bits, etc.

This is not a backward compatible change. It may be acceptable if there's no one actually using the potential of the string channels.

Related, do you know why Wei used a string instead of an enum?

The Crypto++ channel infrastructure also handles SSH channels in Wei's original SSH implementation. SSH channel handles are arbitrary strings.

noloader commented 5 years ago

Thanks @denisbider ,

When I do the root cause analysis of this report and static initialization order fiasco, those strings are always a contributing factor. If we remove the strings from the equation then the problem goes away.

I'd like to use compiler- and platform-specific methods and keep them to minimize impact on users. The problem is, we don't have the remediation for some platforms. Its another incomplete remediation, like minimizing the use of non-trivia global objects.

I get what you are saying about other projects using Crypto++ objects; and it is the other project's responsibility to use them correctly. But at the end of the day this is what I can't get beyond. (1) they are our objects and public symbols, and we are responsible for them. (2) if RTFM was going to work it would have happened in the last 50 years or so. Putting it another way, this is our technological debt and we should be servicing it. We should not be pushing it onto others.

At this point in time I feel like the strings need to go away. We simply cannot make them work with what we have. All I seem to do is move the problem around.


Regarding the significant changes, I agree. I think they are more significant for the library and less significant for user programs.

I tested the library change on my fork at Commit a2524958e5c4, Convert Channels from std::string to enum ChannelId. There were 320 additions and 292 deletions. It is non-trivial.

I think the changes are less significant for users because DEFAULT_CHANNEL and AAD_CHANNEL still work as expected. We also we removed the socket and thread gear at Commit f2171cbe2f49 (also GH #208) so we reduced a lot of surface area with respect to sources and sinks, like sockets and pipes.

I understand the benefits of a std::string to users. They can use a connection's n-tuple as a unique identifier for a vanilla socket; or they can use a client or server's identification string in the case of SSH. The proposed change using an enum means users must manage the mapping of a string to an int instead of the library doing it for them. I don't feel this is a significant barrier since std::map<std::string, int> is readily available.

I was wrong about the channel id space when I said we can use 16-bits and users can use 16-bits. We need to reserve two of them for DEFAULT_CHANNEL and AAD_CHANNEL. The remaining 4 billion or so are free for programs to use.

Our next version release is 8.0 because of breaking the ABI. Changing the channel implementation is just another ABI break that can happen at the time of the 8.0 release. So it is a good time for a break if it is going to happen.


Do you have any concrete recommendations based on requirements? I can enumerate my understanding library requirements if needed, but I think you know them. (A diff or a push to your GitHub would be helpful).

denisbider commented 5 years ago

Sure. As long as the decision is understood, which it seems it is, moving to enums seems like a good solution.

I would suggest reserving at least like the first 100 values for use by the library though.

denisbider commented 5 years ago

One more thing I think should be acknowledged is that you're erring on the side of supporting all of the possible platforms. I would restrict supported platforms to major, widely used platforms that don't create too much trouble and can be supported well. But there are pros and cons.