dawidcha opened this issue 1 year ago
Further analysis on this issue:
The failure occurs in the constructor of the 'If' construct in core/lib/promise/if.h. During a run, this constructor works correctly twice, for a TrueFactory whose expected NextResult holds a grpc_core::Message. On the third call, for a TrueFactory whose NextResult holds a grpc_metadata_batch, it corrupts the stack frame.
The corruption happens because, in the second (grpc_metadata_batch) case, the optional_data move constructor, optional_data(optional_data&& rhs) noexcept, writes 0x18 (24) bytes of data into a structure that sizeof reports as only 0x10 (16) bytes long.
The obvious suspicion is that the same headers are being compiled into different compilation units with different defines, so that each unit assumes a different object layout.
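To make the suspected mechanism concrete, here is a minimal sketch, assuming C++17 and using invented names (tu_a, tu_b, Slot, LayoutFingerprint, FingerprintOf, SameLayout) that are not part of gRPC or Abseil. It models two compilation units that disagree about one member, which reproduces the 0x10 versus 0x18 size gap, plus a hypothetical layout fingerprint that each unit could compute for itself and compare at start-up.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical illustration: the same logical type as two compilation units
// might see it if a configuration define differed between them.
namespace tu_a {
struct Slot {  // view compiled WITHOUT the hypothetical extra member
  std::uint64_t payload;
  std::uint64_t state;
};
}  // namespace tu_a

namespace tu_b {
struct Slot {  // view compiled WITH the hypothetical extra member
  std::uint64_t payload;
  std::uint64_t extra;
  std::uint64_t state;
};
}  // namespace tu_b

// The size and alignment one compilation unit believes a type has.
struct LayoutFingerprint {
  std::size_t size;
  std::size_t align;
};

template <typename T>
constexpr LayoutFingerprint FingerprintOf() {
  return LayoutFingerprint{sizeof(T), alignof(T)};
}

// Two units agree on a type's layout only if both size and alignment match.
constexpr bool SameLayout(LayoutFingerprint a, LayoutFingerprint b) {
  return a.size == b.size && a.align == b.align;
}
```

In a real violation both units use the same type name, so the comparison would have to go through something like a fingerprint defined in a library source file and checked by the application; that wiring is left out of this sketch.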
I strongly suspected ABSL_USES_STD_OPTIONAL was at fault, but have since confirmed that this is being set consistently across all builds.
I also confirmed that building my test application with Visual Studio 2017, the same compiler used for the gRPC libraries, results in the same issue.
Is anyone aware of any other defines that might influence the layout and size of objects (and therefore absolutely need to be consistent across compilation units)?
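One way to check candidate settings is to have every compilation unit record them and compare the records at start-up. The sketch below is hypothetical (BuildConfig, CurrentBuildConfig and LayoutSettingsMatch are invented names, C++17 assumed) and covers only two settings known to matter with MSVC, not an exhaustive list: the language level, because with Abseil's default options absl::optional is an alias for std::optional only when building as C++17 or later (which is what ABSL_USES_STD_OPTIONAL reflects), and _ITERATOR_DEBUG_LEVEL, which changes the layout of MSVC standard-library containers. _MSVC_LANG is read instead of __cplusplus because MSVC keeps __cplusplus at 199711L unless /Zc:__cplusplus is passed.

```cpp
// Hypothetical helper, not part of gRPC or Abseil: captures, at compile time,
// a few settings that affect object layout or library selection on MSVC.
struct BuildConfig {
  long language_level;       // _MSVC_LANG on MSVC, otherwise __cplusplus
  int iterator_debug_level;  // MSVC _ITERATOR_DEBUG_LEVEL, or -1 if undefined
  long compiler_version;     // _MSC_VER, or 0 when not compiling with MSVC
};

// Evaluated separately in every compilation unit that calls it.
constexpr BuildConfig CurrentBuildConfig() {
  return BuildConfig{
#if defined(_MSVC_LANG)
      _MSVC_LANG,
#else
      __cplusplus,
#endif
#if defined(_ITERATOR_DEBUG_LEVEL)
      _ITERATOR_DEBUG_LEVEL,
#else
      -1,
#endif
#if defined(_MSC_VER)
      _MSC_VER,
#else
      0,
#endif
  };
}

// Compares only the layout-relevant fields; compiler_version is kept for
// logging, since a different toolset is not a define mismatch as such.
constexpr bool LayoutSettingsMatch(const BuildConfig& a, const BuildConfig& b) {
  return a.language_level == b.language_level &&
         a.iterator_debug_level == b.iterator_debug_level;
}
```

To be useful across the gRPC/application boundary, the library side would need to return its CurrentBuildConfig() from a non-inline function compiled into the library, so that the value reflects the library's own flags.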
Have you been able to reproduce this issue, and do you need any more information from me?
The full TrueFactory types are:
{f_=grpc_core::promise_detail::Map<grpc_core::InterceptorList<std::unique_ptr<grpc_core::Message, grpc_core::Arena::PooledDeleter>>::RunPromise,grpc_core::NextResult<std::unique_ptr<grpc_core::Message, grpc_core::Arena::PooledDeleter>>
{f_=grpc_core::promise_detail::Map<grpc_core::InterceptorList<std::unique_ptr<grpc_metadata_batch,grpc_core::Arena::PooledDeleter>>::RunPromise,grpc_core::NextResult<std::unique_ptr<grpc_metadata_batch,grpc_core::Arena::PooledDeleter>>
More analysis, and a conclusion:
It turns out that this is most likely a Visual Studio bug (https://developercommunity.visualstudio.com/t/runtime-stack-corruption-using-stdvisit/346200). If I compile the test case with Visual Studio 2022, it runs successfully.
Unfortunately for us, there is no fix for Visual Studio 2017 (or even VS 2019). I'm now trying to see whether abseil/protobuf/grpc compiled with VS 2022 work with our code compiled with VS 2017.
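When mixing toolsets like this, it can help to have each component report which compiler built it. A minimal sketch, assuming C++17; VisualStudioReleaseFor and CompilingMscVer are hypothetical names, and the ranges are the _MSC_VER values Microsoft documents for each release (1910 to 1916 for VS 2017, 1920 to 1929 for VS 2019, 1930 and up for VS 2022).

```cpp
#include <string_view>

// Maps an _MSC_VER value to the Visual Studio release that ships it.
constexpr std::string_view VisualStudioReleaseFor(long msc_ver) {
  if (msc_ver >= 1910 && msc_ver <= 1916) return "Visual Studio 2017";
  if (msc_ver >= 1920 && msc_ver <= 1929) return "Visual Studio 2019";
  if (msc_ver >= 1930) return "Visual Studio 2022 or later";
  return "unknown or pre-2017";
}

// The toolset that compiled the calling compilation unit, or 0 if not MSVC.
constexpr long CompilingMscVer() {
#if defined(_MSC_VER)
  return _MSC_VER;
#else
  return 0;
#endif
}
```

Because CompilingMscVer() is evaluated where it is called, the library side would need to wrap it in a non-inline function compiled into the library so the application can log both values side by side.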
What version of gRPC and what language are you using?
What operating system (Linux, Windows,...) and version?
Windows 10
What runtime / compiler are you using (e.g. python version or version of gcc)
gRPC and dependent DLLs compiled with Visual Studio 15 (2017). Test executable used in the repro compiled with Visual Studio 17 (2022).
What did you do?
Created a simple C++ program (below) that calls a gRPC server using the generic API (i.e. without any code generation). The gRPC libraries are linked statically, but all other dependencies are linked dynamically (i.e. they are DLLs). The program is run against a gRPC server that has been shown to respond correctly to other clients.
What did you expect to see?
A successful call to the remote gRPC server returning expected data.
What did you see instead?
The program connects to the remote server and successfully initiates a request. However, when it attempts to retrieve the response, it crashes, apparently because of stack corruption. This is consistent with a similar stack corruption we experienced when linking gRPC with our proprietary code.
Visual Studio reports the following error: Run-Time Check Failure #2 - Stack around the variable 'true_factory' was corrupted.
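Run-Time Check Failure #2 comes from MSVC's run-time stack frame checks (/RTCs), which surround local variables with guard bytes and verify them when the function returns. The toy model below is an illustration under simplifying assumptions, not the compiler's real implementation: GuardedSlot and SurvivesWriteOf are hypothetical names, C++17 is assumed, and the 0xCC pattern and 8-byte guards are chosen for the example. It shows why a constructor that writes 0x18 bytes into an object that sizeof reports as 0x10 trips this kind of check.

```cpp
#include <cstddef>
#include <cstdint>

// A slot of the size sizeof reported (0x10), between two runs of guard bytes.
class GuardedSlot {
 public:
  static constexpr std::size_t kGuardBytes = 8;
  static constexpr std::size_t kSlotBytes = 0x10;
  static constexpr std::size_t kTotalBytes = 2 * kGuardBytes + kSlotBytes;
  static constexpr std::uint8_t kPattern = 0xCC;

  // Guard regions get the pattern; the slot itself starts zeroed.
  constexpr GuardedSlot() {
    for (std::size_t i = 0; i < kTotalBytes; ++i) {
      bytes_[i] = IsSlot(i) ? std::uint8_t{0} : kPattern;
    }
  }

  // Simulates a constructor that believes the object is `n` bytes long and
  // writes that many bytes starting at the slot. The overrun is clamped to
  // the trailing guard so the model itself stays well-defined.
  constexpr void Write(std::size_t n, std::uint8_t value) {
    const std::size_t limit =
        n < kSlotBytes + kGuardBytes ? n : kSlotBytes + kGuardBytes;
    for (std::size_t i = 0; i < limit; ++i) bytes_[kGuardBytes + i] = value;
  }

  // True if every guard byte still holds the pattern.
  constexpr bool GuardsIntact() const {
    for (std::size_t i = 0; i < kTotalBytes; ++i) {
      if (!IsSlot(i) && bytes_[i] != kPattern) return false;
    }
    return true;
  }

 private:
  static constexpr bool IsSlot(std::size_t i) {
    return i >= kGuardBytes && i < kGuardBytes + kSlotBytes;
  }
  std::uint8_t bytes_[kTotalBytes] = {};
};

// Writes `n` bytes of 0x11 into a fresh slot and reports whether the guards
// survived. 0x11 is arbitrary; it only has to differ from kPattern.
constexpr bool SurvivesWriteOf(std::size_t n) {
  GuardedSlot slot;
  slot.Write(n, 0x11);
  return slot.GuardsIntact();
}
```

With this model, SurvivesWriteOf(0x10) keeps both guards intact, while SurvivesWriteOf(0x18) overwrites the trailing guard, which is the situation the run-time check reports for 'true_factory'.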
Stack Trace was:
This reproduction needs a very simple gRPC server running on port 8080 on the local host. The server needs to implement a single RPC entry point (proto definition below), populating the message field of the response with a value based on the request's 'name' field. The exact value doesn't matter, since the client code never successfully receives it.
The code is built with CMake using the CMakeLists.txt given below, with CMAKE_PREFIX_PATH set to a path containing the install prefixes of the dependencies: abseil, protobuf, re2, c-ares, zlib and openssl.
The resulting executable is run with the PATH variable including the bin (DLL) directories of the same set of dependencies.
Source Code for repro: sayhello.proto
callsayhello.cpp
CMakeLists.txt
Anything else we should know about your project / environment?
We are engaged with Mark Roth and Esun Kim in tracking down the root cause of this issue.