grpc / grpc

The C based gRPC (C++, Python, Ruby, Objective-C, PHP, C#)
https://grpc.io
Apache License 2.0

gRPC for Windows (static lib) using generic API fails with stack corruption error #33592

Open dawidcha opened 1 year ago

dawidcha commented 1 year ago

What version of gRPC and what language are you using?

What operating system (Linux, Windows,...) and version?

Windows 10

What runtime / compiler are you using (e.g. python version or version of gcc)

gRPC and the dependent DLLs were compiled with Visual Studio 15 (2017). The test executable used in the repro was compiled with Visual Studio 17 (2022).

What did you do?

Created a simple C++ program (below) that calls a gRPC server using the generic API (i.e. without any code generation). The gRPC libraries are linked statically, but all other dependencies are linked dynamically (i.e. they are DLLs). The program is run against a gRPC server that has been shown to respond correctly to other clients.

What did you expect to see?

A successful call to the remote gRPC server returning expected data.

What did you see instead?

The program connects to the remote server and successfully initiates a request. However, when it attempts to retrieve the response, it crashes with what appears to be stack corruption. This is consistent with a similar case of stack corruption we experienced when linking gRPC with proprietary code.

Visual Studio reports the following error: Run-Time Check Failure #2 - Stack around the variable 'true_factory' was corrupted.

Stack Trace was:

This reproduction needs a very simple gRPC server running on port 8080 on localhost. The server must implement a single RPC entry point (proto definition below) that populates the message field of the response with a value appropriate to the request's 'name' field. The exact value does not matter, since the client code never successfully receives it.

The code is built with CMake using the CMakeLists.txt below, with CMAKE_PREFIX_PATH set to the install prefixes of the dependencies: abseil, protobuf, re2, c-ares, zlib and OpenSSL.

The resulting executable is run with PATH including the bin (DLL) directories of the same set of dependencies.

Source Code for repro: sayhello.proto

syntax = "proto3";

package com.prober.sayhello;

service Hello{
  rpc SayHello(SayHelloRequest) returns(SayHelloReply) {}
}

message SayHelloRequest{
  string name = 1;
}

message SayHelloReply{
  string message = 1;
}

callsayhello.cpp

#define ABSL_CONSUME_DLL
#define PROTOBUF_USE_DLLS
#define _ITERATOR_DEBUG_LEVEL 0

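#include <iostream>
#include <memory>  // std::unique_ptr, std::shared_ptr
#include <sstream> // std::ostringstream
#include <string>

#include <google/protobuf/compiler/importer.h> // Importer, SourceTree
#include <google/protobuf/descriptor.h> // google::protobuf::FileDescriptor
#include <google/protobuf/descriptor.pb.h> // google::protobuf::FileDescriptorProto
#include <google/protobuf/dynamic_message.h> // DynamicMessageFactory
#include <google/protobuf/io/zero_copy_stream_impl_lite.h> // ArrayInputStream
#include <grpcpp/security/credentials.h> // InsecureChannelCredentials
#include <grpcpp/generic/generic_stub.h> // grpc::TemplatedGenericStub
#include <grpcpp/grpcpp.h>
#include <grpcpp/impl/proto_utils.h> // grpc::SerializationTraits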

char proto[] = R"(
syntax = "proto3";

package com.prober.sayhello;

service Hello{
  rpc SayHello(SayHelloRequest) returns(SayHelloReply) {}
}

message SayHelloRequest{
  string name = 1;
}

message SayHelloReply{
  string message = 1;
}
)";

class SourceTree : public google::protobuf::compiler::SourceTree
{
public:
    google::protobuf::io::ZeroCopyInputStream* Open(absl::string_view filename) override
    {
        return new google::protobuf::io::ArrayInputStream(proto, sizeof(proto)-1);
    }
};

class SourceErrorCollector : public google::protobuf::compiler::MultiFileErrorCollector {
public:
    std::ostringstream errors;

    // implements MultiFileErrorCollector ------------------------------
    void AddError(const std::string& filename, int line, int column, const std::string& message) override {
        std::cout << "Error - File: " << filename << " line: " << line + 1 << " column: " << column << " error: " << message << std::endl;
    }

    // The message parameter must be const so that this overrides the base-class
    // AddWarning; without const it was an unrelated overload and never called.
    void AddWarning(const std::string& filename, int line, int column, const std::string& message) override {
        std::cout << "Warning - File: " << filename << " line: " << line + 1 << " column: " << column << " warning: " << message << std::endl;
    }
};

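int main()
{
    bool ok = false;
    void* tag = nullptr;

    SourceTree sourceTree;
    SourceErrorCollector srcErrCollector;

    google::protobuf::compiler::SourceTreeDescriptorDatabase descriptorDb(&sourceTree);
    google::protobuf::DescriptorPool sourceDescriptorPool(&descriptorDb, descriptorDb.GetValidationErrorCollector());

    // Parse the embedded .proto through the Importer first so that syntax errors are reported.
    google::protobuf::compiler::Importer importer(&sourceTree, &srcErrCollector);
    if (importer.Import("blah") == nullptr)
    {
        std::cout << "Failed to import proto" << std::endl;
        return 1;
    }

    const google::protobuf::FileDescriptor* fileDescriptor = sourceDescriptorPool.FindFileByName("blah");
    if (fileDescriptor == nullptr)
    {
        std::cout << "Failed to find file descriptor" << std::endl;
        return 1;
    }
    google::protobuf::FileDescriptorProto fileDescriptorProto;
    fileDescriptor->CopyTo(&fileDescriptorProto);

    // The pool is declared before the factory so that it outlives the factory and
    // every message the factory creates (locals are destroyed in reverse order).
    google::protobuf::DescriptorPool descriptorPool;
    if (descriptorPool.BuildFile(fileDescriptorProto) == nullptr)
    {
        std::cout << "Failed to build descriptor pool" << std::endl;
        return 1;
    }
    google::protobuf::DynamicMessageFactory messageFactory;

    const google::protobuf::Descriptor* requestDescriptor = descriptorPool.FindMessageTypeByName("com.prober.sayhello.SayHelloRequest");
    const google::protobuf::Descriptor* responseDescriptor = descriptorPool.FindMessageTypeByName("com.prober.sayhello.SayHelloReply");
    if (requestDescriptor == nullptr || responseDescriptor == nullptr)
    {
        std::cout << "Failed to find message types" << std::endl;
        return 1;
    }

    // Owned messages, declared after the factory so they are destroyed before it.
    std::unique_ptr<google::protobuf::Message> responseMessage(messageFactory.GetPrototype(responseDescriptor)->New());
    std::unique_ptr<google::protobuf::Message> requestMessage(messageFactory.GetPrototype(requestDescriptor)->New());

    const google::protobuf::Reflection* responseReflection = responseMessage->GetReflection();
    const google::protobuf::Reflection* requestReflection = requestMessage->GetReflection();

    std::shared_ptr<grpc::ChannelInterface> channel = grpc::CreateChannel("localhost:8080", grpc::InsecureChannelCredentials());
    ::grpc::TemplatedGenericStub<::google::protobuf::Message, ::google::protobuf::Message> stub(channel);
    grpc::ClientContext ctx;
    grpc::CompletionQueue cq;
    std::unique_ptr<grpc::ClientAsyncReaderWriter<::google::protobuf::Message, ::google::protobuf::Message>> asyncRdrWrtr
        = stub.PrepareCall(&ctx, "/com.prober.sayhello.Hello/SayHello", &cq);

    // Tag 1: start the call.
    asyncRdrWrtr->StartCall((void*)1);
    if (!cq.Next(&tag, &ok) || !ok || tag != (void*)1)
    {
        std::cout << "Failed start call" << std::endl;
        return 1;
    }

    // Tag 2: post the read for the single response message.
    asyncRdrWrtr->Read(responseMessage.get(), (void*)2);

    // Tag 3: write the request; set_last_message() also half-closes the stream.
    grpc::WriteOptions options;
    options.set_last_message();
    requestReflection->SetString(requestMessage.get(), requestDescriptor->FindFieldByNumber(1), "Number six");
    asyncRdrWrtr->Write(*requestMessage, options, (void*)3);

    // These checks assume the write's completion is delivered before the read's.
    if (!cq.Next(&tag, &ok) || !ok || tag != (void*)3)
    {
        std::cout << "Failed write call" << std::endl;
        return 1;
    }

    if (!cq.Next(&tag, &ok) || !ok || tag != (void*)2)
    {
        std::cout << "Failed read operation" << std::endl;
        return 1;
    }

    std::cout << "response is '" << responseReflection->GetString(*responseMessage, responseDescriptor->FindFieldByNumber(1)) << "'" << std::endl;

    // Tag 4: collect the final status, then shut down and drain the queue.
    grpc::Status status;
    asyncRdrWrtr->Finish(&status, (void*)4);
    if (!cq.Next(&tag, &ok) || tag != (void*)4)
    {
        std::cout << "Failed finish call" << std::endl;
        return 1;
    }
    if (!status.ok())
    {
        std::cout << "RPC failed: " << status.error_message() << std::endl;
    }
    cq.Shutdown();
    while (cq.Next(&tag, &ok)) {}
    return status.ok() ? 0 : 1;
}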

CMakeLists.txt

cmake_minimum_required(VERSION 3.13)
project(test_grpc)
set(CMAKE_CXX_STANDARD 14)

include(CMakeFindDependencyMacro)

# Use callsayhello.cpp from the project root directory as the source file
set(SOURCE_FILES callsayhello.cpp)

find_package(gRPC CONFIG REQUIRED)

set(CMAKE_ARCHIVE_OUTPUT_DIRECTORY ${CMAKE_BINARY_DIR}/lib)
set(CMAKE_LIBRARY_OUTPUT_DIRECTORY ${CMAKE_BINARY_DIR}/lib)
set(CMAKE_RUNTIME_OUTPUT_DIRECTORY ${CMAKE_BINARY_DIR}/bin)

# Add executable target with source files listed in SOURCE_FILES variable
add_executable(test_grpc ${SOURCE_FILES})

get_cmake_property(_variableNames VARIABLES)
list (SORT _variableNames)
foreach (_variableName ${_variableNames})
    message(STATUS "${_variableName}=${${_variableName}}")
endforeach()

target_include_directories(test_grpc
    PRIVATE "${gRPC_DIR}/../../../include"
    PRIVATE "${Protobuf_DIR}/../include"
    PRIVATE "${absl_DIR}/../../../include"
)

target_link_directories(test_grpc
    PRIVATE "${gRPC_DIR}/../.."
    PRIVATE "${Protobuf_DIR}/../lib"
    PRIVATE "${absl_DIR}/../.."
    PRIVATE "${re2_DIR}/../.."
    PRIVATE "${c-ares_DIR}/../.."
)

if("${CMAKE_BUILD_TYPE}" STREQUAL "Debug")
    set(Protobuf_LIBRARIES
        libprotobufd
        libprotocd
        libprotobuf-lited
    )
else()
    set(Protobuf_LIBRARIES
        libprotobuf
        libprotoc
        libprotobuf-lite
    )
endif()

target_link_libraries(test_grpc
    abseil_dll
    absl_cord_internal
    absl_cordz_functions
    absl_cordz_handle
    absl_cordz_info
    absl_cordz_sample_token
    absl_crc_cord_state
    absl_flags
    absl_flags_commandlineflag
    absl_flags_commandlineflag_internal
    absl_flags_config
    absl_flags_internal
    absl_flags_marshalling
    absl_flags_parse
    absl_flags_private_handle_accessor
    absl_flags_program_name
    absl_flags_reflection
    absl_flags_usage
    absl_flags_usage_internal
    absl_log_flags
    absl_low_level_hash
    absl_random_internal_distribution_test_util
    absl_statusor
    absl_strerror
    grpc
    gpr
    upb
    grpc++
    ${Protobuf_LIBRARIES}
    ws2_32
    re2
    cares
    address_sorting
    ${OPENSSL_CRYPTO_LIBRARY}
    ${OPENSSL_SSL_LIBRARY}
    ${ZLIB_LIBRARY}
)

install(
    TARGETS test_grpc
)

See TROUBLESHOOTING.md for how to diagnose problems better.

Anything else we should know about your project / environment?

We are engaged with Mark Roth and Esun Kim in tracking down the root cause of this issue.

dawidcha commented 11 months ago

Further analysis on this issue:

The failure occurs in the constructor of the 'If' construct in core/lib/promise/if.h. During a run, this constructor works correctly twice, both times with a TrueFactory whose NextResult carries a grpc_core::Message; the third time, with a TrueFactory whose NextResult carries a grpc_metadata_batch, it corrupts the stack frame.

The corruption happens because, in the grpc_metadata_batch case, the optional_data move constructor (optional_data(optional_data&& rhs) noexcept) writes 0x18 (24) bytes of data into a structure that is only 0x10 (16) bytes long according to sizeof.

The obvious suspicion is that the same headers are being compiled into different translation units with different defines, resulting in inconsistent assumptions about object layout.
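The suspected mechanism can be illustrated without gRPC. In the minimal C++17 sketch below, the namespaces `view_a` and `view_b` are hypothetical stand-ins for two translation units that saw the same header with and without some layout-affecting define: in one, a `unique_ptr` deleter is an empty class; in the other, it carries one pointer of state. `Payload`, `Deleter` and the constants are invented for illustration and are not gRPC's actual types. On a typical 64-bit toolchain the wrapped `std::optional` comes out at 0x10 bytes in view A and 0x18 bytes in view B, matching the sizes observed above, although exact sizes are implementation-defined.

```cpp
#include <cstddef>
#include <memory>
#include <optional>

struct Payload { int value = 0; };

// View A (hypothetical): the header as compiled WITHOUT the define.
// The deleter is empty, so common standard libraries keep unique_ptr
// pointer-sized via the empty-base optimization.
namespace view_a {
struct Deleter {
  void operator()(Payload* p) const { delete p; }
};
using Slot = std::optional<std::unique_ptr<Payload, Deleter>>;
}  // namespace view_a

// View B (hypothetical): the same declarations as compiled WITH the define.
// The deleter now holds a pointer, which grows unique_ptr and therefore
// the optional that wraps it.
namespace view_b {
struct Deleter {
  void* pool = nullptr;  // hypothetical per-pool state
  void operator()(Payload* p) const { delete p; }
};
using Slot = std::optional<std::unique_ptr<Payload, Deleter>>;
}  // namespace view_b

// Bytes a caller compiled with view A reserves for a Slot, versus the bytes
// a move constructor compiled with view B writes into that storage.
constexpr std::size_t kReservedBytes = sizeof(view_a::Slot);
constexpr std::size_t kWrittenBytes = sizeof(view_b::Slot);

// Number of bytes written past the end of the caller's storage (0 if none).
constexpr std::size_t OverrunBytes() {
  return kWrittenBytes > kReservedBytes ? kWrittenBytes - kReservedBytes : 0;
}
```

If code built with view A reserves `kReservedBytes` on the stack and a move constructor built with view B fills that storage, it writes `OverrunBytes()` bytes past the end, which is the kind of overrun MSVC's `/RTCs` stack-frame checks report as Run-Time Check Failure #2.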

I strongly suspected ABSL_USES_STD_OPTIONAL was at fault, but have since confirmed that this is being set consistently across all builds.

I also confirmed that building my test application with Visual Studio 2017, the same compiler used for the gRPC libraries, results in the same issue.

Is anyone aware of any other defines that might influence the layout and size of objects (and so absolutely need to be consistent across compilation units)?
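One way to catch this class of mismatch mechanically is sketched below (C++17, no gRPC). `HYPOTHETICAL_STATEFUL_DELETER`, `Node`, `Deleter`, `Slot`, `LayoutFingerprint` and the helper functions are invented for illustration. On MSVC, `#pragma detect_mismatch` records a key/value pair in each object file, and the linker rejects (LNK2038) objects that recorded different values for the same key; MSVC's standard library uses the same mechanism for `_ITERATOR_DEBUG_LEVEL`. As a portable fallback, each side can compute a size/alignment fingerprint of the types that cross the library boundary and compare them at startup. Either approach only covers the defines and types you instrument.

```cpp
#include <cstddef>
#include <memory>
#include <optional>

// Hypothetical shared header. HYPOTHETICAL_STATEFUL_DELETER is an invented
// switch standing in for any define that changes a type's layout.
#ifndef HYPOTHETICAL_STATEFUL_DELETER
#define HYPOTHETICAL_STATEFUL_DELETER 0
#endif

struct Node { int value = 0; };

struct Deleter {
#if HYPOTHETICAL_STATEFUL_DELETER
  void* pool = nullptr;  // extra state present in only one configuration
#endif
  void operator()(Node* p) const { delete p; }
};

using Slot = std::optional<std::unique_ptr<Node, Deleter>>;

#if defined(_MSC_VER)
// MSVC only: every object file that includes this header records the
// switch's value; linking objects with different values fails with LNK2038.
#if HYPOTHETICAL_STATEFUL_DELETER
#pragma detect_mismatch("HYPOTHETICAL_STATEFUL_DELETER", "1")
#else
#pragma detect_mismatch("HYPOTHETICAL_STATEFUL_DELETER", "0")
#endif
#endif

// Portable fallback: a fingerprint each build computes for itself. A library
// could expose its value through a plain function for the application to compare.
struct LayoutFingerprint {
  std::size_t slot_size;
  std::size_t slot_align;
};

constexpr LayoutFingerprint ThisBuildFingerprint() {
  return LayoutFingerprint{sizeof(Slot), alignof(Slot)};
}

// True when two builds agree on the layout of Slot.
constexpr bool SameLayout(LayoutFingerprint a, LayoutFingerprint b) {
  return a.slot_size == b.slot_size && a.slot_align == b.slot_align;
}
```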

Have you been able to reproduce this issue, and do you need any more information from me?

Full TrueFactory types are:

{f_=grpc_core::promise_detail::Map<grpc_core::InterceptorList<std::unique_ptr<grpc_core::Message, grpc_core::Arena::PooledDeleter>>::RunPromise,grpc_core::NextResult<std::unique_ptr<grpc_core::Message, grpc_core::Arena::PooledDeleter>> (absl::lts_20230125::optional<std::unique_ptr<grpc_core::Message, grpc_core::Arena::PooledDeleter>>)> (void){...} } grpc_core::promise_detail::OncePromiseFactory<void,grpc_core::promise_detail::Map<grpc_core::InterceptorList<std::unique_ptr<grpc_core::Message, grpc_core::Arena::PooledDeleter>>::RunPromise,> (void)>

{f_=grpc_core::promise_detail::Map<grpc_core::InterceptorList<std::unique_ptr<grpc_metadata_batch,grpc_core::Arena::PooledDeleter>>::RunPromise,grpc_core::NextResult<std::unique_ptr<grpc_metadata_batch,grpc_core::Arena::PooledDeleter>> (absl::lts_20230125::optional<std::unique_ptr<grpc_metadata_batch,grpc_core::Arena::PooledDeleter>>)> (void){...} } grpc_core::promise_detail::OncePromiseFactory<void,grpc_core::promise_detail::Map<grpc_core::InterceptorList<std::unique_ptr<grpc_metadata_batch,grpc_core::Arena::PooledDeleter>>::RunPromise,> (void)>

dawidcha commented 11 months ago

More analysis, and a conclusion:

It turns out that this is most likely a [Visual Studio bug](https://developercommunity.visualstudio.com/t/runtime-stack-corruption-using-stdvisit/346200). If I compile the test case with Visual Studio 2022, it runs successfully.

Unfortunately for us, there is no fix for Visual Studio 2017 (or even VS 2019). I'm trying to see if abseil/protobuf/grpc compiled with 2022 work with our code compiled with VS 2017.