zeroc-ice / ice

All-in-one solution for creating networked applications with RPC, pub/sub, server deployment, and more.
https://zeroc.com
GNU General Public License v2.0
2.03k stars 592 forks source link

C++ data race with Communicator::stringToProxy and Communicator::destroy #2585

Open externl opened 1 month ago

externl commented 1 month ago

While porting Swift to use async/await I got this crash

* thread #2, queue = 'com.apple.root.default-qos.cooperative', stop reason = EXC_BAD_ACCESS (code=1, address=0x5f)
  * frame #0: 0x0000000105081c6c TestDriver`std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>::__is_long[abi:ue170006](this="") const at string:1734:33
    frame #1: 0x0000000105081c1c TestDriver`std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>::__get_pointer[abi:ue170006](this="") const at string:1869:17
    frame #2: 0x0000000105081bdc TestDriver`std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>::data[abi:ue170006](this="") const at string:1559:73
    frame #3: 0x00000001050ca1c4 TestDriver`std::__1::basic_ostream<char, std::__1::char_traits<char>>& std::__1::operator<<[abi:ue170006]<char, std::__1::char_traits<char>, std::__1::allocator<char>>(__os=0x000000016af15020, __str="") at ostream:1093:56
    frame #4: 0x00000001051272c4 TestDriver`IceInternal::EndpointI::initWithOptions(this=0x000060000166c418, args=size=2) at EndpointI.cpp:49:17
    frame #5: 0x0000000105138c54 TestDriver`IceInternal::IPEndpointI::initWithOptions(this=0x000060000166c418, args=size=2, oaEndpoint=false) at IPEndpointI.cpp:378:16
    frame #6: 0x000000010546ed10 TestDriver`IceInternal::TcpEndpointFactory::create(this=0x00006000024743d8, args=size=2, oaEndpoint=false) const at TcpEndpointI.cpp:364:12
    frame #7: 0x000000010512328c TestDriver`IceInternal::EndpointFactoryManager::create(this=0x0000600000474398, str="tcp -p 12010", oaEndpoint=false) const at EndpointFactoryManager.cpp:108:35
    frame #8: 0x00000001053fdeb8 TestDriver`IceInternal::ReferenceFactory::create(this=0x0000600001f74018, str="test:tcp -p 12010", propertyPrefix="") at ReferenceFactory.cpp:469:74
    frame #9: 0x00000001050989e8 TestDriver`Ice::Communicator::_stringToProxy(this=0x0000600003f4c658, s="test:tcp -p 12010") const at Communicator.cpp:84:43
    frame #10: 0x00000001056e37d0 TestDriver`std::__1::optional<Ice::ObjectPrx> Ice::Communicator::stringToProxy<Ice::ObjectPrx, true>(this=0x0000600003f4c658, str="test:tcp -p 12010") const at Communicator.h:95:30
    frame #11: 0x00000001056e35c8 TestDriver`-[ICECommunicator stringToProxy:error:](self=0x0000600002a703a0, _cmd="stringToProxy:error:", str="test:tcp -p 12010", error=0x000000016af16a70) at Communicator.mm:51:39
    frame #12: 0x0000000104fab6bc TestDriver`closure #1 in CommunicatorI.stringToProxyImpl<ProxyImpl>(self=0x0000600000e50d80, str="test:tcp -p 12010") at CommunicatorI.swift:251:46
    frame #13: 0x0000000104fad9c4 TestDriver`partial apply for closure #1 in CommunicatorI.stringToProxyImpl<A>(_:) at <compiler-generated>:0
    frame #14: 0x00000001a2d1c54c libswiftObjectiveC.dylib`ObjectiveC.autoreleasepool<τ_0_0>(invoking: () throws -> τ_0_0) throws -> τ_0_0 + 64
    frame #15: 0x0000000104fab14c TestDriver`CommunicatorI.stringToProxyImpl<ProxyImpl>(str="test:tcp -p 12010", self=0x0000600000e50d80) at CommunicatorI.swift:250:20
    frame #16: 0x0000000104fa6ca8 TestDriver`CommunicatorI.stringToProxy(str="test:tcp -p 12010", self=0x0000600000e50d80) at CommunicatorI.swift:48:13
    frame #17: 0x0000000104fabb64 TestDriver`protocol witness for Communicator.stringToProxy(_:) in conformance CommunicatorI at <compiler-generated>:0
    frame #18: 0x00000001057c0358 TestDriver`closure #1 in allTests(output=0x0000600002860230, comm=0x0000600000e50d80, ref="test:tcp -p 12010") at AllTests.swift:60:36
    frame #19: 0x00000001057c0d44 TestDriver`partial apply for closure #1 in allTests(_:) at <compiler-generated>:0
    frame #20: 0x0000000104fa2214 TestDriver`thunk for @escaping @callee_guaranteed @Sendable @async () -> (@out A) at <compiler-generated>:0
    frame #21: 0x0000000104fa2370 TestDriver`partial apply for thunk for @escaping @callee_guaranteed @Sendable @async () -> (@out A) at <compiler-generated>:0

The code in question was running stringToProxy and destroy at the same time.

Task {
    try communicator.stringToProxy(ref)
}
communicator.destory()
pepone commented 1 month ago

It is not clear to me that this is a race condition issue. Doesn't Task create a detached task that could run after main exits?

externl commented 1 month ago

That was just a snippet of the test that was failing. It was not exiting main for a while after.

It's reproducible in just C++ .

#include <Ice/Ice.h>

int main(int argc, char *argv[])
{
    int count = 0;
    for (;;)
    {
        Ice::CommunicatorHolder ich(argc, argv);
        auto communicator = ich.communicator();

        std::thread t([communicator]
                      {
                          for (;;)
                          {
                            try {
                                communicator->stringToProxy("test:tcp -h 127.0.0.1 -p 10000");
                            } catch (Ice::CommunicatorDestroyedException &e) {
                                break;
                            }

                          } });
        communicator->destroy();
        t.join();
    }

    return 0;
}
❯ clang++ -std=c++17 test.cpp -lIce -I/Users/joe/Developer/zeroc-ice/ice/cpp/include -I/Users/joe/Developer/zeroc-ice/ice/cpp/include/generated -L/Users/joe/Developer/zeroc-ice/ice/cpp/lib -Wl,-rpath,/Users/joe/Developer/zeroc-ice/ice/cpp/lib  -o test

❯ ./test
fish: Job 1, './test' terminated by signal SIGSEGV (Address boundary error)
externl commented 1 month ago

After running this a bunch there seems to be a variety of failures you can get.

pepone commented 1 month ago

One issue is that endpoint factories set the instance member to nullptr during destroy, but access it without chekcing.

For example:

https://github.com/zeroc-ice/ice/blob/1ae9e22a64a2f8c80bdf5ad280f71064a4000c64/cpp/src/Ice/TcpEndpointI.cpp#L373-L377

and:

https://github.com/zeroc-ice/ice/blob/1ae9e22a64a2f8c80bdf5ad280f71064a4000c64/cpp/src/Ice/TcpEndpointI.cpp#L353-L357

A second issue is the endpoint factory manager clears the factories vector, while it is in use:

For example:

https://github.com/zeroc-ice/ice/blob/1ae9e22a64a2f8c80bdf5ad280f71064a4000c64/cpp/src/Ice/EndpointFactoryManager.cpp#L190-L197

and:

https://github.com/zeroc-ice/ice/blob/1ae9e22a64a2f8c80bdf5ad280f71064a4000c64/cpp/src/Ice/EndpointFactoryManager.cpp#L92-L103

We can fix this one by holding the mutex in destroy.

For "invalid endpoint" it is probably because, the factory was destroyed and removed, before it has a chance to parse the endpoint. We can fix ReferenceFactory to throw CommunicatorDestroyedException in this case.

I also got this crash with the same test:

* thread #2, stop reason = EXC_BAD_ACCESS (code=1, address=0x180)
  * frame #0: 0x00000001023abb00 libIce.38a0.dylib`std::__1::shared_ptr<IceInternal::DefaultsAndOverrides>::get[abi:ue170006](this=0x0000000000000180) const at shared_ptr.h:871:16
    frame #1: 0x000000010239bee0 libIce.38a0.dylib`std::__1::shared_ptr<IceInternal::DefaultsAndOverrides>::operator bool[abi:ue170006](this=0x0000000000000180) const at shared_ptr.h:903:16
    frame #2: 0x000000010239be80 libIce.38a0.dylib`IceInternal::Instance::defaultsAndOverrides(this=0x0000000000000000) const at Instance.cpp:289:5
    frame #3: 0x000000010259234c libIce.38a0.dylib`IceInternal::ProtocolInstance::defaultSourceAddress(this=0x0000600000b14518) const at ProtocolInstance.cpp:103:23
    frame #4: 0x0000000102351264 libIce.38a0.dylib`IceInternal::IPEndpointI::initWithOptions(this=0x0000600001914018, args=size=0, oaEndpoint=false) at IPEndpointI.cpp:408:56

(lldb) frame select 3
frame #3: 0x00000001025922a8 libIce.38a0.dylib`IceInternal::ProtocolInstance::defaultSourceAddress(this=0x0000600001f80218) const at ProtocolInstance.cpp:103:23
   100  const Address&
   101  IceInternal::ProtocolInstance::defaultSourceAddress() const
   102  {
-> 103      return _instance->defaultsAndOverrides()->defaultSourceAddress;
   104  }
   105  
   106  const EncodingVersion&
(lldb) print _instance
(const IceInternal::InstancePtr) nullptr {
  __ptr_ = nullptr