santazhang / simple-rpc

Simple RPC in C++, with Python binding
http://www.yzhang.net/simple-rpc/
BSD 3-Clause "New" or "Revised" License
27 stars 6 forks source link

Data race in server.cc:handle_write #3

Closed rjpower closed 10 years ago

rjpower commented 11 years ago

I'm seeing consistent deadlock issues which seem to be the result of a data-race between ServerConnection::handle_write and ::begin_reply. If I comment out this unlock/lock pair:

    Marshal::read_barrier barrier = out_.get_read_barrier();
//    out_l_.unlock();

    out_.write_to_fd(socket_, barrier, rate);

//    out_l_.lock();

Everything works perfectly.

Helgrind reports this:

==13723== Possible data race during read of size 8 at 0xA3627F8 by thread #12
==13723== Locks held: none
==13723==    at 0x6CA10C5: rpc::FastMarshal::chunk::write_to_fd(int, unsigned long) (marshal.h:170)
==13723==    by 0x6CA0108: rpc::FastMarshal::write_to_fd(int, rpc::FastMarshal::read_barrier const&, rpc::io_ratelimit const&) (marshal.cc:297)
==13723==    by 0x6CA66A7: rpc::ServerConnection::handle_write(rpc::io_ratelimit const&) (server.cc:122)
==13723==    by 0x6CA1912: rpc::PollMgr::PollThread::poll_loop() (polling.cc:152)
==13723==    by 0x6CA2529: rpc::PollMgr::PollThread::start_poll_loop(void*) (polling.cc:45)
==13723==    by 0x4C2EF9C: mythread_wrapper (hg_intercepts.c:219)
==13723==    by 0x4E3DF8D: start_thread (pthread_create.c:311)
==13723==    by 0x5A6FE1C: clone (clone.S:113)
==13723== 
==13723== This conflicts with a previous write of size 8 by thread #14
==13723== Locks held: 1, at address 0xA331D88
==13723==    at 0x6CA09D0: rpc::FastMarshal::chunk::set_bookmark() (marshal.h:91)
==13723==    by 0x6CA031A: rpc::FastMarshal::set_bookmark(unsigned long) (marshal.cc:333)
==13723==    by 0x6CA6113: rpc::ServerConnection::begin_reply(rpc::Request*, int) (server.cc:25)
==13723==    by 0x6A3184F: spartan::WorkerService::__run_kernel__wrapper__(rpc::Request*, rpc::ServerConnection*)::{lambda()#1}::operator()() const (spartan_service.h:438)
==13723==    by 0x6A355F9: std::_Function_handler<void (), spartan::WorkerService::__run_kernel__wrapper__(rpc::Request*, rpc::ServerConnection*)::{lambda()#1}>::_M_invoke(std::_Any_data const&) (functional:1925)
==13723==    by 0x6CACA4B: std::function<void ()>::operator()() const (functional:2310)
==13723==    by 0x6CAB957: rpc::ThreadPool::run_thread(int) (utils.cc:70)
==13723==    by 0x6CAB3DF: rpc::ThreadPool::start_thread_pool(void*) (utils.cc:25)
==13723== 
==13723== Address 0xA3627F8 is 24 bytes inside a block of size 48 alloc'd
==13723==    at 0x4C2C2D1: operator new(unsigned long) (vg_replace_malloc.c:298)
==13723==    by 0x6CA0288: rpc::FastMarshal::set_bookmark(unsigned long) (marshal.cc:327)
==13723==    by 0x6CA6113: rpc::ServerConnection::begin_reply(rpc::Request*, int) (server.cc:25)
==13723==    by 0x6A31183: spartan::WorkerService::__initialize__wrapper__(rpc::Request*, rpc::ServerConnection*) (spartan_service.h:390)
==13723==    by 0x6A42CBC: void rpc::Server::reg<spartan::WorkerService>(int, spartan::WorkerService*, void (spartan::WorkerService::*)(rpc::Request*, rpc::ServerConnection*))::H::handle(rpc::Request*, rpc::ServerConnection*) (server.h:201)
==13723==    by 0x6CA6543: rpc::ServerConnection::handle_read() (server.cc:103)
==13723==    by 0x6CA18B8: rpc::PollMgr::PollThread::poll_loop() (polling.cc:149)
==13723==    by 0x6CA2529: rpc::PollMgr::PollThread::start_poll_loop(void*) (polling.cc:45)
==13723==    by 0x4C2EF9C: mythread_wrapper (hg_intercepts.c:219)
==13723==    by 0x4E3DF8D: start_thread (pthread_create.c:311)
==13723==    by 0x5A6FE1C: clone (clone.S:113)
santazhang commented 11 years ago

Thanks Russell. I just saw your issue report because Github doesn't send me notifications...

I've located the problem and will fix it quickly. Thanks! :D