grpc-ecosystem / grpc-opentracing

OpenTracing is a set of consistent, expressive, vendor-neutral APIs for distributed tracing and context propagation
BSD 3-Clause "New" or "Revised" License
472 stars 98 forks source link

concurrent map read and map write on grpc.MD #7

Closed zirkome closed 7 years ago

zirkome commented 7 years ago

Hi,

I didn't know where to file that probable bug but as it's related to gRPC I think that's a good place to start.

So I'm using this repo to bind Zipkin to gRPC but recently I ran into a concurrency issue involving Inject and grpc.MD.

I'm doing two concurrent calls to 2 different gRPC service (also using this binding) but when ran multiple times this service crash with the following stack trace: https://gist.github.com/kokaz/1ff9a2e301abb34276dc002cf82009ae

Here is the repo containing the code: https://github.com/kokaz/zipkin-grpc-demo

After some investigation we might have found a solution: before creating metadataReaderWriter in both client and server interceptor, we reassign md with a copy of the one get from the Context:

md, ok := metadata.FromContext(ctx)
if !ok {
    md = metadata.New(nil)
}
md = md.Copy() // this solve the problem
mdWriter := metadataReaderWriter{md}

cc @bensigelman

bhs commented 7 years ago

@kokaz agreed that this is ultimately a GRPC issue in that the MD type is not concurrency-safe and yet that's not documented anywhere :-/ (I just read the source).

Your solution probably shortens the critical path for the race condition, but there's still no reader lock guarding the Copy call and thus it's not a true fix. Do you want to raise this with GRPC proper? I don't see a truly safe fix without changes to the MD code.

iamqizhao commented 7 years ago

I do not read into your code. Is it possible to have a mutex to guard your MD accesses? grpc MD is not concurrency-safe.

Regarding the document, I think a function is concurrency-safe iff it documents that explicitly. By default, it is not concurency-safe. For example, https://golang.org/src/container/list/list.go?s=922:955#L21.

iamqizhao commented 7 years ago

To elaborate, i) md.Copy() is required anyways since the write op of metadataReaderWriter will race with grpc-internal read op otherwise; ii) if you want to put md a ReaderWriter, you have to use a mutex in read/write implementation -- it is just like you use a map as a ReaderWriter.

bhs commented 7 years ago

@iamqizhao

I do not read into your code. Is it possible to have a mutex to guard your MD accesses? grpc MD is not concurrency-safe.

Since the GRPC MD is shared across all users of the GRPC connection, adding locking in this code (or any other) is not sufficient to make it concurrency-safe. We would need all readers and writers to share concurrency primitives. This is why I would argue that the MD struct should be made thread-safe intrinsically... otherwise how can one confidently write code that either reads or writes from that structure?

zirkome commented 7 years ago

@bensigelman The issue is that MD is not a struct but a map so anyone using MD can end up using it non thread-safely and so we can't really make MD intrinsically thread safe except for the method it has (i.e. Copy, Join).