slashmo / gsoc-swift-tracing

WIP collection of Swift libraries enabling Tracing on Swift (Server) systems.
https://summerofcode.withgoogle.com/projects/#6092707967008768
Apache License 2.0
21 stars 1 forks source link

Correct description of Span context, describe context propagation #126

Open pokryfka opened 4 years ago

pokryfka commented 4 years ago

I think the description is incorrect or I misunderstood it (in which case I'd like to fix it on my side).

public protocol Span {
    /// The read-only `BaggageContext` of this `Span`, set when starting this `Span`.
    var context: BaggageContext { get }

// ...
}

The way I understand it, the value of span.contex is not the same as the context passed when creating (starting) the span. It should propagate whatever was in the context BUT update the trace context (parent id).


Related question (separate issue?) is if NoOpSpan should propagate context. On my side NoOpSegment does not record anything but pass along the context, NoOpSpan returns empty baggage:

public struct NoOpSpan: Span {
      public var context: BaggageContext {
            .init()
      }
// ...
}

See also

ktoso commented 4 years ago

The way I understand it, the value of span.contex is not the same as the context passed when creating (starting) the span.

Correct, because usually start span means "accept some context, extract current span id, make that the parent id, and make a new id for this span", so it is not the exact same context, but it is set during starting of the span. In that sense the doc is accurate, but perhaps could use some more docs around this?

It should propagate whatever was in the context BUT update the trace context (parent id).

Yes that's right

ktoso commented 4 years ago

Related question (separate issue?) is if NoOpSpan should propagate context. On my side NoOpSegment does not record anything but pass along the context,

I think this comes down to when you'd actually return a NoOpSpan?

I think no-ops are only returned if tracing is "completely disabled" generally speaking.

In other situations, you do need to carry context and create spans, although the context may contain "we're not recording" information.

In zipkin this is marked using a sampled = deny for example. Not propagating the spans/context with the deny decision could cause a tracer to do the following:

// good:
context // "don't record" decision
startSpan(..., context: context) { ... } // good, we're not recording, not need to store/serialize/emit anything
// bad
context = NoOpSpan().context 
// there was a context somewhere, maybe it decided to record or not record already, 
// but we dropped that information... that means the following start span:

startSpan(..., context: context) { ... } // oh oh! Depending on how a tracer is implemented this may actually start recording)
// not good, we always have to carry context even if not recording, in order for that not recording decision to be "
// "persistent" across the entire execution. otherwise each startSpan() is a chance that the tracer will decide "let's trace!" 

This is e.g. how zipkin would work, there's 3 modes:

In general this touches upon "Samplers" which I don't know if you've implemented yet in xRay?

ktoso commented 4 years ago

So the short version on NoOps I think is: only use them if tracing is completely disabled (i.e. the tracing instrument returned is also the NoOpTracingInstrument) since there is no tracer which would randomly decide to start tracing anyway then.

pokryfka commented 4 years ago

@ktoso thank you for the clarifications

I want to propagate context in my implementation (and tests) exactly the same way, its going to be hell of a mess otherwise if different tracer implementations interpret and implement it differently

I think its all clear to me at the moment, although I still think it would be great if the description and some documentation was updated


one more use case (is it ok to post it here?) of how to propagate context based on Lambda PoC

it relates to a number of issues/conversations:

I think it would be good to sync on that and have a swift-tracing recommended way of handling that.

func handle(context: Lambda.Context, event: ByteBuffer) {
  // create operation span/segment using tracer provided in the lamdba context
  // trace id and and parent determined by the trace context in the baggage
  let segment = context.tracer.beginSegment(name: "HandleEvent", baggage: context.baggage)

  // subsegment with sync decoding operation
  let decodedEvent = segment.subsegment(name: "DecodeIn") { _ in
      self.decodeIn(buffer: event)
  }
  // could be also created with
  let decodedEvent2 = context.tracer.segment(name: "HandleEvent2", baggage: segment.baggage) { _ in
      self.decodeIn(buffer: event)
  }

  // or just
  let decodeSubsegment =  context.tracer.segment(name: "HandleEvent2", baggage: segment.baggage) 
  let decodedEvent3 = self.decodeIn(buffer: event)
  // plus error handling which is already covered using closures API
  decodeSubsegment.end()

  // so far so good

  // note that `segment.baggage` will contain updated trace context
  // BUT the lambda context remains not affected and need to be updated manually which is inconvinient and error prone
  // on top of that `Lambda.Context` is a class, if we pass it to functions they would update `context.baggage` it in undeterministic order
  let subsegment = segment.beginSubsegment(name: "HandleIn")
  context.baggage = subsegment.baggage // <-- this is inconvinient and error prone
  // returns a future
  return self.handle(context: context, event: `in`) // <-- `BaggageContext` baggage is passed in refrence type `Lambda.Context`
      .endSegment(subsegment)// NIO convenience
      .endSegment(segment) // NIO convenience
}

note that Lambda.Context is generally const during a single invocation handling, however BaggageContext baggage and logger need to be mutable

also passing mutable values in reference type Lambda.Context is a bad idea in general because asynchronously executed operations (think s3 client copying 100 files on a few threads) will be updating it in undetermined order

proposed solution:

I have not yet got familiar with gsoc-swift-baggage-context v0.2.0 and its Carrier protocols which may provide soem solutions to that

ktoso commented 4 years ago

I think its all clear to me at the moment, although I still think it would be great if the description and some documentation was updated

Agreed, I'll ticketify and do some proper writeups how all this should work together šŸ‘


one more use case (is it ok to post it here?) of how to propagate context based on Lambda PoC

I'll dig into this soon; I think our Carrier types currently require get/set and that may be tricky in reality... perhaps they indeed should only require get; the assumtpion was that a reference type container would make those "safe" (lock wrapping...) but perhaps that assumption is too strong; one can easily make a new context after all "from the base one" šŸ¤”

PS: I wanted to thank you a lot for proving/trying the APIs in real use cases, that's where all those silly small assumptions we made break down. Thanks for uncovering those. šŸ™