Closed yurishkuro closed 5 years ago
Do you have a suggestion on the format of /introspection?
It depends on what validations we want to implement. For example, if all nodes participate by trusting the inbound trace ID, then we may not need to call /introspection at all, because the final response from the /transaction endpoint on the root node will contain the trace IDs observed by every node, and we can simply verify that they are all the same ID.
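This same-trace-ID check can be sketched in Python (the language later proposed for the test suite); the recursive response shape and field names here are assumptions for illustration, not from any spec:

```python
# Hypothetical recursive /transaction response: each node reports the
# trace ID it observed plus the nested response of its downstream call.
def collect_trace_ids(response):
    """Flatten the recursive response into a list of observed trace IDs."""
    ids = [response["traceId"]]
    downstream = response.get("downstream")
    if downstream:
        ids.extend(collect_trace_ids(downstream))
    return ids

def all_nodes_trusted_trace_id(response):
    """True if every node in the chain reported the same trace ID."""
    return len(set(collect_trace_ids(response))) == 1

resp = {
    "traceId": "4bf92f3577b34da6a3ce929d0e0e4736",
    "downstream": {
        "traceId": "4bf92f3577b34da6a3ce929d0e0e4736",
        "downstream": None,
    },
}
print(all_nodes_trusted_trace_id(resp))  # -> True
```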
If we want more advanced validation, then I'd expect the driver to send requests like /introspection?traceID=xxx or /introspection?correlationTraceID=xxx (for distrustful nodes). The output is just a short summary of the trace, containing whatever is required for validation.
We'd need to brainstorm this part, I don't have a complete answer.
For example, one other type of validation we might want is that causal relationships to the parent span are captured correctly. So the response to the first introspection request above would have to include not only the trace/span ID, but also the parent ID, so that the driver could validate the unbroken trace. However, this implies that all nodes would be able to answer such a question, i.e. all have the ability to record the parent span ID. I am not sure if that's possible, but on the other hand it could just be an optional capability that the node exposes to the validator.
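A sketch of what such an introspection summary and the unbroken-chain validation could look like; the field names are hypothetical, just enough to illustrate the idea:

```python
# Hypothetical shape of an /introspection?traceID=xxx response summary:
# just enough of the captured span to validate causal links.
from dataclasses import dataclass
from typing import Optional

@dataclass
class SpanSummary:
    trace_id: str
    span_id: str
    parent_span_id: Optional[str]  # None for the root span

def chain_is_unbroken(spans):
    """Check that each non-root span points at the span reported by the
    previous node in the chain, within the same trace."""
    for parent, child in zip(spans, spans[1:]):
        if child.parent_span_id != parent.span_id:
            return False
        if child.trace_id != parent.trace_id:
            return False
    return True

spans = [
    SpanSummary("t1", "s1", None),
    SpanSummary("t1", "s2", "s1"),
    SpanSummary("t1", "s3", "s2"),
]
print(chain_is_unbroken(spans))  # -> True
```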
I think we definitely need this and should work on a first version for the next workshop. I also agree that a lot of the theoretical discussions will go away once there is code that actually implements the spec.
I would first focus on testing key use cases handled by a single provider i.e. forwarding a trace context correctly. If we can get this done we have made a significant step forward.
I'm not a big fan of the Docker image approach because it is hard to provide a full tracing system in a single Docker image. I'd rather provide the endpoints and keep the implementation details to the vendor. Obviously, I could just forward requests from the container to the backend.
@yurishkuro would you be open to provide a first system to get our feet wet, so we can discuss next steps at the next workshop?
The Docker image does not need to contain the complete tracing system. We are primarily testing interop of instrumentations, so the image can include the test app, instrumentation, and a simple in-memory backend for storing traces. Because the tests are organized as multi-hop requests through Nodes, trying to orchestrate this with just external API endpoints is much harder. With docker images everything is local, docker-compose takes care of network wiring.
Unfortunately, I don't have a lot of time to allocate to this implementation. Most of the code already exists in the form of Jaeger cross-language integration tests, which can be repurposed to cross-vendor tests.
https://github.com/jaegertracing/jaeger-client-go/tree/master/crossdock
And here's an example of defining two types of tests ("behaviors" in crossdock parlance): https://github.com/jaegertracing/jaeger-client-java/blob/master/jaeger-crossdock/docker-compose.yml
I am progressing with the test suite.
Q: the LICENSE file says:
Contributions to Test Suites are made under the W3C 3-clause BSD License
Is this going to be a problem for vendors who want to submit something to run in the official tests? Why can't we use Apache 2 license? Created a separate issue #94.
This is the first cut of the compliance tests:
https://github.com/yurishkuro/distributed-tracing/tree/compliance-tests/tests
@SergeyKanzhelev @AloisReitbauer any thoughts on the prototype?
@yurishkuro looking at it today. Thank you for putting something working together.
Can you please add "getting started" with commands to run?
added to tests/readme
A few more packages are needed. Not sure if there is a setting for the Go compiler that can auto-download them:
git clone https://github.com/crossdock/crossdock-go $GOPATH/src/github.com/crossdock/crossdock-go
git clone https://github.com/davecgh/go-spew.git $GOPATH/src/github.com/davecgh/go-spew
git clone https://github.com/golang/net.git $GOPATH/src/golang.org/x/net
Ah, sure, I didn't set up a dependency manager yet. You can install these packages via go get. Let me try to add dep.
Actually, I think you only need to go get github.com/crossdock/crossdock-go; it's the only dependency so far, and running dep didn't add much value. Updated the readme.
Second iteration of the test suite: https://github.com/yurishkuro/distributed-tracing/tree/compliance-tests/tests
Main changes:

- Nodes can now be configured via environment variables: TRUST_TRACE_ID, TRUST_SAMPLING, etc. (example in docker-compose.yaml).
- The trace_context_diff_vendor behavior is exercised using the reference implementation running in two modes: (a) trusting and (b) not trusting inbound trace IDs (run via make crossdock, sampled output below).

At this point it's possible to start adding more specific tests for behaviors, but the reference implementation is currently very naive and not compliant, e.g. it doesn't really check tracestate; instead it fully relies on traceparent, so it needs to be improved (there are TODOs in the code).
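The environment-variable configuration might look like this in docker-compose.yaml; this is a sketch based on the variable names above, not a copy of the actual file:

```yaml
# Hypothetical fragment: second reference node configured to distrust
# the inbound trace context (exact keys/values are assumptions).
refnode1:
  image: refnode
  environment:
    - TRUST_TRACE_ID=false
    - TRUST_SAMPLING=false
```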
@adriancole would be interesting to test with your Java implementation if you could build an image. We would need to re-implement the actor/ module in Java so that it can similarly be used as the main for the container, leaving only the api.Tracer pluggable.
Executing Matrix...
S [malformed_trace_context] refnode→ (actor=refnode driver=refnode) ⇒ not implemented
S [malformed_trace_context] refnode→ (actor=refnode1 driver=refnode) ⇒ not implemented
S [missing_trace_context] refnode→ (actor=refnode driver=refnode) ⇒ not implemented
S [missing_trace_context] refnode→ (actor=refnode1 driver=refnode) ⇒ not implemented
✓ [trace_context_diff_vendor] refnode→ (actor=refnode driver=refnode) (9/9 passed, 0/9 skipped)
├ ✓ ⇒ same trace ID
├ ✓ ⇒ spanID is not empty
├ ✓ ⇒ ParentSpanID equal root spanID
├ ✓ ⇒ span is sampled
├ ✓ ⇒ same downstream traceID
├ ✓ ⇒ downstream span is sampled
├ ✓ ⇒ modified tracestate
├ ✓ ⇒ non-empty vendor key
└ ✓ ⇒ vendor key 'ref' in the first position
✓ [trace_context_diff_vendor] refnode→ (actor=refnode1 driver=refnode) (10/10 passed, 0/10 skipped)
├ ✓ ⇒ different trace ID
├ ✓ ⇒ trace ID is in correlationID
├ ✓ ⇒ spanID is not empty
├ ✓ ⇒ ParentSpanID equal root spanID
├ ✓ ⇒ span is sampled
├ ✓ ⇒ downstream traceID equal 1st actor's traceID
├ ✓ ⇒ downstream span is sampled
├ ✓ ⇒ modified tracestate
├ ✓ ⇒ non-empty vendor key
└ ✓ ⇒ vendor key 'ref' in the first position
S [trace_context_same_vendor] refnode→ (actor=refnode driver=refnode) ⇒ not implemented
S [trace_context_same_vendor] refnode→ (actor=refnode1 driver=refnode) ⇒ not implemented
19/19 passed (6/25 skipped)
Tests passed!
Sorry, meant to reply. In a crunch but interested in this... worst case I can look at it during the workshop.
During the workshop we discussed that the Docker-container approach may not work. Concerns are:
So the test harness should be something you can run locally (like a container or an easily runnable app) to test private implementations and produce a report. And it should work against an endpoint, not necessarily manage the target Docker container.

One open discussion is whether we need to have live reports from vendors. "Live" compliance test results may be generated via CI on this repo or by vendors uploading results to some central place.

Ending this comment... moving on to the discussion of test cases, to validate that the HTTP endpoint approach will work and that we do not need multiple "chained" containers.
Test suites are authored in the notes document for the workshop: https://docs.google.com/document/d/1Zh871qWTew8Rzhz6jhFW0nxeC1ax1kAovQW8RFF5bCA/edit#
One note: we will most probably implement them in Python, as the most platform- and vendor-independent language.
I'm closing this as we have a test suite implemented. Let's open separate items for improving the existing test suite if need be.
Objective
Have a standard test suite that can take two or more "nodes" and execute a series of transactions that traverse each system, with the purpose of verifying that they exchange tracing headers in such a way that they are able to interoperate.
Benefits
By defining the tests and expectations described below, we essentially define functional requirements for what we expect to happen in different interop scenarios, something that is currently missing from the spec and which, imo, causes many roundabout discussions of implementations without clear requirements.
Details
Node
Represents a microservice instrumented with a certain tracing library / implementation. Comes packaged as a Docker container that internally runs the tracing backend (or a proxy) and a small app that:

a. has a /transaction endpoint that executes the test case transaction
b. has an /introspection endpoint used by the test suite driver to verify that the respective tracing backend has captured the trace

Transactions
A transaction is described as a recursive set of instructions to call the next Node in the chain or to stop. E.g. it might look like
Running this transaction would execute the chain zipkin -> jaeger. When a Node receives such a request, it looks for the nested callNext fragment and calls the next Node with that nested (smaller) payload. The last node receives an empty request, so it simply returns.

There also can be a convention that each Node's response contains the trace/span ID it observed/generated, again as a recursive structure, e.g.
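A hypothetical payload shape for the zipkin -> jaeger example above, with the peel-one-level-and-forward behavior sketched in Python (the field names are illustrative, not from the spec):

```python
# Hypothetical /transaction payload for the chain zipkin -> jaeger:
# each node peels off one level of "callNext" and forwards the rest.
transaction = {
    "actor": "zipkin",
    "callNext": {
        "actor": "jaeger",
        "callNext": None,  # last node: nothing to call, just return
    },
}

def execute(request):
    """Simulate a node handling the request: record itself, then call
    the next node with the nested (smaller) payload."""
    visited = [request["actor"]]
    nested = request.get("callNext")
    if nested:
        visited.extend(execute(nested))
    return visited

print(execute(transaction))  # -> ['zipkin', 'jaeger']
```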
This would allow the test driver to interrogate the introspection endpoint using those IDs.
Verifications
The test suite driver calls the /introspection endpoint of each Node to retrieve captured trace(s) in some canonical form (just enough info for the test). If /transaction responses contain trace/span IDs, it can do some validation.

Test Suite
The test suite is defined as a list of scenarios, e.g.
Each scenario is instantiated multiple times (test cases) by labelling different vendors with roles from the scenario, e.g.
Each test case runs and validates a single transaction, and checks different modes of participation in the trace.
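The role-labelling step can be sketched as a simple permutation over vendors; the scenario shape and role names below are assumptions for illustration:

```python
import itertools

# Hypothetical scenario: named roles that get filled with concrete vendors.
scenario = {"name": "trace_context_diff_vendor", "roles": ["upstream", "downstream"]}
vendors = ["zipkin", "jaeger"]

def instantiate(scenario, vendors):
    """Produce one test case per assignment of vendors to roles."""
    cases = []
    for combo in itertools.product(vendors, repeat=len(scenario["roles"])):
        cases.append(dict(zip(scenario["roles"], combo)))
    return cases

print(len(instantiate(scenario, vendors)))  # -> 4
```

A real driver would likely filter these (e.g. a diff-vendor scenario would skip same-vendor assignments), but the permutation idea is the same.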
Parameterization
The test suite framework can also be used to test multiple implementations of the tracing library from a given vendor, e.g. in different languages. This can be implemented as either different Node containers (e.g. zipkin_java, zipkin_go) or a single container controlled by env variables.
Participation Modes
The nodes can also support different trace participation modes, at minimum:
If the test driver knows ahead of time which participation mode a given Node supports (these can again be parameters to the Node), it can validate the expected behavior.
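A node's handling of its participation mode might look like the following sketch; the TRUST_TRACE_ID-style flag and the correlation-ID convention are assumptions drawn from the earlier discussion, not a defined API:

```python
import secrets

def handle_inbound(inbound_trace_id, trust_trace_id):
    """Return (trace_id, correlation_id): a trusting node continues the
    inbound trace; a distrustful node starts a new trace and keeps the
    inbound ID only as a correlation attribute."""
    if trust_trace_id:
        return inbound_trace_id, None
    new_trace_id = secrets.token_hex(16)  # fresh 16-byte (32 hex char) ID
    return new_trace_id, inbound_trace_id

tid, corr = handle_inbound("4bf92f3577b34da6a3ce929d0e0e4736", trust_trace_id=False)
print(tid != "4bf92f3577b34da6a3ce929d0e0e4736", corr)
```

This mirrors the two modes exercised by the prototype: a trusting node keeps the inbound trace ID, while a distrustful one can still be validated via /introspection?correlationTraceID=xxx.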
Prerequisites
Each vendor must be able to provide a Docker image (or several) to act as a Node in the test suite. Ideally the containers should be fully self-contained, i.e. do not require external connectivity. It's possible to implement them as proxies to hosted tracing backends if necessary, but it will make the tests less reliable if those hosted backends are unavailable at times.
It's crazy / impossible
Jaeger internally uses an approach very similar to this one for many of its integration tests, in particular those that test compatibility of client libraries in different languages. Uber released a framework, https://github.com/crossdock/crossdock, that helps orchestrate these tests and permutations using docker-compose.