Closed: codefromthecrypt closed this 8 years ago
cc @chimericalidea @abesto @eirslett
Love all the effort you've put into this. To enumerate the words we're looking for:
Methinks "storage" and "transport" (as in: technology used to get trace data from instrumented applications to the collector = span receiver = transport) are fine. "Instrumented application" could use a shorter, clearer, consistent name; to me this is a nice-to-have.
The big one is collector = span receiver = transport. I have a slight inclination to call things what they're called in OpenTracing if there are no other considerations. In this case, "receiver" implies a passive component, which is not the case with Kafka. This is still something I can live with though.
In short, my current favorite, used in a sentence: reporters send trace data via one of several transports to the Zipkin receiver, which persists trace data into storage. Later, storage is queried by the API to provide data to the UI.
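To make the sentence above concrete, here is a minimal sketch of those nouns as Java types. Every name in it (`Reporter`, `Transport`, `Receiver`, `Storage`, `Api`, `InMemoryStorage`, `PipelineSketch`) is hypothetical and chosen only to mirror the proposed vocabulary; none of this is Zipkin's actual API, and spans are plain strings to keep it short.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical types that mirror the proposed vocabulary; not Zipkin's real classes.

/** Where trace data is persisted; written by the receiver, read by the API. */
interface Storage {
  void accept(List<String> encodedSpans);
  List<String> query();
}

/** Technology that moves trace data from a reporter to the receiver (http, kafka, scribe...). */
interface Transport {
  void send(List<String> encodedSpans);
}

/** Server-side role: takes trace data off a transport and persists it into storage. */
final class Receiver {
  private final Storage storage;
  Receiver(Storage storage) { this.storage = storage; }
  void onSpans(List<String> encodedSpans) { storage.accept(encodedSpans); }
}

/** Application-side role: hands trace data to whichever transport was chosen. */
final class Reporter {
  private final Transport transport;
  Reporter(Transport transport) { this.transport = transport; }
  void report(String encodedSpan) { transport.send(List.of(encodedSpan)); }
}

/** Read side: queries storage to provide data to the UI. */
final class Api {
  private final Storage storage;
  Api(Storage storage) { this.storage = storage; }
  List<String> traces() { return storage.query(); }
}

/** Toy storage so the sketch runs without a database. */
final class InMemoryStorage implements Storage {
  private final List<String> spans = new ArrayList<>();
  @Override public void accept(List<String> encodedSpans) { spans.addAll(encodedSpans); }
  @Override public List<String> query() { return List.copyOf(spans); }
}

final class PipelineSketch {
  public static void main(String[] args) {
    Storage storage = new InMemoryStorage();
    Receiver receiver = new Receiver(storage);
    // A direct in-process hand-off stands in for a real transport here.
    Transport transport = receiver::onSpans;
    Reporter reporter = new Reporter(transport);
    reporter.report("span-1");
    System.out.println(new Api(storage).traces());
  }
}
```

Note that the in-process `Transport` above pushes into the receiver, which is exactly the passive reading of "receiver"; a Kafka-style transport would have the server side pulling instead, so the sketch does not settle that naming concern.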
Deployers will have to balance the availability benefits vs the complexity of running the receiver, API and UI components in one, two, or three processes / servers / clusters. They're free to choose the transport and storage that best matches their existing infrastructure, after having internalized the performance and resiliency properties to be outlined in a later document.
BTW, just realized: "storage" currently means two things as well, much like transport. It's both
Separating these can clarify things, both for deployers and for code clarity.
-
- Name in OpenTracing: receiver
One clarification: I don't think OpenTracing uses this term, as they mostly talk about the instrumentation side. On that side, I've seen Reporter used.
Thanks for the clarification, I missed that.
There's been no activity on this for a week now; I feel like turning this into a specific proposal with the nouns reporter, transport, receiver, storage, database, api, ui.
@adriancole go?
Go!
There's been some naming trouble, evidenced by past terminology discussions about the zipkin-collector-service and words like transport or receiver. It is pretty clear that the term collection is correct from both a role standpoint and an architectural tier standpoint, describing the ingest of trace data.
What's harder is naming components and methods of ingest (especially mapping them sensibly to naming conventions in code and configuration).
The following excerpt from Psaltis, Andrew G. Streaming Data. Manning, might help:
Keep this in mind while I mention the names we currently use for associated things. For example, we call "protocol" "transport" or "receiver". The Java code tentatively calls all the things *Transport while we figure this out.
For http
For kafka
For scribe
Note that deployments in practice aren't limited to these: we've had folks use the Amazon Lambda service!
Anyway, I'm looking for clear terminology that we can use for documentation. For example, an overview like this:
Zipkin architecture includes a collection tier which supports Kafka, Scribe etc. This collection is made up of components that accept encoded spans and eventually persist them to storage. For example, a zipkin-server plays a collection role when KafkaTransport is enabled.
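As a rough illustration of "plays a collection role when a transport is enabled", the sketch below starts only the transports a deployer has enabled and reports whether the process is collecting at all. All names (`SpanTransport`, `FakeKafkaTransport`, `ServerSketch`, `CollectionRoleDemo`) are hypothetical and this is not how zipkin-server is actually wired; the fake transport is fed by hand rather than by a real Kafka consumer.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.function.Consumer;

// Hypothetical sketch; none of these types are Zipkin's actual classes.

/** A way for encoded spans to arrive; each implementation hides its own interaction pattern. */
interface SpanTransport {
  String name();
  /** Begin accepting encoded spans, handing each batch to the sink. */
  void start(Consumer<List<byte[]>> sink);
}

/** Stand-in for a polling transport such as Kafka; messages are pushed in by hand. */
final class FakeKafkaTransport implements SpanTransport {
  private Consumer<List<byte[]>> sink;
  @Override public String name() { return "kafka"; }
  @Override public void start(Consumer<List<byte[]>> sink) { this.sink = sink; }
  void simulateMessage(byte[] encodedSpan) {
    if (sink != null) sink.accept(List.of(encodedSpan)); // dropped if never started
  }
}

final class ServerSketch {
  /** Toy storage: encoded spans kept in memory. */
  final List<byte[]> storage = new ArrayList<>();
  private final List<SpanTransport> started = new ArrayList<>();

  /** Starts only the transports whose names appear in {@code enabled}. */
  ServerSketch(List<? extends SpanTransport> available, Set<String> enabled) {
    for (SpanTransport transport : available) {
      if (enabled.contains(transport.name())) {
        transport.start(storage::addAll); // accept encoded spans, persist to storage
        started.add(transport);
      }
    }
  }

  /** True when at least one transport is running in this process. */
  boolean playsCollectionRole() { return !started.isEmpty(); }
}

final class CollectionRoleDemo {
  public static void main(String[] args) {
    FakeKafkaTransport kafka = new FakeKafkaTransport();
    ServerSketch server = new ServerSketch(List.of(kafka), Set.of("kafka"));
    kafka.simulateMessage("span".getBytes(StandardCharsets.UTF_8));
    System.out.println(server.playsCollectionRole() + " " + server.storage.size());
  }
}
```

The `start(sink)` shape is one way to avoid implying push or pull in the name: an HTTP transport could call the sink from a request handler, while a Kafka one could call it from a poll loop.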
It would be great if terms used above could be corrected or clarified, cleanly mapping to advice for deployers, without being overly prescriptive or hinting at an interaction pattern that might not apply... Laundry list, but worth a shot! Now's a great time to do this, as we are updating documentation and finalizing code.
Any ideas? @apsaltis please help if you can..