openzipkin / openzipkin.github.io

content for https://zipkin.io
https://zipkin.io
Apache License 2.0

On vocab and collection tier #18

Closed codefromthecrypt closed 8 years ago

codefromthecrypt commented 8 years ago

There's been some naming trouble, evidenced by past terminology discussions about the zipkin-collector-service and words like transport or receiver. It is pretty clear that the term collection is correct from both a role standpoint and an architectural tier standpoint, describing ingest of trace data.

What's harder is naming components and methods of ingest (especially mapping them sensibly to naming conventions in code and configuration).

The following excerpt from Psaltis, Andrew G., Streaming Data (Manning) might help:

Regardless of the protocol used by a client to send data to our collection tier or in certain cases our collection tier reaching out and pulling in the data, there are a limited number of interaction patterns in use today. Even considering the protocols driving the emergence of the Internet of Everything the interaction patterns fall into one of the following categories:

  • Request/Response
  • Publish/Subscribe
  • One-Way
  • Request/Acknowledge
  • Stream

Keep this in mind while I mention the names we currently use for associated things. For example, we call a "protocol" a "transport" or a "receiver". The Java code tentatively calls all of these *Transport while we figure this out.

For http

For kafka

For scribe

Note that deployments in practice aren't limited to these: we've had folks use the Amazon Lambda service!

Anyway, I'm looking for clear terminology that we can use for documentation. For example, an overview like this:

Zipkin architecture includes a collection tier which supports Kafka, Scribe etc. This collection is made up of components that accept encoded spans and eventually persist them to storage. For example, a zipkin-server plays a collection role when KafkaTransport is enabled.
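
To make that overview concrete, here is a minimal Python sketch of a collection component. It is not Zipkin's actual Java code: `Collector`, `InMemoryStorage`, `handle_push` and `drain_pull` are hypothetical names, and encoding spans as a JSON list is an assumption made only for illustration. It shows the same collector being fed in a push style (the client sends data, as with an HTTP POST) and in a pull style (the tier reads from a queue, the way a Kafka consumer polls).

```python
import json
import queue


class InMemoryStorage:
    """Hypothetical stand-in for a real storage backend."""

    def __init__(self):
        self.spans = []

    def store(self, spans):
        self.spans.extend(spans)


class Collector:
    """Accepts encoded spans from any source and persists them."""

    def __init__(self, storage):
        self.storage = storage

    def accept(self, encoded):
        # Assumption of this sketch: spans arrive as a JSON-encoded list.
        spans = json.loads(encoded)
        self.storage.store(spans)
        return len(spans)


def handle_push(collector, body):
    """Push style: the client sends encoded spans to the collection tier."""
    return collector.accept(body)


def drain_pull(collector, source):
    """Pull style: the collection tier reads messages from a queue until empty."""
    accepted = 0
    while True:
        try:
            encoded = source.get_nowait()
        except queue.Empty:
            return accepted
        accepted += collector.accept(encoded)


storage = InMemoryStorage()
collector = Collector(storage)

# A pushed message, as an HTTP handler might receive it.
handle_push(collector, b'[{"traceId": "a", "id": "1", "name": "get"}]')

# A queued message, standing in for a topic the tier polls.
topic = queue.Queue()
topic.put(b'[{"traceId": "b", "id": "2", "name": "post"}]')
drain_pull(collector, topic)

print([span["traceId"] for span in storage.spans])
```

Both entry points end in the same `accept` call, which is why one name for the tier can cover ingest methods that differ in interaction pattern.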

It would be great if terms used above could be corrected or clarified, cleanly mapping to advice for deployers, without being overly prescriptive or hinting at an interaction pattern that might not apply... Laundry list, but worth a shot! Now's a great time to do this, as we are updating documentation and finalizing code.

Any ideas? @apsaltis please help if you can..

codefromthecrypt commented 8 years ago

cc @chimericalidea @abesto @eirslett

abesto commented 8 years ago

Love all the effort you've put into this. To enumerate the words we're looking for:

Methinks "storage" and "transport" (as in Technology used to get trace data from instrumented applications to collector = span receiver = transport) are fine. "instrumented application" could use a shorter, clearer, consistent name; to me this is a nice-to-have.

The big one is collector = span receiver = transport. I have a slight inclination to call things what they're called in OpenTracing if there are no other considerations. In this case, "receiver" implies a passive component, which is not the case with Kafka. This is still something I can live with though.

In short, my current favorite is, used in a sentence: reporters send trace data via one of several transports to the Zipkin receiver, which persists trace data into storage. Later storage is queried by the API to provide data to the UI.

Deployers will have to balance the availability benefits vs the complexity of running the receiver, API and UI components in one, two, or three processes / servers / clusters. They're free to choose the transport and storage that best matches their existing infrastructure, after having internalized the performance and resiliency properties to be outlined in a later document.
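
As a rough illustration of how those nouns fit together, here is a small Python sketch. Every class name is hypothetical and chosen only to mirror the vocabulary above; the "transport" is simply an in-process function call, which is an assumption standing in for HTTP, Kafka or Scribe.

```python
from collections import defaultdict


class Storage:
    """Hypothetical in-memory storage, keyed by trace id."""

    def __init__(self):
        self.by_trace = defaultdict(list)

    def store(self, span):
        self.by_trace[span["traceId"]].append(span)

    def get_trace(self, trace_id):
        return list(self.by_trace.get(trace_id, []))


class Receiver:
    """Persists whatever a transport delivers."""

    def __init__(self, storage):
        self.storage = storage

    def receive(self, spans):
        for span in spans:
            self.storage.store(span)


class Reporter:
    """Lives in the instrumented application and sends spans via a transport."""

    def __init__(self, transport):
        self.transport = transport  # any callable that takes a list of spans

    def report(self, span):
        self.transport([span])


class Api:
    """Query side: reads from storage on behalf of the UI."""

    def __init__(self, storage):
        self.storage = storage

    def trace(self, trace_id):
        return self.storage.get_trace(trace_id)


storage = Storage()
receiver = Receiver(storage)
reporter = Reporter(transport=receiver.receive)  # in-process stand-in transport
api = Api(storage)

reporter.report({"traceId": "abc", "id": "1", "name": "get /users"})
reporter.report({"traceId": "abc", "id": "2", "name": "select"})

print([span["name"] for span in api.trace("abc")])
```

In this sketch the receiver and the API share nothing but storage, which is what would let them run in one process or in separate ones.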

abesto commented 8 years ago

BTW, just realized: "storage" means two things as well, currently, similarly to transport. It's both

Separating these can clarify things, both for deployers and for code clarity.

codefromthecrypt commented 8 years ago


  • Name in OpenTracing: receiver

One clarification: I don't think OpenTracing uses this term, as they mostly talk about the instrumentation side. On that side, I've seen Reporter used.

abesto commented 8 years ago

Thanks for the clarification, I missed that.

There's been no activity on this for a week now; I feel like turning this into a specific proposal with the nouns reporter, transport, receiver, storage, database, api, ui.

@adriancole go?

codefromthecrypt commented 8 years ago

Go!