openzipkin / zipkin

Zipkin is a distributed tracing system
https://zipkin.io/
Apache License 2.0

Zipkin 3 wishlist: the server side #2759

Open codefromthecrypt opened 5 years ago

codefromthecrypt commented 5 years ago

Zipkin 2 is mostly a brand for the data format, our simpler design, and the formalization of libraries and such, many of which actually happened in Zipkin 1.x :P

This issue should collect the features, concerns, and issues we want to address in Zipkin server 3. It should be decoupled from data formats, as otherwise it could result in quite a big bang that possibly no one will complete.

Major Features/concerns to address

Each checkbox should have issue references at the end of the line or a wiki if big.

I'm marking this "the server side" because many concerns are not core-library in nature, or dependent on it, unless we are talking about things like our in-memory storage. It is fine to mention things that affect the core library used by instrumentation, but please limit these topics to things that impact the server. We can open a similar issue in brave for instrumentation issues that put pressure on the core library.

Compatibility and transition are important. We have people extending the API, and we already get some groaning about things changing here. In other words, we need to be prepared to spend a significant amount of time helping others move to v3 if there's a compatibility break or any other kind of break. One approach to handling this could be a "degraded mode" bridge where less efficient components take a slow path but don't require an immediate rewrite. https://zipkin.io/pages/extensions_choices.html

There's no harm in making a v3 feature branch, btw, except that whoever does probably also needs to sign up for perpetual rebasing. Another option could be a playground repo in contrib, which could save us from the rebasing. Some may remember we did the same with the server port in the first place (it used to be called zipkin-java). But remember we're unlikely to retain history if we do too much in a separate repo.

Regardless of approach, I think having the design goals captured here and the design captured in a wiki will be helpful, as many people don't have enough time to keep up. Looking at a bullet list and wiki history is a lot easier than following code or threads, and gives people a quicker way to dive in.

cc @openzipkin/core @openzipkin/armeria And marking @anuraaga as the lead of this as he seems to have concrete thoughts.

jcarres-mdsol commented 5 years ago

I think you have been quite good at bringing features without breaking changes. Do you really want a V3?

codefromthecrypt commented 5 years ago

@jcarres-mdsol haha this is a parking lot at the moment. As long as zipkin keeps going, we can eventually expect a v3.

jeqo commented 5 years ago

One idea that comes from the development of https://github.com/jeqo/zipkin-storage-kafka, which I hope fits under this issue:

The storage layer is based on the Kafka Streams local store, which is aligned with partitioning. Currently we have specified that our implementation only supports running a standalone instance for storage, because if we scale the Zipkin instances, storage will get partitioned between servers.

To cope with this scenario, I'd like to propose scatter-gather support that allows the storage layer to query other instances to build a response.

Example scenario: Given a partitioned back-end with 3 Zipkin servers (a, b, c) running as a cluster, when a query arrives from the client side, zipkin-a receives the request and forwards the same query to zipkin-b and zipkin-c with an additional query param (e.g. peer=true) so that b and c don't propagate the query further. zipkin-a then collects the responses and builds the final response.

Kafka Streams already supports a metadata API to register peer URLs.
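
As a rough illustration of the scatter-gather idea described in this comment, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the `ZipkinNode` class, the `query` method, the span dictionaries, and `build_cluster` are hypothetical names, and in-process method calls stand in for the HTTP requests a real cluster would make. It is not the zipkin-storage-kafka implementation or any Zipkin API; it only shows how a `peer` flag keeps a forwarded query from being forwarded again.

```python
from concurrent.futures import ThreadPoolExecutor


class ZipkinNode:
    """Hypothetical stand-in for one Zipkin server that owns one storage partition."""

    def __init__(self, name, spans):
        self.name = name
        self.spans = spans  # local partition: list of {"traceId", "id", "service"} dicts
        self.peers = []     # other nodes; a real server would hold peer URLs instead

    def query(self, service_name, peer=False):
        """Return spans for service_name; fan out to peers unless peer=True."""
        local = [s for s in self.spans if s["service"] == service_name]
        if peer or not self.peers:
            # Either a peer forwarded this query or the node runs standalone:
            # answer from the local partition only and do not forward again.
            return local
        # Scatter: send the same query to every peer, flagged with peer=True.
        with ThreadPoolExecutor(max_workers=len(self.peers)) as pool:
            futures = [pool.submit(p.query, service_name, True) for p in self.peers]
            # Gather: collect each peer's partial result.
            remote = [span for f in futures for span in f.result()]
        return local + remote


def build_cluster():
    """Build three nodes (a, b, c), each holding a different slice of the data."""
    a = ZipkinNode("zipkin-a", [{"traceId": "1", "id": "a1", "service": "frontend"}])
    b = ZipkinNode("zipkin-b", [{"traceId": "2", "id": "b1", "service": "frontend"},
                                {"traceId": "2", "id": "b2", "service": "backend"}])
    c = ZipkinNode("zipkin-c", [{"traceId": "3", "id": "c1", "service": "backend"}])
    for node in (a, b, c):
        node.peers = [p for p in (a, b, c) if p is not node]
    return a, b, c


if __name__ == "__main__":
    a, b, c = build_cluster()
    # The client only talks to zipkin-a; a gathers results from b and c.
    print(sorted(span["id"] for span in a.query("frontend")))
```

In this sketch the node that receives the client request is the only one that fans out, so each query costs one call per peer regardless of which node the client happens to hit. A real version would also need timeouts and a policy for partial results when a peer is unavailable, which the sketch leaves out.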

codefromthecrypt commented 5 years ago

Hi, Jorge. Is this actually a Zipkin 3 issue, or could it be implemented with the existing APIs, possibly with just some other config? I want to partition (pun intended) out the things that can't be done without API changes.

jeqo commented 5 years ago

Probably not, but I got motivated by the issue tho :)

I'll create it as an issue so we can discuss whether it makes sense.

jeqo commented 5 years ago

follow up #2784