Implement pubsub - Githubissues

codefromthecrypt commented 5 years ago

In https://github.com/openzipkin/zipkin-gcp/issues/45, there was a proposal to add pubsub transport (collector and sender), but that never happened.

@javierviera is currently adding https://github.com/openzipkin/zipkin-go/pull/142 on the golang side, but that's asymmetric and confusing if no server side exists.

In any case, the message format should be standard (ListOfSpans in proto or json)

Implementation wise, sender (java client) should likely use grpc as that's typical. collector should likely use armeria as that's less dependencies and fits into our normal observability tools better (logs metrics tracing) (see stackdriver-storage as an example).

Looking at the api, it seems there's no grpc endpoint for pubsub, but there's a rest api which likely shares similar auth etc. There seems to be a pull api which could be run in a loop similar to our kafka collectors https://cloud.google.com/pubsub/docs/reference/rest/v1/projects.subscriptions/pull

@anuraaga have you done any work in pubsub in curiostack?

elefeint commented 5 years ago

There is a gRPC Pub/Sub API. The subscriber side has streaming and pull methods, with pull being somewhat more predictable (streaming results in a single subscriber hogging a fairly large cache of messages, not allowing other load balanced subscribers to help).

I've worked with Pub/Sub plenty, but I am still getting oriented in Zipkin, and I seem to be behind on everything I've already promised to do. /temporarily backs into the bushes/

anuraaga commented 5 years ago

https://github.com/curioswitch/curiostack/tree/master/common/google-cloud/pubsub

Is an implementation of pubsub using armeria. It still depends on the official client library for the interface so it can be swapped in the transparently but that's easy to remove.

It would still depends on the gRPC stubs though. To remove gRPC dependency, similar to our stackdriver storage, we'd need to add stubless streaming support to armeria-grpc-protocol. Not trivial :)

codefromthecrypt commented 5 years ago

thanks..

So I guess there are multiple ways to do this. once with streams and one without right? I don't know if there are billing implications to either choice. However, there is value to having an implementation to start with that already works in production. Especially since we've already spent a ton of time on infra work lately, it could be a pragmatic call to make. I suspect a good way forward would be to make a feature that doesn't require grpc stubs in the interface we publish (easy to do as there's no reason to).

anuraaga commented 5 years ago

I don't think there are any billing implications. For what it's worth, the official Java client, which exposes it's own API and hides the grpc, uses the streaming API so I suspect it is effectively better tested than the older unary pull.

Agree that the messaging API itself would be agnostic to whether it is served by the official library, or armeria-grpc, or armeria-grpc-protocol.

meltsufin commented 5 years ago

Like @elefeint said, there are some gotchas with the streaming pull, especially if you have more than one consumer. I would suggest reading the "StreamingPull" section in the docs.

codefromthecrypt commented 5 years ago

hmm considering all of our other messaging transports do not have the complexity or cavaeats implied by this. I'm tempted to use the same pull loop pattern everything else usees vs persistent bidi which is more advanced to support and interpret.

On Mon, Aug 12, 2019 at 10:27 PM Mike Eltsufin notifications@github.com wrote:

Like @elefeint said, there are some gotchas with the streaming pull, especially if you have more than one consumer. I would suggest reading the "StreamingPull" section in the docs.

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub, or mute the thread.

codefromthecrypt commented 11 months ago

Thanks to @3Agkatzioura this is implemented at core level sender and collector side. Next step (help wanted) is to integrate it into the module, so that the server can use it as a collector. You can look at zipkin-aws for advice on how to setup collectors.

openzipkin / zipkin-gcp

Implement pubsub #132