googleapis / google-cloud-python

Google Cloud Client Library for Python
https://googleapis.github.io/google-cloud-python/
Apache License 2.0
4.8k stars 1.52k forks source link

[pubsub] subscription.pull(return_immediately=True, max_messages=10) is slow #3053

Closed ensonic closed 7 years ago

ensonic commented 7 years ago

os: ubuntu:xenial (running on GKE) python 3.5.2 library versions: google-auth==0.6.0 google-auth-httplib2==0.0.2 google-cloud-core==0.22.1 google-cloud-pubsub==0.22.0 google-gax==0.15.6 googleapis-common-protos==1.5.2 grpc-google-cloud-pubsub-v1==0.14.0 grpc-google-iam-v1==0.11.1 grpcio==1.1.0

I wrapped code like this:

_ts = timeit.default_timer()
topics = self.pubsub_client.list_topics()
logging.info('list_topics: %f', timeit.default_timer() - _ts)

_ts = timeit.default_timer()
results = subscription.pull(return_immediately=True, max_messages=10)
logging.info('pull(len=%d): %f', len(results), timeit.default_timer() - _ts)

_ts = timeit.default_timer()
subscription.acknowledge(ack_ids)
logging.info('acknowledge: %f', timeit.default_timer() - _ts)

The running code produces a log like this:

[2017-02-23 12:27:01,706] INFO:root:pull(len=1): 0.148471
[2017-02-23 12:27:01,727] INFO:root:acknowledge: 0.020419
[2017-02-23 12:27:03,549] INFO:root:pull(len=0): 1.499838
[2017-02-23 12:27:03,550] INFO:root:list_topics: 0.000116
[2017-02-23 12:27:05,600] INFO:root:pull(len=0): 1.513390
[2017-02-23 12:27:07,118] INFO:root:pull(len=0): 1.196649
[2017-02-23 12:27:07,119] INFO:root:list_topics: 0.000116
[2017-02-23 12:27:08,682] INFO:root:pull(len=0): 1.022132
[2017-02-23 12:27:10,192] INFO:root:pull(len=0): 1.194841
[2017-02-23 12:27:10,193] INFO:root:list_topics: 0.000116
[2017-02-23 12:27:11,595] INFO:root:pull(len=1): 0.889177
[2017-02-23 12:27:11,612] INFO:root:acknowledge: 0.016212

What stands out for me is that pull is quite slow, especially when there are no messages (len=0). Is this a bug or a design limitation? The code is a process that pulls new messages on a set of topics and processes them. The processing is super light (and runs in a different thread off a Queue). Would you recommend to change the subscription to be push based as demo'ed here: https://cloud.google.com/pubsub/docs/subscriber#receive_push? Where to get the PUBSUB_VERIFICATION_TOKEN from is not clear in this example though, I am not on appengine.

tseaver commented 7 years ago

@ensonic I'm tagging this with the back-end tag because we don't have any visibility to server-side latency issues here: this repository is just the client wrapper.

@tmatsuo Can you suggest anything?

lukesneeringer commented 7 years ago

We are planning this in the near future. At the moment, we are waiting on the API support to be finalized.

ensonic commented 7 years ago

@lukesneeringer what are you planning in the near future? Any comment on why the pull invocations are slow, especially when there are no messages?

tseaver commented 7 years ago

/cc @tswast

lukesneeringer commented 7 years ago

Soon to be obselete because #3637. @ensonic Sorry, I should have answered your question when you asked it in March. Erm...how embarrassing. But, I basically rewrote the library based on requests from the Pub/Sub team.