Open hson2 opened 10 months ago
Thank you for the detailed report. I'll try to reproduce this and fix accordingly. In the meanwhile, I'm wondering why you choose to cancel first and then wait for result here
future.cancel()
future.result(timeout=60)
instead of a common use case as the sample shows. This could be a mitigation for the issue you are experiencing.
Thank you for the detailed report. I'll try to reproduce this and fix accordingly.
In the meanwhile, I'm wondering why you choose to cancel first and then wait for result here
future.cancel() future.result(timeout=60)
instead of a common use case as the sample shows. This could be a mitigation for the issue you are experiencing.
Because I wanted to subscribe messages until certain timestamp (which turns out wrong idea 😅) and make callback to signal event if it subscribed all messages by using threading.Event object.
If I call result first without timeout then it will wait indefinitely and if I set timeout then it will raise TimeoutError at second call of result. I wanted to just cancel it first to make it avoid raising error.
We are also seeing this issue.
Python version: 3.11 pip version: 23.3.1 google-cloud-pubsub version: 2.14.1
We are running in Google Kubernetes Engine (GKE) and listening to the following two signals in our worker for scaling:
import signal
signal.signal(signal.SIGINT, self.exit_gracefully)
signal.signal(signal.SIGTERM, self.exit_gracefully)
With a exit_gracefully
implemented as follows:
def exit_gracefully(self) -> None:
"""Stop the async worker."""
self.future.cancel() # Request shutdown
try:
self.future.result(timeout=60 * 5) # Block until the shutdown is complete (or up to 5min).
except TimeoutError:
self.logger.warning("Stop timeout reached, Pod will die.")
except Exception as e:
self.logger.error(f"Error while shutting down worker: {e}")
with the following setup:
self.subscriber = pubsub_v1.SubscriberClient()
flow_control = pubsub_v1.types.FlowControl(max_messages=self.pub_sub_flow_control_max_messages)
executor = futures.ThreadPoolExecutor(
max_workers=self.pub_sub_thread_pool_executor_max_workers, thread_name_prefix="TPE-PeachWorker"
)
scheduler = pubsub_v1.subscriber.scheduler.ThreadScheduler(executor=executor)
self.future = self.subscriber.subscribe(
self.subscription_path,
callback=self.pubsub_callback,
flow_control=flow_control,
scheduler=scheduler,
await_callbacks_on_shutdown=True,
)
We do our main work in a separate thread from the main thread to allow python to receive signals:
# Execute the wait on the future for streaming pull in a different thread
# because Python's main thread is the only one that will receive signals such as
# SIGTERM for when a pod is requested to die so it can be replaced with a new one.
thread = Thread(target=self.process, args=(), daemon=True)
thread.start()
With the process
method implemented as follows:
def process(self):
with self.subscriber:
try:
self.future.result()
except Cancelled:
self.logger.info("Cancel request received.")
We are seeing the same scheduler assertion error that @hson2 saw:
AssertionError: null
File "threading.py", line 982, in run
self._target(*self._args, **self._kwargs)
File "/usr/local/lib/python3.11/site-packages/google/cloud/pubsub_v1/subscriber/_protocol/streaming_pull_manager.py", line 921, in _shutdown
assert self._scheduler is not None
However, we are also seeing another error immediately before:
AttributeError: 'NoneType' object has no attribute 'nack'
File "threading.py", line 982, in run
self._target(*self._args, **self._kwargs)
File "/usr/local/lib/python3.11/site-packages/google/cloud/pubsub_v1/subscriber/_protocol/streaming_pull_manager.py", line 948, in _shutdown
msg.nack()
@liuyunnnn Could you expand on the preference to not call cancel first in a scenario like this?
Thank you for your help :)!
Environment details
google-cloud-pubsub
version: 2.18.4Steps to reproduce
_on_response
function.Code example
Stack trace
Explanation
It's because
future.cancel()
executesmanager.close()
which makes_scheduler
asNone
and it makes_on_response
raise AssertionError.Maybe it has to be protected by threading lock somehow.