googleapis / python-logging

Apache License 2.0
124 stars 53 forks source link

loop immediately in the background_thread, without sleeping #892

Open wguanicedew opened 4 months ago

wguanicedew commented 4 months ago

Thanks for stopping by to let us know something could be better!

PLEASE READ: If you have a support contract with Google, please create an issue in the support console instead of filing on GitHub. This will ensure a timely response.

Please run down the following list and make sure you've tried the usual "quick fixes":

If you are still having issues, please be sure to include as much information as possible:

Environment details

Steps to reproduce

  1. ?
  2. ?

Code example

This part of code is problemtic https://github.com/googleapis/python-logging/blob/main/google/cloud/logging_v2/handlers/transports/background_thread.py#L129-L148.

The bckgroup_thread retry depends on the api_core/retry module. However, when the this retry function is called (https://github.com/googleapis/python-api-core/blob/main/google/api_core/retry/retry_unary.py#L153), it will raise exception immediately. So that the retry module doesn't sleep at all. At the same time, the backgroup_thread doesn't sleep either. It caused a heavy loop immediately, which seems it's attacking the logging server.

# example

Stack trace

pilot_1: 2024-05-07 21:25:32,032 | ERROR    | google.cloud.logging_v2.handlers.transports.background_thread | _safely_commit_batch      | Failed to submit 10 logs.
pilot_1: Traceback (most recent call last):
pilot_1:   File "/cvmfs/sw.lsst.eu/linux-x86_64/panda_env/v1.0.15/conda/install/envs/pilot/lib/python3.11/site-packages/google/cloud/logging_v2/handlers/transports/background_thread.py", line 115, in _safely_commit_batch
pilot_1:     batch.commit()
pilot_1:   File "/cvmfs/sw.lsst.eu/linux-x86_64/panda_env/v1.0.15/conda/install/envs/pilot/lib/python3.11/site-packages/google/cloud/logging_v2/logger.py", line 467, in commit
pilot_1:     client.logging_api.write_entries(
pilot_1:   File "/cvmfs/sw.lsst.eu/linux-x86_64/panda_env/v1.0.15/conda/install/envs/pilot/lib/python3.11/site-packages/google/cloud/logging_v2/_gapic.py", line 163, in write_entries
pilot_1:     self._gapic_api.write_log_entries(request=request)
pilot_1:   File "/cvmfs/sw.lsst.eu/linux-x86_64/panda_env/v1.0.15/conda/install/envs/pilot/lib/python3.11/site-packages/google/cloud/logging_v2/services/logging_service_v2/client.py", line 955, in write_log_entries
pilot_1:     response = rpc(
pilot_1:                ^^^^
pilot_1:   File "/cvmfs/sw.lsst.eu/linux-x86_64/panda_env/v1.0.15/conda/install/envs/pilot/lib/python3.11/site-packages/google/api_core/gapic_v1/method.py", line 131, in __call__
pilot_1:     return wrapped_func(*args, **kwargs)
pilot_1:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
pilot_1:   File "/cvmfs/sw.lsst.eu/linux-x86_64/panda_env/v1.0.15/conda/install/envs/pilot/lib/python3.11/site-packages/google/api_core/retry/retry_unary.py", line 293, in retry_wrapped_func
pilot_1:     return retry_target(
pilot_1:            ^^^^^^^^^^^^^
pilot_1:   File "/cvmfs/sw.lsst.eu/linux-x86_64/panda_env/v1.0.15/conda/install/envs/pilot/lib/python3.11/site-packages/google/api_core/retry/retry_unary.py", line 153, in retry_target
pilot_1:     _retry_error_helper(
pilot_1:   File "/cvmfs/sw.lsst.eu/linux-x86_64/panda_env/v1.0.15/conda/install/envs/pilot/lib/python3.11/site-packages/google/api_core/retry/retry_base.py", line 212, in _retry_error_helper
pilot_1:     raise final_exc from source_exc
pilot_1:   File "/cvmfs/sw.lsst.eu/linux-x86_64/panda_env/v1.0.15/conda/install/envs/pilot/lib/python3.11/site-packages/google/api_core/retry/retry_unary.py", line 144, in retry_target
pilot_1:     result = target()
pilot_1:              ^^^^^^^^
pilot_1:   File "/cvmfs/sw.lsst.eu/linux-x86_64/panda_env/v1.0.15/conda/install/envs/pilot/lib/python3.11/site-packages/google/api_core/timeout.py", line 120, in func_with_timeout
pilot_1:     return func(*args, **kwargs)
pilot_1:            ^^^^^^^^^^^^^^^^^^^^^
pilot_1:   File "/cvmfs/sw.lsst.eu/linux-x86_64/panda_env/v1.0.15/conda/install/envs/pilot/lib/python3.11/site-packages/google/api_core/grpc_helpers.py", line 78, in error_remapped_callable
pilot_1:     raise exceptions.from_grpc_error(exc) from exc
pilot_1: google.api_core.exceptions.ResourceExhausted: 429 Quota exceeded for quota metric 'Write requests' and limit 'Write requests per minute' of service 'logging.googleapis.com' for consumer 'project_number:707963555525'. [reason: "RATE_LIMIT_EXCEEDED"
pilot_1: domain: "googleapis.com"
pilot_1: metadata {
pilot_1:   key: "consumer"
pilot_1:   value: "projects/707963555525"
pilot_1: }
pilot_1: metadata {
pilot_1:   key: "quota_limit"
pilot_1:   value: "WriteRequestsPerMinutePerProject"
pilot_1: }
pilot_1: metadata {
pilot_1:   key: "quota_limit_value"
pilot_1:   value: "120000"
pilot_1: }
pilot_1: metadata {
pilot_1:   key: "quota_location"
pilot_1:   value: "global"
pilot_1: }
pilot_1: metadata {
pilot_1:   key: "quota_metric"
pilot_1:   value: "logging.googleapis.com/write_requests"
pilot_1: }
pilot_1: metadata {
pilot_1:   key: "service"
pilot_1:   value: "logging.googleapis.com"
pilot_1: }
pilot_1: , links {
pilot_1:   description: "Request a higher quota limit."
pilot_1:   url: "https://cloud.google.com/docs/quota#requesting_higher_quota"
pilot_1: }
pilot_1: ]
pilot_1: 2024-05-07 21:25:32,103 | ERROR    | google.cloud.logging_v2.handlers.transports.background_thread | _safely_commit_batch      | Failed to submit 10 logs.
pilot_1: Traceback (most recent call last):
pilot_1:   File "/cvmfs/sw.lsst.eu/linux-x86_64/panda_env/v1.0.15/conda/install/envs/pilot/lib/python3.11/site-packages/google/cloud/logging_v2/handlers/transports/background_thread.py", line 115, in _safely_commit_batch
pilot_1:     batch.commit()

Making sure to follow these steps will guarantee the quickest resolution possible.

Thanks!

wguanicedew commented 3 months ago

In addition, there is no simply way to control the batch_size and the latency (https://github.com/googleapis/python-logging/blob/main/google/cloud/logging_v2/handlers/transports/background_thread.py#L133-L134). Could you add environment parameter supports to update these two configuration value?

gkevinzheng commented 3 months ago

@wguanicedew Do you know what Exception type is causing the retry function to retry immediately and not backoff?

wguanicedew commented 2 months ago

In the stackstrace I posted in the ticket, it already said "pilot_1: google.api_core.exceptions.ResourceExhausted: 429 Quota exceeded for quota metric 'Write requests' and limit 'Write requests per minute' of service 'logging.googleapis.com' for consumer 'project_number:707963555525'. [reason: "RATE_LIMIT_EXCEEDED" "