googleapis / python-logging

Apache License 2.0
130 stars 55 forks source link

rpc errors only on Python 3.12 #855

Open efroemling opened 9 months ago

efroemling commented 9 months ago

Recently I started migrating an App Engine app from Python 3.11 to 3.12 and am seeing occasional cloud logging errors in 3.12 related to cloud logging committing batches in the background thread. The errors don't occur in 3.11 and I don't recall seeing them in prior versions either.

Environment details

Steps to reproduce

In my case, switching my app-engine service's runtime from python311 to python312 results in these occasional errors. If I go back to Python 3.11 the errors cease.

Code example

The following is the entirety of my logging setup. I'm mostly relying on standard Python log calls but have one or two places where I talk to cloud logging directly in case that could be a culprit.

        # Wire up to handle standard Python logs.
        gc_log_client = google.cloud.logging.Client()
        gc_log_client.setup_logging(log_level=logging.INFO, name='bamaster')

        # Also set up a direct logger (used in a few places only).
        gc_logger_direct = gc_log_client.logger(name='bamaster_direct')

Stack trace

The following are the errors I'm seeing. The stack traces seem to be oddly cut off.

Screenshot 2024-02-15 at 10 31 58 AM

Any ideas what could have changed here between Python 3.11 and 3.12? Please let me know if there is anything I can do to provide more info on this. Thanks.

gkevinzheng commented 9 months ago

@efroemling I can't see the rest of the stack trace, but by any chance does it say RuntimeError: can't create new thread at interpreter shutdown?

efroemling commented 9 months ago

@gkevinzheng Unfortunately I don't know what the rest of the stack trace is; whatever mechanism in app-engine that is forwarding stderr to cloud-logging seems to always cut off at the 'rpc(' part. Would that in itself be a separate bug? Or do you know how I might dig further into this and get at the full stack trace?

efroemling commented 9 months ago

@gkevinzheng Actually, looking through the surrounding logs a bit more I am seeing mention of that around the time of the error:

Screenshot 2024-02-15 at 1 06 22 PM
gkevinzheng commented 9 months ago

@efroemling I've seen similar stack traces/errors here, #850, and the fact that it's 3.12 exclusive likely points towards a similar issue. The atexit handler we use to shut down/flush the background thread is crashing in 3.12 because of a change in version 3.12: https://docs.python.org/3/library/atexit.html#atexit.register. Working on a fix for it now.

efroemling commented 9 months ago

Awesome to hear a fix is in progress; thanks! Is this something that is likely to result in lost log messages? Wondering if I should revert to 3.11 until the fix comes through or just ignore the sporadic errors for now...

gkevinzheng commented 9 months ago

@efroemling Not sure. The error occurs in 3.12 as a way to resolve undefined behavior, so no matter what it's a bug on our end.