python / cpython

The Python programming language
https://www.python.org
Other
62.29k stars 29.93k forks source link

Handle TCP socket failure in `logging.handlers.SysLogHandler.emit` #122959

Open gumballcmm opened 1 month ago

gumballcmm commented 1 month ago

Bug report

Bug description:

Client loggers on TCP using a socket stream can timeout. A TCP logger server could issue disconnect to TCP client due to timeout caused usually by no activity after some elapsed time. TCP servers typically have to control how many TCP listeners are active in memory to prevent TCP flooding. They do it by monitoring TCP activity timeouts and by removing open TCP sockets from memory and by sending a FIN to client to disconnect socket in remote TCP table as well.

Going back to the Python apps that use "import logging" to send events to specifically to TCP handlers, an app relying on TCP logger would not recover successfully if socket was closed by server's TCP FIN packet; thus app's syslog message/event would no longer be forwarded to TCP log collector. The bug exists because there is no verification of TCP socket stream existence in logging.handlers.SysLogHandler.emit.

If socket was torned down by Kernel it doesn't exist and socket Exception is generated and a new socket stream of type socket.SOCK_STREAM should be created and message resent to remote TCP collector on 2nd attempt. The reason why the bug exists is because the current logger handler does not support explicit try:except auto-remediation handling for socket types socket.SOCK_STREAM (TCP sockets) in logging.handlers.SysLogHandler.emit. The proposed update below addresses that gap by providing recovery support for TCP sockets that were closed by Kernel ensuring data logging continuity for Python apps without restarting the application..

Notice that there isn't a recovery block in the original code for socket type socket.SOCK_STREAM in logging.handlers.

I added the code in bold to show how the socket would be created for a Python app using TCP-based client logging as a handler. I have confirmed this bug exist with our TCP syslog infrastructure and confirmed corrected by the following code in bold. You could duplicate yourself by adding a TCP handler to any IP using import logging and then wait have the TCP server disconnect the socket sending FIN to client and the TCP client will fail to create a new socket on new event through logger. An error should appear when trying to self.socket.send(msg) when a TCP socket doesn't exist and there is no try/except to recover from that in original code.

Please approve this bug and I'll issue a pull request with the bolded code below to fix it.

From logging.handlers.SysLogHandler.emit:

if not self.socket:
    self.createSocket()

if self.unixsocket:
    try:
        self.socket.send(msg)
    except OSError:
        self.socket.close()
        self._connect_unixsocket(self.address)
        self.socket.send(msg)
+# UDP socket type
elif self.socktype == socket.SOCK_DGRAM:
    self.socket.sendto(msg, self.address)
+# TCP socket type
+# This block doesn't exist in original handlers.py code for TCP
+elif self.socktype == socket.SOCK_STREAM:
+    try:
+        self.socket.sendall(msg)
+    # create new TCP socket stream, if TCP socket broken or disconnected
+    except OSError:
+        self.socket = socket.socket(socket.AF_INET, self.socktype)
+        self.socket.connect(self.address)
+        # retry sending msg after making new TCP AF_INET socket
+        self.socket.sendall(msg)
else:
    self.socket.sendall(msg)

CPython versions tested on:

CPython main branch

Operating systems tested on:

Linux

Linked PRs

picnixz commented 1 month ago

I've edited your issue so that it could be a bit more readable.

gumballcmm commented 1 month ago

@picnixz - Thanks for that correction. Makes sense.

pygeek commented 3 weeks ago

I recommend catching OSError in stead of the general except Exception. OSError is the subclass for all socket related errors.

picnixz commented 3 weeks ago

Ah yes, it's maybe better. I just focused on reformatting the issue. By the way, you are free to open a PR if you want.

Zheaoli commented 3 weeks ago

Actually, I think we should not handle the connection error by default. The retry policy should be handled by user? I guess

gumballcmm commented 3 weeks ago

@pygeek - My reasoning for using catching the error with Exception was to handle all errors from self.socket.sendall(msg) to prevent from having to write multiple "specific" catches in the future in case there were others besides OSError. What's your thoughts on OSError specifically over Exception for the function? Have you tested it with OSError?

pygeek commented 3 weeks ago

All socket related errors are derived from the OSError class as per PEP 3151. If this were to change it would require a new PEP to be introduced detailing the justification, as well as a larger refactor. If an error outside of OSError and its subclasses were to be raised here, it would be unexpected, and therefore should be raised. Generally, broad exceptions tend to obfuscate bugs and make it more difficult to debug.

https://peps.python.org/pep-3151/

All errors raise exceptions. The normal exceptions for invalid argument types and out-of-memory conditions can be raised. Errors related to socket or address semantics raise OSError or one of its subclasses.

https://docs.python.org/3/library/socket.html

https://docs.python.org/3/library/exceptions.html#exception-hierarchy

https://pylint.readthedocs.io/en/stable/user_guide/messages/warning/broad-exception-caught.html

gumballcmm commented 3 weeks ago

Okay thanks @pygeek for your help. I changed error handling to OSError type from Exception type. I opened a PR request for it https://github.com/python/cpython/pull/123238. I wasn't sure if we should backport the update request to previous releases. This is my first time contributing btw to an open source project. It's been a learning process.