python / cpython

The Python programming language
https://www.python.org
Other
63.41k stars 30.36k forks source link

TimeRotatingFileHandler doesn't work as expected #111054

Closed Foundsheep closed 1 year ago

Foundsheep commented 1 year ago

Bug report

Bug description:

Sumary

image

Code for this logger

import logging
from logging import handlers
import datetime
import pathlib
import socket

class CustomTimedRotatingFileHandler(handlers.TimedRotatingFileHandler):
    def __init__(self, **kwargs):
        super().__init__(**kwargs)
        self.suffix = "%Y%m%d"

    def emit(self, record):
        if record.levelno in [logging.INFO, logging.WARNING]:
            record.msg = str(record.msg)[:255]
        # elif record.levelno in [logging.DEBUG, logging.ERROR, logging.CRITICAL]:
        #     record.msg = str(record.msg)[:65535]
        else:
            record.msg = str(record.msg)[:65535]

        super().emit(record)

def create_logger(name, is_from_client=False):

    def get_private_ip():
        socket_name = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        private_ip = None
        try:
            socket_name.connect(("8.8.8.8", 80))
            private_ip = socket_name.getsockname()[0]
        finally:
            socket_name.close()
        return private_ip

    today_str = datetime.datetime.today().strftime("%Y-%m-%d")
    base_dir = pathlib.Path("./logs", "android_server")

    if not base_dir.exists():
        base_dir.mkdir(parents=True, exist_ok=True)

    log = logging.getLogger(name)

    file_name = base_dir / "android_server.log"
    formatter = logging.Formatter(
        f"[%(asctime)s][%(levelname)s]{'' if is_from_client else '[' + str(get_private_ip()) + ']'}{'%(message)s]' if is_from_client else '[%(message)s]'}"
    )

    custom_trh = CustomTimedRotatingFileHandler(filename=file_name, when="midnight")
    custom_trh.setFormatter(formatter)
    log.addHandler(custom_trh)

    console_handler = logging.StreamHandler()
    console_handler.setFormatter(formatter)
    log.addHandler(console_handler)

    log.setLevel(logging.DEBUG)
    return log

def get_client_public_ip(request):
    ip_address = request.environ.get('HTTP_X_FORWARDED_FOR', request.remote_addr)
    return "[" + str(ip_address) + "]" + "["

CPython versions tested on:

3.9

Operating systems tested on:

Linux

ned-deily commented 1 year ago

@vsajip

vsajip commented 1 year ago

Please provide a minimal standalone script (not one which requires running in a web application/framework context) that demonstrates the issue. I am unable to reproduce this with the TimedRotatingFileHandler in the stdlib.

Foundsheep commented 1 year ago

Not exactly the same logger I used in my code. But this will do the job.

Basically, It is trying to leave logs for three days and let's see what happens with the file name. You can still use the script below with some minor changes derived from my original code above, if you want to test it more accurately(e.g., CustomTimedRotatingFileHandler)

import logging
from logging.handlers import TimedRotatingFileHandler
import time

logger = logging.getLogger()
path = "./test.log"

formatter = logging.Formatter("[%(asctime)s][%(levelname)s][%(message)s]")
th = TimedRotatingFileHandler(filename=path, when="midnight")
th.setFormatter(formatter)

logger.addHandler(th)
logger.setLevel(logging.INFO)

three_days = 60 * 60 * 24 * 3

for i in range(1, three_days+1):
    logger.info(f"===[{i:6d}]")
    time.sleep(1)
vsajip commented 1 year ago

I'm not able to reproduce your results with the above script (I changed just the log file name). The first_and_last script just prints the first and last lines of its argument(s). See below:

vsajip@psi:~/tmp/3.8$ first_and_last *
timed_test_2.log::
[2023-10-31 00:00:00,407][INFO][===[235414]]
[2023-10-31 06:36:59,274][INFO][===[259200]]
timed_test_2.log.2023-10-28::
[2023-10-28 07:30:57,140][INFO][===[     1]]
[2023-10-28 23:59:59,613][INFO][===[ 59260]]
timed_test_2.log.2023-10-29::
[2023-10-29 00:00:00,614][INFO][===[ 59261]]
[2023-10-29 23:59:59,255][INFO][===[149134]]
timed_test_2.log.2023-10-30::
[2023-10-30 00:00:00,256][INFO][===[149135]]
[2023-10-30 23:59:59,405][INFO][===[235413]]
vsajip@psi:~/tmp/3.8$ ls
timed_test_2.log  timed_test_2.log.2023-10-28  timed_test_2.log.2023-10-29  timed_test_2.log.2023-10-30
vsajip@psi:~/tmp/3.8$ grep -v 2023-10-30 timed_test_2.log.2023-10-30
vsajip@psi:~/tmp/3.8$ grep -v 2023-10-29 timed_test_2.log.2023-10-29
vsajip@psi:~/tmp/3.8$ grep -v 2023-10-28 timed_test_2.log.2023-10-28
vsajip@psi:~/tmp/3.8$ grep -v 2023-10-31 timed_test_2.log
vsajip@psi:~/tmp/3.8$

This test was with Python 3.8. I'm running it with Python 3.9 now, but I'd be very surprised if it behaved differently.

Foundsheep commented 1 year ago

I agree with your opinion @vsajip. Nevertheless, when I had found out the logging was not working as I had expected, one of the things that came across in my head was how the rotating works in the code

As far as I know, the rollup(?) or rotating happens not in the midnight exactly, but after the midnight and when log is emitted at the same time. And as for this issue, I've looked on the logging.py script, and couldn't find anything that could cause the weird situation I got.

Anyway, the point I'm making here is, wouldn't there be anything with the rollup or emitting logic that could cause the problem? Since the sample script I gave you is constantly emitting messages, so the rollup is quite expectable when set to midnight.

vsajip commented 1 year ago

wouldn't there be anything with the rollup or emitting logic that could cause the problem?

Well, there might be, but a script needs to be created that reproduces the issue, if there is an actual issue - or you need to point out a flaw in the logic (if you see it, you should be able to create a script that exploits the flaw and shows the issue). Your script is emitting messages every second and that fails to show up any problem. I wrote another script that generates messages at intervals of between 30 and 60 seconds, and that doesn't show up any problems either. So I'll close the issue soon as "invalid" if you can't provide a standalone reproducer, but it can always be opened again if you do come up with something later. Note that Python 3.8, 3.9 and 3.10 are in a security-fix-only state, so general bug fixes wouldn't be applied to these versions.

Foundsheep commented 1 year ago

Fair enough.

I couldn't provide the standalone script, since the suspicious behaviour has happened on our solution, which requires a lot of other tools to work as well.

In case anyone would find this issue later, I'm using it with Docker, Flask, Celery, Redis on AWS EC2 instance. There might be the problem causing in one of those tools. I'll figure it out myself, and will update on this github issue, if I encounter the issue once again and find the solution for it.

Foundsheep commented 11 months ago

Hi there,

I'm leaving a comment to let anyone know that there is truly something wrong with the TimedRoatatingFileHandler, as far as I guess.

Point

The main point I'm making here is, it doesn't write on the file it is supposed to write on.

See the picture below. It might seem scary, since there is a lot of log lines there, but is quite easy to understand, if you follow my intention.

  1. I checked 20231128 file to see if there are logs of 2023Nov28 date
    • result : No. Nov 29 logs were in there
  2. I checked android_server.log file to see if it has the latest log and the date of that latest log
    • result : No. it didn't have the latest log. Actually the only log line there was one between those written in 20231128
    • So weird...
    • Seemingly, it wrote on the rolled up file, or it wrote on the both file at the same time
  3. Today is Nov 30. I made a log to see how the file would be rolled up
  4. I checked 20231128 file, because its size had increased and its created time has renewed to be just now.
    • result : everything written there was gone, and Nov 30 logs were there
  5. I checked the newly made file 20231129, guessing there would be what was written on android_server.log before, which is 2023-11-29 16:30:24, 490 ... one
    • result : No. the 20231129 file had Nov 30 log also.
  6. Then, I checked android_server.log to see what are written there
    • result : Nov 30 logs were there.

Conclusion

TimedFileRotatingHandler_went_wrong

vsajip commented 11 months ago

Since this can't readily be reproduced, it may be some issue with your environment that is causing the problem you're seeing. Why are you unable to create a standalone script which demonstrates the problem? As per the comment above, I took your suggested script and was unable to see any aberrant behaviour over several days of running it. If there was something "truly wrong" as you assert, why isn't anyone else reporting this specific problem?

Foundsheep commented 11 months ago

Don't get me wrong. I don't want to offend anyone for this, but I'm just saying what I've experience with it.

  1. standalone script

    • The situation has occurred in a very complex circumstance, like several web frameworks are involved in several AWS EC2 machines etc,. Thus, as I tried to provide an alternative way to reproduce it, there can't be any better way to make a standalone script only for this purpose.
    • Also the standalone script I provided first is working in a very expectable way in a regular basis, thus the case might not fit with the problematic situation I'm facing now, where API calls are making logs with an irregular time basis.
  2. There might be other people who are facing the same issue, but they haven't reported it here, or have been guessing the problem in a different way from I interpreted it.

  3. Regardless of any points, I can't understand how it would write on 3 different files in one log emit which include already rolled-up files(please let me know, if I've interpreted the situation incorrectly)

Anyway, @vsajip you can close this issue as long as you want, but I've left above comments just for those who might face the same issue in the future.

vsajip commented 11 months ago

The issue is already closed. The more complex your setup, the more chances there is a bug somewhere in your code. If you can't isolate the problem in a standalone script using only the stdlib, then perhaps the problem is in your code rather than stdlib code?

What I've learned over many years of software development is to start by assuming that one's own code is faulty somehow, rather than the code it depends on. Prove that the problem is not in your code by paring away all the extraneous stuff until a small script is devised which demonstrates the problem and that it is definitely not in your code, but the code you're depending on, which is called in an acceptable way.

When you find such a standalone script, I'll be happy to look at the issue again.

If you look at closed logging issues / PRs, you'll see that almost all the bug-fix ones are either obvious mistakes or ones where a short script showing the problem was provided. Numerous such short scripts have been adapted into the logging tests.

Foundsheep commented 11 months ago

@vsajip , thanks for the advice, I'm fairly a fresher to this industry.

However, as you mentioned, if what could be isolated to reproduce the problem, the most core thing and the only thing related to the configuration of log in the application is the code I put above at first.

Also, whatever configuratoin is set wrong that produces this result in my custom class, how could that allow it to write on 3 different files for 1 sequence of emits, which include already rolled-up files, as I said?

Is my backupCount configuration set wrong or filename set wrong or mode set wrong? I really doubt it, since they are fairly easy to understand in their expected behaviours and, as I said, writing on 3 different files for 1 sequence of emits can't be understood in any of those configurations.

The problem might lie on other things surrounding my application, such as Docker or Django or Flask(by which I am implicitly saying that the same issue happened in multiple servers using the same custom class).

However, the core point is that someone should be able to understand, at least in theory, how could this write on 3 different files in 1 sequence of emits, which is what really happened. This is the point, rather than trying to put as much effort in making a standalone script as in making a whole application system, which eventually might not reproduce the result.

vsajip commented 11 months ago

@Foundsheep Probably nobody here can tell you at a glance why the symptoms you see are occurring. That's because nobody here apart from you has seen and run the whole code that's causing the problem. For all anyone knows, you have a thoroughly messed-up Python installation, or have a module of your own called logging which somehow conflicts with the stdlib module, or ... almost anything, really.

However, the core point is that someone should be able to understand

That someone is you, as you are the only one with access to all of your code. We are all volunteers, and the generally understood social contract around open source is in part that

  1. People are not going to spend a lot of time thinking about your code especially as they can't see all of it, and even if they could see and run all of it to reproduce the problem for themselves, they are unlikely to spare the time unless they are really interested in your problem, or you pay them for their time.
  2. It's not uncommon for a bug reporter to do most if not all of the work to diagnose a problem, and up to the maintainer, bug reporter and reviewers to come up with the correct/acceptable fix.
  3. If a reported bug can't be reproduced and the reporter can't do a significant portion of the diagnosing of the problem, the report is either closed (if the maintainer doesn't accept that there is a problem), or it languishes as open until someone reproduces it readily and diagnoses the problem. Once diagnosed, it might languish further until someone finds time to work on a fix. This can be helped by the reporter or others figuring out a possible fix and submitting a suggestion or ideally a pull request.

If you are a fresher with limited experience in the industry, perhaps you should think caretully before you post a bug in the first instance, unless you can reproduce the problem with a standalone script as requested already. Every potentially spurious issue posted takes valuable volunteer time to address. (I could probably have done a couple of documentation updates/improvements in the time I've already spent on this issue.) Instead, the usual approach to take would be for you to post to a suitable forum such as the Python Discourse, Stack Overflow or similar to see if others have come across your problem.

Even though I consider myself a fairly experienced developer, when I run into a problem using a new-to-me library or framework, I will generally ask on a forum before posting a bug unless I can easily reproduce and diagnose the problem in a way that I can demonstrate to the maintainer(s). That's because I want to be sure of the ground I'm on, to be sure that I haven't messed something up in my code, and to respect the time of volunteer maintainers.

Foundsheep commented 11 months ago

@vsajip ... I can see you are getting annoyed, and you are losing the way to understand your counterpart.

To make it short, obviously I've been through many other places to look for similar issues and checked for other things I could do differently and looked through the TimedRotatingFileHandler code myself... and come here at last, because I thought leaving an issue here would be the most officially recognisible way by others who would undergo the same issue.

Let's not waste time on this any more for both of us. As I said the whole point of leaving the latter comments were for those who might face the same issue in the future.

I don't want anyone's precious time wasted on this issue. Just hope for you to be freed from the anger and upset you could be feeling now.

vsajip commented 11 months ago

You're mistaken as to my state of mind - I'm not in the least annoyed, nor am I in the grip of "anger and upset". But I am telling it how I see it - spelling it out a bit, perhaps - and that too is for the benefit of others who come across this thread. But then, nuance can be hard to grasp on the Internet. We all just have to do our best. 🙂

philitell commented 7 months ago

I have the exact same problem as mentioned above: https://github.com/python/cpython/issues/111054#issuecomment-1832946380 I'm using TimeRotatingFileHandler on a django-project running on a ngninx-Server (newest version of django and python 3.10)

vsajip commented 7 months ago

Is this in a production environment? What are you using to productionize (e.g. uWSGI, Gunicorn or similar)? There are some recent changes to TimedRotatingFileHandler which have not yet been released. can you try with the repo version of Python to see if the problem persists? If it does, can you provide a minimal standalone project which demonstrates the issue?

philitell commented 7 months ago

Yes it is a production environment where i am using Gunicorn. I will try the repo version.

vsajip commented 7 months ago

Also note that writing to the same file from multiple processes isn't supported. See the cookbook for recipes that can be used in multiple-process scenarios, in case you are logging directly to file from your Django views.