micropython / micropython-lib

Core Python libraries ported to MicroPython
Other
2.42k stars 1k forks source link

`logging.FileHandler` not adding new lines to log file #628

Open AT0myks opened 1 year ago

AT0myks commented 1 year ago

Just tried the updated logging module to replace the StreamHandler and FileHandler adapted from CPython that I was previously using. Thank you @iabdalkader for this update.

StreamHandler works as expected, however I noticed that I wasn't getting any new lines in my log file. I looked at the previous code and saw that there's a flush after the write in StreamHandler.emit. Using this subclass for FileHandler, it works like before:

class MyStreamHandler(StreamHandler):

    def flush(self):
        if self.stream and hasattr(self.stream, "flush"):
            self.stream.flush()

    def emit(self, record):
        super().emit(record)
        self.flush()

Is there a reason for not doing a flush in emit, maybe something specific to MicroPython that I'm not aware of?

AT0myks commented 1 year ago

To be fair flush is called in StreamHandler.close() so calling FileHandler.close() works. All handlers are also closed in shutdown but there's no sys.atexit right now for my configuration so the only two options that don't require a subclass are to either explicitly close the file handler or call shutdown manually. But this still doesn't make the log messages appear in "real time" because unless you call FileHandler.stream.flush() after each message, you have to wait for the end of the program (if it's even supposed to end) for the flush to happen.

andrewleech commented 1 year ago

On many/most micropython ports the filesystem write operations go via a ram cache to ensure efficient use of flash blocks and to speed up overall operations.

In many cases flash write is very slow, so you don't want to do it more often than necessary.

The flush should happen occasionally automatically, whenever a flash block worth of logging has been written, out wherever any other file on the device gets written to.

I think this should be there most efficient way for logging to work, however yes it runs the risk of losing the most recent logging data in the event of reset or power failure.

How are you reading the log file while the application is running? Are you reading it for a different function in your application?

AT0myks commented 1 year ago

This is how I read the file:

from microdot_asyncio import Microdot

app = Microdot()

@app.get("/api/logs")
async def logs(request):
    def generate_logs():
        with open(log_file, 'r') as f:
            for line in f:
                yield line

    return generate_logs()

It's just for checking the logs quickly, I'm not using the log file in another function. There's only a few messages per day so I guess that's why it's not flushing often.

andrewleech commented 1 year ago

Ah yep that makes sense!

Your subclass is arguably the correct way to do this to flush immediately, when the logging is minimal and flush to disk doesn't slow things too much

You could also just expose the flush function from the logging file handler and call it in the microdot route handler just before reading too, though that would slow down the request instead.

It would be good to document things like this somewhere, the wiki is probably the right place!

AT0myks commented 1 year ago

Thank you for your help. I think for now I'm gonna stick with the subclass, even if it means also having to copy paste FileHandler just to change its parent class.

Maybe we could add an autoflush parameter to StreamHandler? If properly documented I think it would be pretty useful instead of having to make a subclass or flush manually. Although I don't really see where in the wiki to add this.

andrewleech commented 1 year ago

You could override just Filrhander and include an emit function there which calls flush then super().emit()

Adding options like that is unfortunately a difficult thing to balance; while it feels useful for micropython, it breaks compatibility with cpython stdlib which is an overall goal for everything in the stdlib library here.

AT0myks commented 1 year ago

You're right, I don't have to subclass StreamHandler, don't know why I didn't think about it.

Just so we're clear, CPython's StreamHandler flushes right after a write (and that's why I had it in my previous code). But for MicroPython we choose not to do that because of the reasons you detailed above, right? I just want to make sure there's an actual reason for this, and it's not just a few lines of code that are missing.

andrewleech commented 1 year ago

To be honest I don't know for certain if this was a conscious design choice here, but I've had experience with very slow logging from flushing every line so it makes sense to me.

Do you know if "flush in every line" is documented behaviour for cpython, or just happens to be the way it appears to work?

iabdalkader commented 1 year ago

I looked at the previous code and saw that there's a flush after the write in StreamHandler.emit.

Which old code are you referring to ? The old logger did Not have any defined Handlers, and it didn't call flush, correct ? I double checked it and I don't think I missed anything in the update, the main point was to add a minimal sensible logger, while maintaining old behavior, and as far as I know that was the old behavior.

I agree with @andrewleech calling flush after every single write on MicroPython is very bad, would also like to add that on any modern OS/PC a flush isn't necessarily a real disk write, there are so many levels of buffers and caches along the way to the disk (even inside the disk itself), so CPython can get away with calling flush whenever, whereas in MicroPython a flush will erase a Flash sector and sync the cache.

That said, I only really care about API compatibility, if you want to submit a PR to fix it somehow without breaking the API go ahead, I think adding an arg might be okay, but it really depends on the maintainers though.

Alternatively, it was mentioned more than once that there will be an extension package called logging_handlers with extra handlers (like rotating file log, etc..) maybe you can add it, and add a special NonBufferedFileHandler.

AT0myks commented 1 year ago

@andrewleech I don't see anything explicitly mentioning a flush for every message in the CPython documentation. It was just an observation from reading the source code.

@iabdalkader I'm referring to the code "adapted from CPython that I was previously using". Sorry if I wasn't clear enough. There was indeed no defined handlers before.

Thanks for the explanation, I didn't know that about MicroPython.

Right now the behaviour of FileHandler feels more like CPython's MemoryHandler with a FileHandler target. Except that we don't have control on when the flush happens. I know the readme says that many modules in python-stdlib have reduced functionality, but if we want closer compatibility with CPython, why not add the flushing and instead have a BufferedFileHandler?

If it's safer to leave StreamHandler like it is right now I'm completely fine with it. In the end it comes back to the issue that (as far as I know) this isn't documented anywhere. In my case when I updated the module I was indeed expecting to get the same behaviour as CPython. I'm not blaming anyone for the potential lack of documentation, and if it exists then I can't seem to find it.

Andrew pointed me to the wiki but I would expect this to be in the official docs. Unfortunately I don't think this is possible as these are two separate repos, and also because the modules here are updated separately from the "core". Are there any plans for a documentation specific to this repo?

andrewleech commented 1 year ago

Heh, it's hard enough to match cpython stdlib documented API of packages let alone starting to fill in undocumented side effects of implementation ;-)

As said previously, adding a flush to every line would have negative performance side effects for many users, so it would be bad to make this standard behaviour here.

To be fair I only know this because I was helping on a project a couple of years ago with a serious bug where it was locking up at weird times... it turned out the bug was caused by their custom logging handler flushing every line!

I don't know of anyone offering the time / effort required to document all the packages here, current intentions as I understand it is to match cpython stdlib API enough for their documentation to be "close enough" - but yes that means currently users need to read through the code to see which functions are written and which are missing. "The code is the documentation" so to speak.

With regard to a BufferedFileHandler, well I've got something like that in a work project, in that it has a separate 2KB ringbuffer that's logged into and only flushed to flash when my application is idle - that's a much different thing to the existing FileHandler which does not have any buffering... other than the implicit buffering of the filesystem on the pico board you're using - some other ports/filesystems don't have this buffering!

Yes it'd be fantastic to have this sort of thing all covered in detailed documentation... pr's are always welcome ;-)