pytorch / ignite

High-level library to help with training and evaluating neural networks in PyTorch flexibly and transparently.
https://pytorch-ignite.ai
BSD 3-Clause "New" or "Revised" License
4.5k stars 608 forks source link

Add optional arg `encoding = None` for `ignite.utils.setup_logger` #3239

Closed sjiang95 closed 2 months ago

sjiang95 commented 2 months ago

🚀 Feature

Add optional arg encoding = None for ignite.utils.setup_logger

Motivation

I encountered garbled text when loggering messages that contains CJK characters using logger.info(). For example,

from ignite.utils import setup_logger
logger = setup_logger(name="theName", filepath='filename.log')

logger.info('你好') # garbled text in the written .log file

In the .log file, garbled text is printed.

�й���׼ʱ��.

Solution

This can be addressed by simply passing encoding = "utf-8" to the filehandler

https://github.com/pytorch/ignite/blob/f431e60b09743dc8d99b7e5f32e234f46a2a920d/ignite/utils.py#L268

def setup_logger(
    name: Optional[str] = "ignite",
    level: int = logging.INFO,
    stream: Optional[TextIO] = None,
    format: str = "%(asctime)s %(name)s %(levelname)s: %(message)s",
    filepath: Optional[str] = None,
    encoding: Optional[str] = None, # add this optional arg
    distributed_rank: Optional[int] = None,
    reset: bool = False,
) -> logging.Logger:

        if filepath is not None:
            fh = logging.FileHandler(filepath, encoding = encoding) # pass encoding to the file handler
            fh.setLevel(level)
            fh.setFormatter(formatter)
            logger.addHandler(fh)

Looking forward to a discussion about this before I create a PR.

vfdev-5 commented 2 months ago

Thanks for the suggestion of improvement, @sjiang95 ! If you would like to add this feature and a test in a draft PR, it would be awesome!

Few notes in case you would like to send a PR:

sjiang95 commented 2 months ago

Hello @vfdev-5 ,

thanks for your response.

I also want to set encoding="utf-8" by default.

vfdev-5 commented 2 months ago

@sjiang95 by the way, what is your OS, windows ?

By default, logging FileHandler is using default encoding as in open: https://docs.python.org/3/library/logging.handlers.html#filehandler which is

In text mode, if encoding is not specified the encoding used is platform-dependent: locale.getencoding() is called to get the current locale encoding.

Source: https://docs.python.org/3/library/functions.html#open

and

On Android and VxWorks, return "utf-8". On Unix, return the encoding of the current LC_CTYPE locale. Return "utf-8" if nl_langinfo(CODESET) returns an empty string: for example, if the current LC_CTYPE locale is not supported. On Windows, return the ANSI code page.

Testing on the linux your PR, even with encoding=None we have already utf-8 encoding:

    fp = dirname / "log"
    logger = setup_logger(name="logger", filepath=fp, encoding=None)
    logger.info("你好")  # ni hao

    with open(fp, "r") as h:
        data = h.readlines()

    assert "你好" in data[0]

so, your PR will fix this for other OS and setups...

sjiang95 commented 2 months ago

@vfdev-5 yes I encountered this problem on windows.

How do you think we should deal with this? How about importing platform and using platform.system() to help limiting the behaviour on only windows?

vfdev-5 commented 2 months ago

We can use encoding="utf-8" by default and in the tests check the platform as you are suggesting.