ray-project / ray

Ray is an AI compute engine. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads.
https://ray.io
Apache License 2.0
34.16k stars 5.8k forks source link

[Core] Ray Core / Ray Data logging configuration leads to unexpected behavior #48732

Open omatthew98 opened 1 week ago

omatthew98 commented 1 week ago

What happened + What you expected to happen

We use logging.config.dictConfig(config) to configure the ray data logger (here), but this is also how ray core configures the ray logger (here).

For both of these logging configs, we use disable_existing_loggers: False. The behavior for this is described as (logging docs):

disable_existing_loggers - whether any existing non-root loggers are to be disabled. This setting mirrors the parameter of the same name in fileConfig(). If absent, this parameter defaults to True. This value is ignored if incremental is True.

This description makes it seem like these two logging calls are commutative (regardless of ordering they will produce the same result), but that is not exactly how the python logging module works. If we configure the ray module logger then the ray.data module logger, the results are expected and both are configured. If we instead configure the ray.data module then configure the ray module logger, then the ray.data logging configuration is clobbered. This happens because when configuring the parent logger of a module (e.g. ray module logger is the parent logger of the ray.data module logger), the various handlers associated with the child logger are not guaranteed to be preserved.

Our end goal should be a state where the call order of the logging configurations should not affect the logging behavior.

Versions / Dependencies

ray==2.39.0

Reproduction script

import ray
import logging

def report_logger(logger):
    # Collect this logger and its parents
    loggers = []
    current_logger = logger
    while current_logger:
        loggers.append(current_logger)
        if not current_logger.parent or current_logger.parent == current_logger:
            break
        current_logger = current_logger.parent

    # Report the configuration of each logger in the hierarchy
    print(f"Logging configuration for '{logger.name}' and its hierarchy:")
    for log in reversed(loggers):  # Start from the root and go down to the given logger
        print(f"\nLogger: {log.name or 'root'} (Level: {logging.getLevelName(log.level)})")
        if log.handlers:
            print("  Handlers:")
            for handler in log.handlers:
                print(f"    - {handler.__class__.__name__} (Level: {logging.getLevelName(handler.level)})")
        else:
            print("  No handlers configured")

print("BEFORE")
report_logger(logging.getLogger("ray.data"))
print()

import ray.data
ray.init(logging_config=ray.LoggingConfig(encoding="JSON", log_level="INFO"))

print("AFTER:")
report_logger(logging.getLogger("ray.data"))

Issue Severity

High: It blocks me from completing my task.

omatthew98 commented 1 week ago

To be clear about the expected behavior some examples / modifications of the repro:

1. Ray Data Import then Ray Core Init [UNEXPECTED BEHAVIOR]

Script:

print("BEFORE")
report_logger(logging.getLogger("ray.data"))
print()

import ray.data
ray.init(logging_config=ray.LoggingConfig(encoding="JSON", log_level="INFO"))

print("AFTER:")
report_logger(logging.getLogger("ray.data"))

Output:

BEFORE
Logging configuration for 'ray.data' and its hierarchy:

Logger: root (Level: WARNING)
  No handlers configured

Logger: ray (Level: INFO)
  Handlers:
    - PlainRayHandler (Level: NOTSET)

Logger: ray.data (Level: NOTSET)
  No handlers configured

AFTER:
Logging configuration for 'ray.data' and its hierarchy:

Logger: root (Level: INFO)
  Handlers:
    - StreamHandler (Level: INFO)

Logger: ray (Level: INFO)
  Handlers:
    - StreamHandler (Level: INFO)

Logger: ray.data (Level: NOTSET)
  No handlers configured

Notes: Ray Data configuration ignored, only Ray Core initialization respected.

2. Ray Core Init then Ray Data Import [EXPECTED BEHAVIOR]

Script:

print("BEFORE")
report_logger(logging.getLogger("ray.data"))
print()

ray.init(logging_config=ray.LoggingConfig(encoding="JSON", log_level="INFO"))
import ray.data

print("AFTER:")
report_logger(logging.getLogger("ray.data"))

Output:

BEFORE
Logging configuration for 'ray.data' and its hierarchy:

Logger: root (Level: WARNING)
  No handlers configured

Logger: ray (Level: INFO)
  Handlers:
    - PlainRayHandler (Level: NOTSET)

Logger: ray.data (Level: NOTSET)
  No handlers configured

AFTER:
Logging configuration for 'ray.data' and its hierarchy:

Logger: root (Level: INFO)
  Handlers:
    - StreamHandler (Level: INFO)

Logger: ray (Level: INFO)
  Handlers:
    - StreamHandler (Level: INFO)

Logger: ray.data (Level: DEBUG)
  Handlers:
    - SessionFileHandler (Level: NOTSET)
    - PlainRayHandler (Level: INFO)
    - SessionFileHandler (Level: ERROR)

Notes: Ray Core configuration is done first, Ray Data configuration is done second, both are respected.

3. Only Ray Data Import [EXPECTED BEHAVIOR]

Script:

print("BEFORE")
report_logger(logging.getLogger("ray.data"))
print()

import ray.data

print("AFTER:")
report_logger(logging.getLogger("ray.data"))

Output:

BEFORE
Logging configuration for 'ray.data' and its hierarchy:

Logger: root (Level: WARNING)
  No handlers configured

Logger: ray (Level: INFO)
  Handlers:
    - PlainRayHandler (Level: NOTSET)

Logger: ray.data (Level: NOTSET)
  No handlers configured

AFTER:
Logging configuration for 'ray.data' and its hierarchy:

Logger: root (Level: WARNING)
  No handlers configured

Logger: ray (Level: INFO)
  Handlers:
    - PlainRayHandler (Level: NOTSET)

Logger: ray.data (Level: DEBUG)
  Handlers:
    - SessionFileHandler (Level: NOTSET)
    - PlainRayHandler (Level: INFO)
    - SessionFileHandler (Level: ERROR)

Notes: Correctly configures the Ray Data logger.

4. Only Ray Core Init [EXPECTED BEHAVIOR]

Script:

print("BEFORE")
report_logger(logging.getLogger("ray.data"))
print()

ray.init(logging_config=ray.LoggingConfig(encoding="JSON", log_level="INFO"))

print("AFTER:")
report_logger(logging.getLogger("ray.data"))

Output:

BEFORE
Logging configuration for 'ray.data' and its hierarchy:

Logger: root (Level: WARNING)
  No handlers configured

Logger: ray (Level: INFO)
  Handlers:
    - PlainRayHandler (Level: NOTSET)

Logger: ray.data (Level: NOTSET)
  No handlers configured

AFTER:
Logging configuration for 'ray.data' and its hierarchy:

Logger: root (Level: INFO)
  Handlers:
    - StreamHandler (Level: INFO)

Logger: ray (Level: INFO)
  Handlers:
    - StreamHandler (Level: INFO)

Logger: ray.data (Level: NOTSET)
  No handlers configured

Notes: Correctly configures the Ray Core logger.

alexeykudinkin commented 1 week ago

@omatthew98 why don't we just do

  1. Take the current config
  2. Merge in provided one
  3. Set the config back
omatthew98 commented 1 week ago

Update from some more offline discussion with @alexeykudinkin: For now, ensuring that the import / happens after the initialization (e.g. ensuring that the ray module logger is configured before the ray.data module logger), should be an immediate workaround. We should still try to improve the behavior as this is a footgun that will result in unexpected results and missing logs that will hinder debugging efforts for users. Ideally we would do what was suggested above, but unfortunately python's logging module does not provide a native way to retrieve the existing logging configuration as a dictionary, so we would have to manually inspect and reconstruct the dict before merging the provided config which would be somewhat hacky.

An alternative to this would be to just wrap this configuration at the ray core level by:

  1. Persist configuration as root level field of the core logging module
  2. Create util configureLogging and have it be called from both core and other libraries
  3. Within the configureLogging function, merge the provided dictionary with the existing dictionary configuration and then update the logging config with the merged config.
hongpeng-guo commented 1 week ago

I am trying to understand the structured logging behavior as well. One of my question is

1. Ray Data Import then Ray Core Init [UNEXPECTED BEHAVIOR] will wipe out the ray.data logger configuration; but why

2. Ray Core Init then Ray Data Import [EXPECTED BEHAVIOR] maintains the expected logger configurations of both logger?

It seems by going in order #2, we got the correct logger config for all three loggers: i.e., root by ray.init, ray by ray.init, and finally ray.data by import ray data

omatthew98 commented 6 days ago

I am trying to understand the structured logging behavior as well. One of my question is #1. Ray Data Import then Ray Core Init [UNEXPECTED BEHAVIOR] will wipe out the ray.data logger configuration; but why #2. Ray Core Init then Ray Data Import [EXPECTED BEHAVIOR] maintains the expected logger configurations of both logger? It seems by going in order #2, we got the correct logger config for all three loggers: i.e., root by ray.init, ray by ray.init, and finally ray.data by import ray data

Yeah I think that understanding is correct. From what I have read, configuring the parent logger after the child logger (so the order of #1), the result will be only the parent logger is configured. If you configure the child logger after the parent logger (so the order of #2), then the two loggers will be configured as expected. I think this is is just an implementation detail for python's logging module that doesn't seem particularly well documented.