Open omatthew98 opened 1 week ago
To be clear about the expected behavior, here are some examples (modifications of the repro):
Script:
print("BEFORE")
report_logger(logging.getLogger("ray.data"))
print()
import ray.data
ray.init(logging_config=ray.LoggingConfig(encoding="JSON", log_level="INFO"))
print("AFTER:")
report_logger(logging.getLogger("ray.data"))
Output:
BEFORE
Logging configuration for 'ray.data' and its hierarchy:
Logger: root (Level: WARNING)
No handlers configured
Logger: ray (Level: INFO)
Handlers:
- PlainRayHandler (Level: NOTSET)
Logger: ray.data (Level: NOTSET)
No handlers configured
AFTER:
Logging configuration for 'ray.data' and its hierarchy:
Logger: root (Level: INFO)
Handlers:
- StreamHandler (Level: INFO)
Logger: ray (Level: INFO)
Handlers:
- StreamHandler (Level: INFO)
Logger: ray.data (Level: NOTSET)
No handlers configured
Notes: Ray Data configuration ignored, only Ray Core initialization respected.
Script:
print("BEFORE")
report_logger(logging.getLogger("ray.data"))
print()
ray.init(logging_config=ray.LoggingConfig(encoding="JSON", log_level="INFO"))
import ray.data
print("AFTER:")
report_logger(logging.getLogger("ray.data"))
Output:
BEFORE
Logging configuration for 'ray.data' and its hierarchy:
Logger: root (Level: WARNING)
No handlers configured
Logger: ray (Level: INFO)
Handlers:
- PlainRayHandler (Level: NOTSET)
Logger: ray.data (Level: NOTSET)
No handlers configured
AFTER:
Logging configuration for 'ray.data' and its hierarchy:
Logger: root (Level: INFO)
Handlers:
- StreamHandler (Level: INFO)
Logger: ray (Level: INFO)
Handlers:
- StreamHandler (Level: INFO)
Logger: ray.data (Level: DEBUG)
Handlers:
- SessionFileHandler (Level: NOTSET)
- PlainRayHandler (Level: INFO)
- SessionFileHandler (Level: ERROR)
Notes: Ray Core configuration is done first, Ray Data configuration is done second, both are respected.
Script:
print("BEFORE")
report_logger(logging.getLogger("ray.data"))
print()
import ray.data
print("AFTER:")
report_logger(logging.getLogger("ray.data"))
Output:
BEFORE
Logging configuration for 'ray.data' and its hierarchy:
Logger: root (Level: WARNING)
No handlers configured
Logger: ray (Level: INFO)
Handlers:
- PlainRayHandler (Level: NOTSET)
Logger: ray.data (Level: NOTSET)
No handlers configured
AFTER:
Logging configuration for 'ray.data' and its hierarchy:
Logger: root (Level: WARNING)
No handlers configured
Logger: ray (Level: INFO)
Handlers:
- PlainRayHandler (Level: NOTSET)
Logger: ray.data (Level: DEBUG)
Handlers:
- SessionFileHandler (Level: NOTSET)
- PlainRayHandler (Level: INFO)
- SessionFileHandler (Level: ERROR)
Notes: Correctly configures the Ray Data logger.
Script:
print("BEFORE")
report_logger(logging.getLogger("ray.data"))
print()
ray.init(logging_config=ray.LoggingConfig(encoding="JSON", log_level="INFO"))
print("AFTER:")
report_logger(logging.getLogger("ray.data"))
Output:
BEFORE
Logging configuration for 'ray.data' and its hierarchy:
Logger: root (Level: WARNING)
No handlers configured
Logger: ray (Level: INFO)
Handlers:
- PlainRayHandler (Level: NOTSET)
Logger: ray.data (Level: NOTSET)
No handlers configured
AFTER:
Logging configuration for 'ray.data' and its hierarchy:
Logger: root (Level: INFO)
Handlers:
- StreamHandler (Level: INFO)
Logger: ray (Level: INFO)
Handlers:
- StreamHandler (Level: INFO)
Logger: ray.data (Level: NOTSET)
No handlers configured
Notes: Correctly configures the Ray Core logger.
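The report_logger helper called in the scripts above is not shown in this thread. For anyone who wants to rerun the examples, here is a minimal sketch that prints output in the same shape. The implementation is an assumption reconstructed from the output format, not the original helper, and describe_logger is a hypothetical name added so the lines can be inspected without printing.

```python
import logging


def describe_logger(logger: logging.Logger) -> list:
    """Return report lines for `logger` and each of its ancestors, root first."""
    # Walk up the hierarchy via .parent; the root logger's parent is None.
    chain = []
    current = logger
    while current is not None:
        chain.append(current)
        current = current.parent

    lines = [f"Logging configuration for '{logger.name}' and its hierarchy:"]
    for lg in reversed(chain):
        lines.append(f"Logger: {lg.name} (Level: {logging.getLevelName(lg.level)})")
        if lg.handlers:
            lines.append("Handlers:")
            for handler in lg.handlers:
                level = logging.getLevelName(handler.level)
                lines.append(f"- {type(handler).__name__} (Level: {level})")
        else:
            lines.append("No handlers configured")
    return lines


def report_logger(logger: logging.Logger) -> None:
    """Print the report produced by describe_logger (assumed behavior)."""
    print("\n".join(describe_logger(logger)))
```

Note that this reports the level set directly on each logger (so an unconfigured ray.data shows NOTSET), not the effective level inherited from its parents.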
@omatthew98 why don't we just do
Update from some more offline discussion with @alexeykudinkin: For now, ensuring that the import happens after the initialization (i.e. ensuring that the ray module logger is configured before the ray.data module logger) should be an immediate workaround. We should still try to improve the behavior, as this is a footgun that will lead to unexpected results and missing logs, which will hinder users' debugging efforts. Ideally we would do what was suggested above, but unfortunately Python's logging module does not provide a native way to retrieve the existing logging configuration as a dictionary, so we would have to manually inspect and reconstruct the dict before merging the provided config, which would be somewhat hacky.
An alternative to this would be to wrap this configuration at the ray core level with a configureLogging function that is called from both core and other libraries. In the configureLogging function, merge the provided dictionary with the existing dictionary configuration and then update the logging config with the merged config.
I am trying to understand the structured logging behavior as well. One of my questions is: "#1. Ray Data Import then Ray Core Init [UNEXPECTED BEHAVIOR]" will wipe out the ray.data logger configuration, but why does "#2. Ray Core Init then Ray Data Import [EXPECTED BEHAVIOR]" maintain the expected logger configurations of both loggers? It seems that by going in order #2, we get the correct logger config for all three loggers: i.e., root by ray.init, ray by ray.init, and finally ray.data by import ray.data.
Yeah, I think that understanding is correct. From what I have read, if you configure the parent logger after the child logger (the order of #1), only the parent logger ends up configured. If you configure the child logger after the parent logger (the order of #2), both loggers are configured as expected. I think this is just an implementation detail of Python's logging module that doesn't seem particularly well documented.
What happened + What you expected to happen
We use logging.config.dictConfig(config) to configure the ray data logger (here), but this is also how ray core configures the ray logger (here). For both of these logging configs, we use disable_existing_loggers: False. The behavior for this is described in the logging docs.
This description makes it seem like these two logging calls are commutative (regardless of ordering they will produce the same result), but that is not exactly how the Python logging module works. If we configure the ray module logger and then the ray.data module logger, the results are as expected and both are configured. If we instead configure the ray.data module logger and then the ray module logger, the ray.data logging configuration is clobbered. This happens because when configuring the parent logger of a module (e.g. the ray module logger is the parent logger of the ray.data module logger), the handlers associated with the child logger are not guaranteed to be preserved.
Our end goal should be a state where the call order of the logging configurations does not affect the logging behavior.
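The ordering dependence can be reproduced without Ray, using only the standard library. The sketch below is a stand-in for the two real call sites: the demo_a / demo_b logger names and the configure and snapshot helpers are made up for illustration. As far as I can tell from CPython's logging/config.py, the reset comes from how dictConfig treats already-existing children of a logger named in the config: their level is set back to NOTSET and their handler list is emptied, even with disable_existing_loggers set to False.

```python
import logging
import logging.config


def configure(name: str, level: str) -> None:
    """Configure one named logger via dictConfig, keeping existing loggers enabled."""
    logging.config.dictConfig(
        {
            "version": 1,
            "disable_existing_loggers": False,
            "handlers": {"console": {"class": "logging.StreamHandler"}},
            "loggers": {name: {"level": level, "handlers": ["console"]}},
        }
    )


def snapshot(name: str) -> tuple:
    """Return (level name, handler count) for the named logger."""
    logger = logging.getLogger(name)
    return logging.getLevelName(logger.level), len(logger.handlers)


# Order #1: child first, then parent. The child's settings are reset.
configure("demo_a.data", "DEBUG")
configure("demo_a", "INFO")
child_then_parent = snapshot("demo_a.data")

# Order #2: parent first, then child. Both loggers keep their settings.
configure("demo_b", "INFO")
configure("demo_b.data", "DEBUG")
parent_then_child = (snapshot("demo_b"), snapshot("demo_b.data"))

print(child_then_parent)
print(parent_then_child)
```

In order #1 the child ends up at NOTSET with no handlers, which matches the ray.data state reported after ray.init in the first example. In order #2 both loggers keep their level and one handler each.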
Versions / Dependencies
ray==2.39.0
Reproduction script
Issue Severity
High: It blocks me from completing my task.