ray-project / ray

Ray is an AI compute engine. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads.
https://ray.io
Apache License 2.0
34.03k stars, 5.78k forks

[Train] TUNE_DISABLE_AUTO_CALLBACK_LOGGERS together with TorchTrainer leads to FileNotFoundError error #48683

Open juulie opened 6 days ago

juulie commented 6 days ago

What happened + What you expected to happen

I set TUNE_DISABLE_AUTO_CALLBACK_LOGGERS to 1 because I don't want to use the TensorBoard logger. However, this also disables the JSON and CSV loggers.

The latter two are required by ExperimentAnalysis, which is what the Tuner that TorchTrainer creates internally returns, so the run fails with a FileNotFoundError.

Versions / Dependencies

latest/nightly

Reproduction script

  1. Initialize Ray with the environment variable TUNE_DISABLE_AUTO_CALLBACK_LOGGERS set to 1.
  2. Create a TorchTrainer and call fit().
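
The two steps above could be sketched roughly as follows. This is a minimal sketch, not the reporter's actual script: the training loop is a hypothetical placeholder, the single-worker ScalingConfig is an assumption, and setting the variable through os.environ before importing Ray is one assumed way of "initializing Ray with the env var".

```python
import os

# Assumption: setting the variable in the driver process before Ray is
# imported is enough for Tune to pick it up.
os.environ["TUNE_DISABLE_AUTO_CALLBACK_LOGGERS"] = "1"


def train_loop_per_worker(config):
    # Hypothetical minimal training loop: report a single metric and exit.
    from ray import train

    train.report({"loss": 0.0})


def reproduce():
    # Ray is imported lazily so the environment variable is set first.
    import ray
    from ray.train import ScalingConfig
    from ray.train.torch import TorchTrainer

    ray.init()
    trainer = TorchTrainer(
        train_loop_per_worker,
        scaling_config=ScalingConfig(num_workers=1),  # assumed worker count
    )
    # According to the report, this call is where the FileNotFoundError appears.
    return trainer.fit()
```

Calling `reproduce()` on a machine with Ray and PyTorch installed should exercise the code path described in this issue.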

Issue Severity

Medium: It is a significant difficulty but I can work around it.

juulie commented 6 days ago

I personally don't like TensorBoard being part of the auto callback loggers; having just the CSV and JSON loggers would be fine. There is also no way to add Tuner callbacks through the TorchTrainer interface, so I can't add these two manually while setting TUNE_DISABLE_AUTO_CALLBACK_LOGGERS to 1.

juulie commented 5 days ago

I managed to get around this by adding the JsonLoggerCallback and CSVLoggerCallback to the RunConfig callbacks. However, this dependency isn't described anywhere, so there should be some documentation about it for anyone using TUNE_DISABLE_AUTO_CALLBACK_LOGGERS.
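
For anyone hitting the same problem, the workaround described in this comment might look roughly like the sketch below. The training loop and the single-worker ScalingConfig are hypothetical placeholders, and setting the variable via os.environ is an assumption; only the idea of passing JsonLoggerCallback and CSVLoggerCallback through RunConfig comes from the comment itself.

```python
import os

# Assumption: set in the driver process before Ray is imported.
os.environ["TUNE_DISABLE_AUTO_CALLBACK_LOGGERS"] = "1"


def train_loop_per_worker(config):
    # Hypothetical minimal training loop: report a single metric and exit.
    from ray import train

    train.report({"loss": 0.0})


def build_trainer():
    # Ray is imported lazily so the environment variable is set first.
    from ray.train import RunConfig, ScalingConfig
    from ray.train.torch import TorchTrainer
    from ray.tune.logger import CSVLoggerCallback, JsonLoggerCallback

    # Re-add only the JSON and CSV loggers; TensorBoard stays disabled.
    run_config = RunConfig(
        callbacks=[JsonLoggerCallback(), CSVLoggerCallback()],
    )
    return TorchTrainer(
        train_loop_per_worker,
        scaling_config=ScalingConfig(num_workers=1),  # assumed worker count
        run_config=run_config,
    )
```

Calling `build_trainer().fit()` would then run with the JSON and CSV loggers attached explicitly rather than through the automatic callback loggers.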