Open KepingYan opened 3 months ago
Also facing this issue. Any progress?
cc @KepingYan, @zacharie-martin. Thanks for flagging this! I think this error message is just a log line from torch and should be non-blocking. Although you may see the error message, the torch profiler still works. I tried the repro scripts above and was able to get the .json trace file and view it in TensorBoard.
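One way to confirm that the profiler kept working despite the logged message is to check that the exported trace parses as JSON and actually contains events. Below is a minimal sketch, assuming the trace is uncompressed and follows the Chrome trace format (a top-level `traceEvents` list, or a bare list of events). The helper name `count_trace_events` is hypothetical and not part of Ray or torch.

```python
import json
from pathlib import Path


def count_trace_events(trace_path):
    """Return the number of events in a Chrome-trace-format JSON file.

    Assumption: the file is plain (not gzipped) JSON, either an object
    with a "traceEvents" list or a bare list of events.
    """
    data = json.loads(Path(trace_path).read_text())
    if isinstance(data, dict):
        # Object form: events live under the "traceEvents" key.
        events = data.get("traceEvents", [])
    else:
        # Array form: the whole document is the event list.
        events = data
    return len(events)
```

A non-zero count for the file written by the profiler would suggest the trace was exported correctly, in which case the message is likely only a log line rather than a failure.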
What happened + What you expected to happen
If I add the PyTorch profiler to the training function of TorchTrainer, it reports an error:
But if the train_func is called on its own, without TorchTrainer, the profiler works normally. I followed this tutorial (https://docs.ray.io/en/master/ray-observability/user-guides/debug-apps/optimize-performance.html#performance-debugging-gpu-profiling). Is there any other configuration that needs to be modified?
Versions / Dependencies
Ray 2.32.0, Python 3.9.18, Torch 2.4.0, OS: Ubuntu 22.04.4
Reproduction script
Issue Severity
High: It blocks me from completing my task.