Open ksnof opened 3 months ago
It seems I'm facing the same issue. The standard trial metrics files (result.json and progress.csv) are not written for every trial: most trials have them, but a few do not. The behavior is non-deterministic. The files are usually there and only rarely missing, for exactly the same training run. I use the standard callbacks.
FileNotFoundError('Could not fetch metrics for DQN_MA_SAE_model_fcnet_activation=relu,hiddens=[],n_step=6_2024-08-30_00-24-55_848afd15: both result.json and progress.csv were not found at -> trial location
OS: Red Hat 9.4
Ray: 2.34
Python: 3.9
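Since the missing files only show up on some trials, it may help to list exactly which trial directories are affected after a run. The following is a minimal sketch using only the standard library. It assumes that every subdirectory of the experiment directory (for example, a folder under ~/ray_results) is a trial directory; the helper name trials_missing_metrics is hypothetical and not part of Ray.

```python
from pathlib import Path

# File names taken from the error message above.
METRIC_FILES = ("result.json", "progress.csv")


def trials_missing_metrics(experiment_dir):
    """Return the names of trial subdirectories that contain neither metrics file.

    Assumption: every subdirectory of `experiment_dir` is a trial directory.
    """
    missing = []
    for trial_dir in sorted(Path(experiment_dir).iterdir()):
        if not trial_dir.is_dir():
            # Skip plain files that sit next to the trial folders.
            continue
        has_metrics = any((trial_dir / name).is_file() for name in METRIC_FILES)
        if not has_metrics:
            missing.append(trial_dir.name)
    return missing
```

Calling trials_missing_metrics with the path of an experiment directory returns a sorted list of the trial folder names that lack both files, which can then be compared across repeated runs of the same training.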
What happened + What you expected to happen
Hi, I installed Ray using pip in a conda environment:
pip install -U "ray[default]"
pip install -U "ray[data,train,tune,serve]"
After installing, I switched to PyCharm and tried to run the QuickStart example in the Python console, and then I got this FileNotFound error. Could you help me out? Thank you.
Following is the console output:
PyDev console: using IPython 8.12.0
Python 3.11.4 | packaged by Anaconda, Inc. | (main, Jul 5 2023, 13:38:37) [MSC v.1916 64 bit (AMD64)] on win32
In [2]: from ray import train, tune
   ...:
   ...:
   ...: def objective(config):  # ①
   ...:     score = config["a"] ** 2 + config["b"]
   ...:     return {"score": score}
   ...:
   ...:
   ...: search_space = {  # ②
   ...:     "a": tune.grid_search([0.001, 0.01, 0.1, 1.0]),
   ...:     "b": tune.choice([1, 2, 3]),
   ...: }
   ...:
   ...: tuner = tune.Tuner(objective, param_space=search_space)  # ③
   ...:
   ...: results = tuner.fit()
   ...: print(results.get_best_result(metric="score", mode="min").config)
2024-07-28 17:25:56,809 INFO worker.py:1781 -- Started a local Ray instance.
2024-07-28 17:26:00,078 INFO tune.py:253 -- Initializing Ray automatically. For cluster usage or custom Ray initialization, call `ray.init(...)` before `Tuner(...)`.
╭──────────────────────────────────────────────────────────────────╮
│ Configuration for experiment     objective_2024-07-28_17-25-49   │
├──────────────────────────────────────────────────────────────────┤
│ Search algorithm                 BasicVariantGenerator           │
│ Scheduler                        FIFOScheduler                   │
│ Number of trials                 4                               │
╰──────────────────────────────────────────────────────────────────╯
View detailed results here: C:/Users/sykdr/ray_results/objective_2024-07-28_17-25-49
To visualize your results with TensorBoard, run:
tensorboard --logdir C:/Users/sykdr/AppData/Local/Temp/ray/session_2024-07-28_17-25-53_839841_18676/artifacts/2024-07-28_17-26-00/objective_2024-07-28_17-25-49/driver_artifacts
Trial status: 4 PENDING
Current time: 2024-07-28 17:26:14. Total running time: 0s
Logical resource usage: 4.0/12 CPUs, 0/1 GPUs (0.0/1.0 accelerator_type:G)
╭────────────────────────────────────────────────╮
│ Trial name              status       b       a │
├────────────────────────────────────────────────┤
│ objective_b186a_00000   PENDING      3   0.001 │
│ objective_b186a_00001   PENDING      1    0.01 │
│ objective_b186a_00002   PENDING      1     0.1 │
│ objective_b186a_00003   PENDING      2       1 │
╰────────────────────────────────────────────────╯
(pid=34932)
(pid=23860)
(pid=34900)
(pid=29072)
Trial objective_b186a_00000 started with configuration:
╭──────────────────────────────────────────────╮
│ Trial objective_b186a_00000 config           │
├──────────────────────────────────────────────┤
│ a                                      0.001 │
│ b                                          3 │
╰──────────────────────────────────────────────╯
Traceback (most recent call last):
  File "D:\Anaconda3\envs\torch2.0_CUDA11.8\Lib\site-packages\tensorboardX\record_writer.py", line 58, in open_file
    factory = REGISTERED_FACTORIES[prefix]