Closed by richardliaw 3 years ago
The second one will have:
2021-05-10 18:02:27,651 INFO elastic.py:156 -- Actor status: 2 alive, 0 dead (2 total)
Traceback (most recent call last):
  File "/Users/rliaw/miniconda3/envs/demo/lib/python3.8/site-packages/xgboost_ray/main.py", line 957, in _train
    ray.get(ready)
  File "/Users/rliaw/miniconda3/envs/demo/lib/python3.8/site-packages/ray/_private/client_mode_hook.py", line 47, in wrapper
    return func(*args, **kwargs)
  File "/Users/rliaw/miniconda3/envs/demo/lib/python3.8/site-packages/ray/worker.py", line 1481, in get
    raise value.as_instanceof_cause()
ray.exceptions.RayTaskError(RayXGBoostTrainingError): ray::RayXGBoostActor.train() (pid=92250, ip=192.168.1.115)
  File "python/ray/_raylet.pyx", line 505, in ray._raylet.execute_task
  File "python/ray/_raylet.pyx", line 449, in ray._raylet.execute_task.function_executor
  File "/Users/rliaw/miniconda3/envs/demo/lib/python3.8/site-packages/ray/_private/function_manager.py", line 556, in actor_method_executor
    return method(__ray_actor, *args, **kwargs)
  File "/Users/rliaw/miniconda3/envs/demo/lib/python3.8/site-packages/xgboost_ray/main.py", line 554, in train
    raise RayXGBoostTrainingError("Training failed.")
xgboost_ray.main.RayXGBoostTrainingError: Training failed.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/Users/rliaw/miniconda3/envs/demo/lib/python3.8/site-packages/xgboost_ray/main.py", line 1248, in train
    bst, train_evals_result, train_additional_results = _train(
  File "/Users/rliaw/miniconda3/envs/demo/lib/python3.8/site-packages/xgboost_ray/main.py", line 983, in _train
    raise RayActorError from exc
ray.exceptions.RayActorError: The actor died unexpectedly before finishing this task.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "scripts/typer_run_pipeline.py", line 23, in <module>
    typer.run(main)
  File "/Users/rliaw/miniconda3/envs/demo/lib/python3.8/site-packages/typer/main.py", line 859, in run
    app()
  File "/Users/rliaw/miniconda3/envs/demo/lib/python3.8/site-packages/typer/main.py", line 214, in __call__
    return get_command(self)(*args, **kwargs)
  File "/Users/rliaw/miniconda3/envs/demo/lib/python3.8/site-packages/click/core.py", line 829, in __call__
    return self.main(*args, **kwargs)
  File "/Users/rliaw/miniconda3/envs/demo/lib/python3.8/site-packages/click/core.py", line 782, in main
    rv = self.invoke(ctx)
  File "/Users/rliaw/miniconda3/envs/demo/lib/python3.8/site-packages/click/core.py", line 1066, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/Users/rliaw/miniconda3/envs/demo/lib/python3.8/site-packages/click/core.py", line 610, in invoke
    return callback(*args, **kwargs)
  File "/Users/rliaw/miniconda3/envs/demo/lib/python3.8/site-packages/typer/main.py", line 497, in wrapper
    return callback(**use_params)  # type: ignore
  File "scripts/typer_run_pipeline.py", line 19, in main
    train_model(processed)
  File "/Users/rliaw/dev/ray-summit-demo-2021/summit_2021/train.py", line 22, in train_model
    bst = xgbr.train(
  File "/Users/rliaw/miniconda3/envs/demo/lib/python3.8/site-packages/xgboost_ray/main.py", line 1319, in train
    raise RuntimeError(
RuntimeError: A Ray actor died during training and the maximum number of retries (0) is exhausted.
Compare:
to the Ray version
The first one will show:
But the second one will swallow this error.
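For readers unfamiliar with how an underlying error can get lost behind a generic "Training failed." message, here is a minimal plain-Python sketch. It does not use Ray or xgboost_ray and is not how xgboost_ray is implemented. `TrainingError`, `failing_step`, and the error message inside it are hypothetical names made up for illustration. It only contrasts a wrapper that drops the original exception with one that chains it via `raise ... from exc`, so the root cause stays reachable through `__cause__`.

```python
class TrainingError(Exception):
    """Hypothetical generic wrapper error (not the xgboost_ray class)."""


def failing_step():
    # Hypothetical stand-in for whatever actually went wrong in a worker.
    raise ValueError("label column contains NaN")


def train_swallowing():
    try:
        failing_step()
    except Exception:
        # `from None` drops the original exception from the chain,
        # so callers only ever see the generic message.
        raise TrainingError("Training failed.") from None


def train_chaining():
    try:
        failing_step()
    except Exception as exc:
        # `from exc` records the original exception as __cause__.
        raise TrainingError("Training failed.") from exc


def root_cause(exc):
    """Follow __cause__ links down to the innermost exception."""
    while exc.__cause__ is not None:
        exc = exc.__cause__
    return exc


def describe(fn):
    """Run fn and return the repr of the deepest reachable cause."""
    try:
        fn()
    except TrainingError as err:
        return repr(root_cause(err))
    return None


if __name__ == "__main__":
    print(describe(train_swallowing))
    print(describe(train_chaining))
```

In the swallowing variant only the wrapper is visible to the caller, which is roughly the symptom reported here. In the chaining variant the original `ValueError` can still be recovered and logged.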