huawei-noah / SMARTS

Scalable Multi-Agent RL Training School for Autonomous Driving
MIT License

Early termination of training #1663

Closed zyzhang1130 closed 1 year ago

zyzhang1130 commented 1 year ago

High Level Description I want to determine what causes the training process to terminate early.

Desired SMARTS version 0.6.1

Operating System Ubuntu 20.04.5 LTS

Problems The following occurred during my training process: ...

Agent_0: Collided.
Agent_0: Went off road.
Agent_0: Collided.
Agent_0: Went off road.
------------------------------------
| rollout/            |            |
|    ep_len_mean      | 11.4       |
|    ep_rew_mean      | -18.897657 |
|    exploration_rate | 0.638      |
|    success_rate     | 0          |
| time/               |            |
|    episodes         | 284        |
|    fps              | 10         |
|    time_elapsed     | 375        |
|    total_timesteps  | 3808       |
------------------------------------
Agent_0: Went off road.
Agent_0: Went off road.
Agent_0: Went off road.
Agent_0: Went wrong way.
Agent_0: Went wrong way.
Agent_0: Went wrong way.
Agent_0: Went off road.
-----------------------------------
| rollout/            |           |
|    ep_len_mean      | 11.3      |
|    ep_rew_mean      | -18.73835 |
|    exploration_rate | 0.634     |
|    success_rate     | 0         |
| time/               |           |
|    episodes         | 288       |
|    fps              | 10        |
|    time_elapsed     | 381       |
|    total_timesteps  | 3851      |
-----------------------------------
Agent_0: Went wrong way.
Agent_0: Went wrong way.
Agent_0: Went wrong way.
Agent_0: Went wrong way.
Agent_0: Went off road.
Agent_0: Went off road.
Agent_0: Went off road.
Killed

This happened with more than one algorithm (i.e., DQN, PPO, A2C). Moreover, the time at which the training process was terminated was not consistent (sometimes 1 hour after starting, sometimes after a few hours, other times after a few days). I wonder what causes this.

Adaickalavan commented 1 year ago

Hi @zyzhang1130,

Could you share a minimum working example to reproduce the error?

zyzhang1130 commented 1 year ago

The first part of the run looked normal:

(base) zeyu@ZYNitro5:~$ conda activate smarts
(smarts) zeyu@ZYNitro5:~$ cd SMARTS/competition/track1/train
(smarts) zeyu@ZYNitro5:~/SMARTS/competition/track1/train$ python3.8 train.py
python3.8: can't open file 'train.py': [Errno 2] No such file or directory
(smarts) zeyu@ZYNitro5:~/SMARTS/competition/track1/train$ python3.8 run.py

Torch cuda is available:  True

Logdir: /home/zeyu/SMARTS/competition/track1/train/logs/2022_10_12_15_23_10

/home/zeyu/anaconda3/envs/smarts/lib/python3.8/site-packages/trimesh/curvature.py:12: DeprecationWarning: Please use `coo_matrix` from the `scipy.sparse` namespace, the `scipy.sparse.coo` namespace is deprecated.
  from scipy.sparse.coo import coo_matrix
Waiting on /home/zeyu/anaconda3/envs/smarts/lib/python3.8/site-packages/smarts/scenarios/intersection/1_to_2lane_left_turn_c/. ...
/home/zeyu/anaconda3/envs/smarts/lib/python3.8/site-packages/trimesh/curvature.py:12: DeprecationWarning: Please use `coo_matrix` from the `scipy.sparse` namespace, the `scipy.sparse.coo` namespace is deprecated.
  from scipy.sparse.coo import coo_matrix
/home/zeyu/anaconda3/envs/smarts/lib/python3.8/site-packages/gym/logger.py:34: UserWarning: WARN: Box bound precision lowered by casting to float32
  warnings.warn(colorize("%s: %s" % ("WARN", msg % args), "yellow"))
Waiting on /home/zeyu/anaconda3/envs/smarts/lib/python3.8/site-packages/smarts/scenarios/intersection/1_to_2lane_left_turn_c/. ...
/home/zeyu/anaconda3/envs/smarts/lib/python3.8/site-packages/trimesh/curvature.py:12: DeprecationWarning: Please use `coo_matrix` from the `scipy.sparse` namespace, the `scipy.sparse.coo` namespace is deprecated.
  from scipy.sparse.coo import coo_matrix
Waiting on /home/zeyu/anaconda3/envs/smarts/lib/python3.8/site-packages/smarts/scenarios/intersection/1_to_2lane_left_turn_t/. ...
/home/zeyu/anaconda3/envs/smarts/lib/python3.8/site-packages/trimesh/curvature.py:12: DeprecationWarning: Please use `coo_matrix` from the `scipy.sparse` namespace, the `scipy.sparse.coo` namespace is deprecated.
  from scipy.sparse.coo import coo_matrix
Waiting on /home/zeyu/anaconda3/envs/smarts/lib/python3.8/site-packages/smarts/scenarios/intersection/1_to_2lane_left_turn_t/. ...
/home/zeyu/anaconda3/envs/smarts/lib/python3.8/site-packages/trimesh/curvature.py:12: DeprecationWarning: Please use `coo_matrix` from the `scipy.sparse` namespace, the `scipy.sparse.coo` namespace is deprecated.
  from scipy.sparse.coo import coo_matrix
Waiting on /home/zeyu/anaconda3/envs/smarts/lib/python3.8/site-packages/smarts/scenarios/merge/3lane_single_agent/. ...
/home/zeyu/anaconda3/envs/smarts/lib/python3.8/site-packages/trimesh/curvature.py:12: DeprecationWarning: Please use `coo_matrix` from the `scipy.sparse` namespace, the `scipy.sparse.coo` namespace is deprecated.
  from scipy.sparse.coo import coo_matrix
Waiting on /home/zeyu/anaconda3/envs/smarts/lib/python3.8/site-packages/smarts/scenarios/merge/3lane_single_agent/. ...
/home/zeyu/anaconda3/envs/smarts/lib/python3.8/site-packages/trimesh/curvature.py:12: DeprecationWarning: Please use `coo_matrix` from the `scipy.sparse` namespace, the `scipy.sparse.coo` namespace is deprecated.
  from scipy.sparse.coo import coo_matrix
Waiting on /home/zeyu/anaconda3/envs/smarts/lib/python3.8/site-packages/smarts/scenarios/straight/3lane_cruise_single_agent/. ...
/home/zeyu/anaconda3/envs/smarts/lib/python3.8/site-packages/trimesh/curvature.py:12: DeprecationWarning: Please use `coo_matrix` from the `scipy.sparse` namespace, the `scipy.sparse.coo` namespace is deprecated.
  from scipy.sparse.coo import coo_matrix
Waiting on /home/zeyu/anaconda3/envs/smarts/lib/python3.8/site-packages/smarts/scenarios/straight/3lane_cruise_single_agent/. ...
/home/zeyu/anaconda3/envs/smarts/lib/python3.8/site-packages/trimesh/curvature.py:12: DeprecationWarning: Please use `coo_matrix` from the `scipy.sparse` namespace, the `scipy.sparse.coo` namespace is deprecated.
  from scipy.sparse.coo import coo_matrix
Waiting on /home/zeyu/anaconda3/envs/smarts/lib/python3.8/site-packages/smarts/scenarios/straight/3lane_overtake/. ...
/home/zeyu/anaconda3/envs/smarts/lib/python3.8/site-packages/trimesh/curvature.py:12: DeprecationWarning: Please use `coo_matrix` from the `scipy.sparse` namespace, the `scipy.sparse.coo` namespace is deprecated.
  from scipy.sparse.coo import coo_matrix
Waiting on /home/zeyu/anaconda3/envs/smarts/lib/python3.8/site-packages/smarts/scenarios/straight/3lane_overtake/. ...
/home/zeyu/anaconda3/envs/smarts/lib/python3.8/site-packages/trimesh/curvature.py:12: DeprecationWarning: Please use `coo_matrix` from the `scipy.sparse` namespace, the `scipy.sparse.coo` namespace is deprecated.
  from scipy.sparse.coo import coo_matrix

Start training.

Using cuda device

Training on 1_to_2lane_left_turn_t.

 Retrying in 0.05 seconds
:device(error): Error adding inotify watch on /dev/input: No such file or directory
:device(error): Error opening directory /dev/input: No such file or directory
2022-10-12 15:29:47.891268: E tensorflow/stream_executor/cuda/cuda_blas.cc:2981] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
<frozen importlib._bootstrap>:219: RuntimeWarning: numpy.ndarray size changed, may indicate binary incompatibility. Expected 80 from C header, got 96 from PyObject
/home/zeyu/anaconda3/envs/smarts/lib/python3.8/site-packages/h5py/__init__.py:46: DeprecationWarning: `np.typeDict` is a deprecated alias for `np.sctypeDict`.
  from ._conv import register_converters as _register_converters
Logging to /home/zeyu/SMARTS/competition/track1/train/logs/2022_10_12_15_23_10/tensorboard/PPO_1
Agent_0: Went off road.
Agent_0: Went wrong way.
Agent_0: Went wrong way.
Agent_0: Went wrong way.
Agent_0: Went wrong way.
Agent_0: Went off road.
Agent_0: Went off road.
Agent_0: Went wrong way.
Agent_0: Went wrong way.
Agent_0: Went wrong way.
Agent_0: Went off road.
Agent_0: Went wrong way.
Agent_0: Went off road.
Agent_0: Went wrong way.
Agent_0: Went off road.
Agent_0: Went off road.
Agent_0: Went wrong way.
Agent_0: Went off road.
Agent_0: Went off road.
Agent_0: Went wrong way.
Agent_0: Went wrong way.
Agent_0: Went off road.
Agent_0: Went wrong way.
Agent_0: Went wrong way.
Agent_0: Went off road.
Agent_0: Went off road.
Agent_0: Went off road.
Agent_0: Went off road.
Agent_0: Went off road.
Agent_0: Went wrong way.
Agent_0: Went wrong way.
Agent_0: Went off road.
Agent_0: Went wrong way.
Agent_0: Went wrong way.
Agent_0: Went wrong way.
Agent_0: Went wrong way.
Agent_0: Went wrong way.
Agent_0: Went wrong way.
Agent_0: Went wrong way.
Agent_0: Went wrong way.
Agent_0: Went off road.
Agent_0: Went wrong way.
Agent_0: Went wrong way.
Agent_0: Went wrong way.
Agent_0: Went wrong way.
Agent_0: Went off road.
Agent_0: Went off road.
Agent_0: Went wrong way.
Agent_0: Went wrong way.
Agent_0: Went wrong way.
Agent_0: Went wrong way.
Agent_0: Went off road.
Agent_0: Went wrong way.
Agent_0: Went wrong way.
Agent_0: Went off road.
Agent_0: Went wrong way.
Agent_0: Went wrong way.
Agent_0: Went wrong way.
Agent_0: Went wrong way.
Agent_0: Went wrong way.
Agent_0: Went wrong way.
Agent_0: Went wrong way.
Agent_0: Went wrong way.
Agent_0: Went off road.
Agent_0: Went off road.
Agent_0: Went wrong way.
Agent_0: Went off road.
Agent_0: Went off road.
Agent_0: Went off road.
Agent_0: Went wrong way.
Agent_0: Went off road.
Agent_0: Went wrong way.
Agent_0: Went off road.
Agent_0: Went wrong way.
Agent_0: Went off road.
Agent_0: Went off road.
Agent_0: Went off road.
Agent_0: Went off road.
Agent_0: Went wrong way.
Agent_0: Went off road.
Agent_0: Went wrong way.
Agent_0: Went off road.
Agent_0: Went wrong way.
Agent_0: Went wrong way.
Agent_0: Went off road.
Agent_0: Went off road.
Agent_0: Went wrong way.
Agent_0: Went off road.
Agent_0: Went off road.
Agent_0: Went off road.
Agent_0: Went wrong way.
Agent_0: Went off road.
Agent_0: Went off road.
Agent_0: Went off road.
Agent_0: Went off road.
Agent_0: Went off road.
Agent_0: Went wrong way.
Agent_0: Went wrong way.
Agent_0: Went wrong way.
Agent_0: Went off road.
Agent_0: Went off road.
Agent_0: Went off road.
Agent_0: Went off road.
Agent_0: Went wrong way.
Agent_0: Went off road.
Agent_0: Went wrong way.
Agent_0: Went wrong way.
Agent_0: Went wrong way.
Agent_0: Went off road.
Agent_0: Went wrong way.
Agent_0: Went wrong way.
Agent_0: Went wrong way.
Agent_0: Went wrong way.
Agent_0: Went off road.
Agent_0: Went wrong way.
Agent_0: Went off road.
Agent_0: Went wrong way.
Agent_0: Went wrong way.
Agent_0: Went off road.
Agent_0: Went off road.
Agent_0: Went off road.
Agent_0: Went wrong way.
Agent_0: Went wrong way.
Agent_0: Went wrong way.
Agent_0: Went wrong way.
Agent_0: Went wrong way.
Agent_0: Went wrong way.
Agent_0: Went wrong way.
Agent_0: Went wrong way.
Agent_0: Went wrong way.
Agent_0: Went wrong way.
Agent_0: Went off road.
Agent_0: Went wrong way.
Agent_0: Went wrong way.
Agent_0: Went wrong way.
Agent_0: Went wrong way.
Agent_0: Went wrong way.
Agent_0: Went wrong way.
Agent_0: Went off road.
Agent_0: Went off road.
Agent_0: Went off road.
Agent_0: Went wrong way.
Agent_0: Went wrong way.
Agent_0: Went off road.
Agent_0: Went wrong way.
Agent_0: Went wrong way.
Agent_0: Went off road.
Agent_0: Went wrong way.
Agent_0: Went off road.
Agent_0: Went wrong way.
Agent_0: Went wrong way.
Agent_0: Went off road.
Agent_0: Went wrong way.
Agent_0: Went wrong way.
Agent_0: Went off road.
Agent_0: Went off road.
Agent_0: Went off road.
Agent_0: Went wrong way.
Agent_0: Went wrong way.
Agent_0: Went wrong way.
Agent_0: Went off road.
Agent_0: Went wrong way.
Agent_0: Went wrong way.
Agent_0: Went wrong way.
Agent_0: Went off road.
Agent_0: Went wrong way.
Agent_0: Went off road.
Agent_0: Went wrong way.
Agent_0: Went wrong way.
Agent_0: Went wrong way.
Agent_0: Went wrong way.
Agent_0: Went wrong way.
Agent_0: Went wrong way.
Agent_0: Went wrong way.
Agent_0: Went wrong way.
Agent_0: Went wrong way.
Agent_0: Went off road.
Agent_0: Went off road.
Agent_0: Went off road.
Agent_0: Went wrong way.
Agent_0: Went wrong way.
Agent_0: Went off road.
Agent_0: Went off road.
Agent_0: Went off road.
Agent_0: Went wrong way.
Agent_0: Went wrong way.
Agent_0: Went wrong way.
Agent_0: Went wrong way.
Agent_0: Went wrong way.
Agent_0: Went wrong way.
Agent_0: Went wrong way.
Agent_0: Went wrong way.
Agent_0: Went off road.
Agent_0: Went off road.
Agent_0: Went off road.
Agent_0: Went off road.
Agent_0: Went wrong way.
Agent_0: Went wrong way.
Agent_0: Went wrong way.
Agent_0: Went wrong way.
Agent_0: Went off road.
Agent_0: Went wrong way.
Agent_0: Went off road.
Agent_0: Went wrong way.
Agent_0: Went off road.
Agent_0: Went off road.
Agent_0: Went off road.
Agent_0: Went off road.
Agent_0: Went wrong way.
Agent_0: Went off road.
Agent_0: Went wrong way.
Agent_0: Went wrong way.
Agent_0: Went wrong way.
Agent_0: Went off road.
Agent_0: Went wrong way.
Agent_0: Went off road.
Agent_0: Went wrong way.
Agent_0: Went wrong way.
Agent_0: Went off road.
Agent_0: Went off road.
Agent_0: Went off road.
Agent_0: Went off road.
Agent_0: Went wrong way.
Agent_0: Went wrong way.
Agent_0: Went wrong way.
Agent_0: Collided.
Agent_0: Went off road.
Agent_0: Went off road.
Agent_0: Went wrong way.
Agent_0: Went off road.
Agent_0: Went off road.
Agent_0: Went wrong way.
Agent_0: Went wrong way.
Agent_0: Went off road.
Agent_0: Went wrong way.
Agent_0: Went wrong way.
Agent_0: Went wrong way.
Agent_0: Went wrong way.
Agent_0: Went off road.
Agent_0: Went wrong way.
Agent_0: Went wrong way.
Agent_0: Went wrong way.
Agent_0: Went off road.
Agent_0: Went wrong way.
Agent_0: Went wrong way.
Agent_0: Went wrong way.
Agent_0: Went off road.
Agent_0: Went off road.
Agent_0: Went off road.
Agent_0: Went off road.
Agent_0: Went off road.
Agent_0: Went wrong way.
Agent_0: Went wrong way.
Agent_0: Went off road.
Agent_0: Went off road.
Agent_0: Went off road.
Agent_0: Went wrong way.
Agent_0: Went wrong way.
Agent_0: Went wrong way.
Agent_0: Went wrong way.
Agent_0: Went wrong way.
Agent_0: Went wrong way.
Agent_0: Went wrong way.
Agent_0: Went wrong way.
Agent_0: Went wrong way.
Agent_0: Went off road.
Agent_0: Went off road.
Agent_0: Went wrong way.
Agent_0: Went off road.
Agent_0: Went off road.
Agent_0: Went wrong way.
Agent_0: Went wrong way.
Agent_0: Went off road.
Agent_0: Went wrong way.
Agent_0: Went wrong way.
Agent_0: Went wrong way.
Agent_0: Went wrong way.
Agent_0: Went wrong way.
Agent_0: Went wrong way.
Agent_0: Went wrong way.
Agent_0: Went wrong way.
Agent_0: Went wrong way.
Agent_0: Went off road.
Agent_0: Went wrong way.
Agent_0: Went off road.
Agent_0: Went wrong way.
Agent_0: Went wrong way.
Agent_0: Went off road.
Agent_0: Went off road.
Agent_0: Went wrong way.
Agent_0: Went off road.
Agent_0: Went off road.
Agent_0: Went wrong way.
Agent_0: Went wrong way.
Agent_0: Went wrong way.
Agent_0: Went wrong way.
Agent_0: Went wrong way.
Agent_0: Went off road.
Agent_0: Went off road.
Agent_0: Went off road.
Agent_0: Went off road.
Agent_0: Went wrong way.
Agent_0: Went wrong way.
Agent_0: Went off road.
Agent_0: Went wrong way.
Agent_0: Went wrong way.
Agent_0: Went wrong way.
Agent_0: Went wrong way.
Agent_0: Went off road.
Agent_0: Went wrong way.
Agent_0: Went wrong way.
Agent_0: Went wrong way.
Agent_0: Went wrong way.
Agent_0: Went wrong way.
Agent_0: Went off road.
Agent_0: Went wrong way.
Agent_0: Went wrong way.
Agent_0: Went off road.
Agent_0: Went wrong way.
Agent_0: Went wrong way.
Agent_0: Went wrong way.
Agent_0: Went wrong way.
Agent_0: Went wrong way.
Agent_0: Went off road.
Agent_0: Went wrong way.
Agent_0: Went wrong way.
Agent_0: Went wrong way.
Agent_0: Went wrong way.
Agent_0: Went off road.
Agent_0: Went off road.
Agent_0: Went wrong way.
Agent_0: Went off road.
Agent_0: Went wrong way.
Agent_0: Went wrong way.
Agent_0: Went wrong way.
Agent_0: Went wrong way.
Agent_0: Went wrong way.
Agent_0: Went wrong way.
Agent_0: Went off road.
Agent_0: Went wrong way.
Agent_0: Went wrong way.
Agent_0: Went wrong way.
Agent_0: Went wrong way.
Agent_0: Went off road.
Agent_0: Went wrong way.
Agent_0: Went off road.
Agent_0: Went off road.
Agent_0: Went wrong way.
Agent_0: Went wrong way.
Agent_0: Went wrong way.
Agent_0: Went wrong way.
Agent_0: Went wrong way.
Agent_0: Went wrong way.
Agent_0: Went wrong way.
Agent_0: Went wrong way.
Agent_0: Went wrong way.
Agent_0: Went off road.
Agent_0: Went off road.
Agent_0: Went wrong way.
Agent_0: Went wrong way.
Agent_0: Went off road.
Agent_0: Went off road.
Agent_0: Went wrong way.
Agent_0: Went wrong way.
Agent_0: Went wrong way.
Agent_0: Went wrong way.
Agent_0: Went wrong way.
Agent_0: Went wrong way.
Agent_0: Went off road.
Agent_0: Went off road.
Agent_0: Went wrong way.
Agent_0: Went off road.
Agent_0: Went wrong way.
Agent_0: Went wrong way.
Agent_0: Went wrong way.
Agent_0: Went wrong way.
Agent_0: Went off road.
Agent_0: Went wrong way.
Agent_0: Went wrong way.
Agent_0: Went off road.
Agent_0: Went off road.
Agent_0: Went off road.
Agent_0: Went wrong way.
Agent_0: Went off road.
Agent_0: Went wrong way.
Agent_0: Went wrong way.
Agent_0: Went wrong way.
Agent_0: Went off road.
Agent_0: Went wrong way.
Agent_0: Went wrong way.
Agent_0: Went wrong way.
Agent_0: Went wrong way.
Agent_0: Went wrong way.
Agent_0: Went off road.
Agent_0: Went off road.
Agent_0: Went wrong way.
Agent_0: Went off road.
Agent_0: Went wrong way.
Agent_0: Went off road.
Agent_0: Went off road.
Agent_0: Went off road.
Agent_0: Went off road.
Agent_0: Went off road.
Agent_0: Went off road.
Agent_0: Went wrong way.
Agent_0: Went wrong way.
Agent_0: Went wrong way.
Agent_0: Went wrong way.
Agent_0: Went wrong way.
Agent_0: Went wrong way.
Agent_0: Went wrong way.
Agent_0: Went wrong way.
Agent_0: Went wrong way.
Agent_0: Went wrong way.
Agent_0: Went wrong way.
Agent_0: Went wrong way.
Agent_0: Went wrong way.
Agent_0: Went wrong way.
Agent_0: Went wrong way.
Agent_0: Went wrong way.
Agent_0: Went wrong way.
Agent_0: Went wrong way.
Agent_0: Collided.
Agent_0: Went wrong way.
Agent_0: Went wrong way.
Agent_0: Went wrong way.
Agent_0: Went wrong way.
Agent_0: Went wrong way.
Agent_0: Went wrong way.
Agent_0: Went off road.
Agent_0: Went off road.
Agent_0: Went wrong way.
Agent_0: Went off road.
Agent_0: Went wrong way.
Agent_0: Went wrong way.
Agent_0: Went wrong way.
Agent_0: Went wrong way.
Agent_0: Went wrong way.
Agent_0: Went wrong way.
Agent_0: Went off road.
Agent_0: Went off road.
Agent_0: Went wrong way.
Agent_0: Went wrong way.
Agent_0: Went wrong way.
Agent_0: Went wrong way.
Agent_0: Went wrong way.
Agent_0: Went wrong way.
Agent_0: Went wrong way.
Agent_0: Went wrong way.
Agent_0: Went wrong way.
Agent_0: Went wrong way.
Agent_0: Went wrong way.
Agent_0: Went off road.
Agent_0: Went wrong way.
Agent_0: Went wrong way.
Agent_0: Went wrong way.
Agent_0: Went wrong way.
Agent_0: Went wrong way.
Agent_0: Went off road.
Agent_0: Went wrong way.
Agent_0: Went wrong way.
Agent_0: Went wrong way.
Agent_0: Went off road.
Agent_0: Went off road.
Agent_0: Went off road.
Agent_0: Went off road.
Agent_0: Went off road.
Agent_0: Went wrong way.
Agent_0: Went wrong way.
Agent_0: Went off road.
Agent_0: Went wrong way.
Agent_0: Went wrong way.
Agent_0: Went wrong way.
Agent_0: Went wrong way.
Agent_0: Went wrong way.
Agent_0: Went wrong way.
Agent_0: Went wrong way.
Agent_0: Went wrong way.
Agent_0: Went off road.
Agent_0: Went wrong way.
Agent_0: Went wrong way.
Agent_0: Went off road.
Agent_0: Went off road.
Agent_0: Went wrong way.
Agent_0: Went off road.
Agent_0: Went off road.
Agent_0: Went wrong way.
Agent_0: Went wrong way.
Agent_0: Went off road.
Agent_0: Went off road.
Agent_0: Went off road.
Agent_0: Went wrong way.
Agent_0: Went wrong way.
Agent_0: Went wrong way.
Agent_0: Went wrong way.
Agent_0: Went off road.
Agent_0: Went off road.
Agent_0: Went off road.
Agent_0: Went wrong way.
Agent_0: Went off road.
Agent_0: Went wrong way.
Agent_0: Went off road.
Agent_0: Went wrong way.
Agent_0: Went wrong way.
Agent_0: Went off road.
Agent_0: Went wrong way.
Agent_0: Went wrong way.
Agent_0: Went off road.
Agent_0: Went wrong way.
Agent_0: Went off road.
Agent_0: Went off road.
Agent_0: Went wrong way.
Agent_0: Went off road.
Agent_0: Went off road.
Agent_0: Went wrong way.
Agent_0: Went wrong way.
Agent_0: Went wrong way.
Agent_0: Went wrong way.
Agent_0: Went wrong way.
Agent_0: Went off road.
Agent_0: Went off road.
Agent_0: Went off road.
Agent_0: Went wrong way.
Agent_0: Went wrong way.
Agent_0: Went wrong way.
Agent_0: Went wrong way.
Agent_0: Went wrong way.
Agent_0: Went wrong way.
Agent_0: Went wrong way.
Agent_0: Went wrong way.
Agent_0: Went wrong way.
Agent_0: Went wrong way.
Agent_0: Went wrong way.
Agent_0: Went wrong way.
Agent_0: Went wrong way.
Agent_0: Went wrong way.
Agent_0: Went wrong way.
Agent_0: Went wrong way.
Agent_0: Went wrong way.
Agent_0: Collided.
Agent_0: Went off road.
Agent_0: Went off road.
Agent_0: Went off road.
-----------------------------------
| rollout/           |            |
|    ep_len_mean     | 10.7       |
|    ep_rew_mean     | -34.955578 |
| time/              |            |
|    fps             | 14         |
|    iterations      | 1          |
|    time_elapsed    | 141        |
|    total_timesteps | 2048       |
-----------------------------------
Agent_0: Went off road.
Agent_0: Went wrong way.
Agent_0: Went wrong way.
Agent_0: Went wrong way.
Agent_0: Went wrong way.
Agent_0: Went wrong way.
Agent_0: Went off road.
Agent_0: Went off road.
Agent_0: Went off road.
Agent_0: Went wrong way.
Agent_0: Went off road.
Agent_0: Went off road.
Agent_0: Went off road.

... until the error shown in my previous comment appeared.

Adaickalavan commented 1 year ago

1) Were any changes made to the given example at SMARTS/competition/track1? 2) If yes, what were the code changes?

zyzhang1130 commented 1 year ago

Does changing hyperparameters in SMARTS/competition/track1/train/config.yaml count? For example, epochs and train_steps (I changed them from the default values to the following):

# Training
  epochs: 5_00 # Number of training loops.

  # Training per scenario
  train_steps: 100_00
  checkpoint_freq: 50_00 # Save a model every checkpoint_freq calls to env.step().
  eval_eps: 2 # Number of evaluation episodes.
  eval_freq: 50_00 # Evaluate the trained model every eval_freq steps and save the best model.

Adaickalavan commented 1 year ago

1. Changing the number of epochs and training steps should not cause issues.
2. The log shows that the training process was killed. The most probable causes include running out of memory or running out of file descriptors.
3. Try investigating whether the memory consumption of your program increases over time.
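
One way to carry out the check in point 3 is to sample the process's resident memory and open file descriptor count at intervals and watch whether either number keeps growing. The following is a minimal sketch, assuming a Linux system with `/proc` mounted (as on Ubuntu). The function names `rss_kib`, `open_fd_count`, and `usage_line` are hypothetical helpers written for illustration and are not part of SMARTS.

```python
import os
import time


def rss_kib(pid="self"):
    """Resident memory (VmRSS) of a process in KiB, read from /proc (Linux only)."""
    with open(f"/proc/{pid}/status") as status:
        for line in status:
            if line.startswith("VmRSS:"):
                return int(line.split()[1])  # second field is the size in kB
    return None  # some processes (e.g. kernel threads) have no VmRSS entry


def open_fd_count(pid="self"):
    """Number of file descriptors currently open in the process (Linux only)."""
    return len(os.listdir(f"/proc/{pid}/fd"))


def usage_line(pid="self"):
    """One timestamped sample, suitable for appending to a log file."""
    stamp = time.strftime("%Y-%m-%d %H:%M:%S")
    return f"{stamp} rss_kib={rss_kib(pid)} open_fds={open_fd_count(pid)}"


if __name__ == "__main__":
    # Hypothetical usage: replace "self" with the PID of the running
    # training process and call this periodically (e.g. once a minute).
    print(usage_line())
```

These helpers only look at a single process. If the training script starts child processes, their memory is not included, so it may be worth sampling those PIDs as well. A steadily rising `rss_kib` or `open_fds` value over hours of training would be consistent with the process eventually being killed.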

Adaickalavan commented 1 year ago

This issue appears stale and thus it is being closed. Please feel free to reopen this issue if your problem persists or open a new issue for other questions.