huawei-noah / SMARTS

Scalable Multi-Agent RL Training School for Autonomous Driving
MIT License

[Bug Report] smarts stuck issue (follow up of #2089) #2091

Open Edward11235 opened 1 year ago

Edward11235 commented 1 year ago

High Level Description

I am using SMARTS 1.2.1 (the new version) to train a model. After training for a long time, SMARTS always freezes: the process hangs inside env.reset() and never returns. Below is the traceback printed when I interrupt the training with Ctrl+C:

^CTraceback (most recent call last):
  File "train.py", line 748, in <module>
    sac(ppo_iters, env, agent, None, current_constraint_function, condition, command)
  File "train.py", line 523, in sac
    S, A, R, C, log_prob = play_episode(env, policy_nn, constraint_fn)
  File "train.py", line 326, in play_episode
    env.reset()
  File "/home/edward/anaconda3/envs/smarts/lib/python3.8/site-packages/smarts/env/hiway_env.py", line 307, in reset
    observations = self._smarts.reset(scenario)
  File "/home/edward/anaconda3/envs/smarts/lib/python3.8/site-packages/smarts/core/smarts.py", line 454, in reset
    return self._reset(scenario, start_time)
  File "/home/edward/anaconda3/envs/smarts/lib/python3.8/site-packages/smarts/core/smarts.py", line 489, in _reset
    self.teardown()
  File "/home/edward/anaconda3/envs/smarts/lib/python3.8/site-packages/smarts/core/smarts.py", line 872, in teardown
    self._teardown_providers()
  File "/home/edward/anaconda3/envs/smarts/lib/python3.8/site-packages/smarts/core/smarts.py", line 1226, in _teardown_providers
    provider.teardown()
  File "/home/edward/anaconda3/envs/smarts/lib/python3.8/site-packages/smarts/core/sumo_traffic_simulation.py", line 429, in teardown
    self._remove_vehicles()
  File "/home/edward/anaconda3/envs/smarts/lib/python3.8/site-packages/smarts/core/sumo_traffic_simulation.py", line 412, in _remove_vehicles
    self._traci_conn.vehicle.remove(vehicle_id)
  File "/home/edward/anaconda3/envs/smarts/lib/python3.8/site-packages/smarts/core/utils/sumo.py", line 236, in _wrap_traci_method
    return method(*args, **kwargs)
  File "/home/edward/anaconda3/envs/smarts/lib/python3.8/site-packages/sumo/tools/traci/_vehicle.py", line 1474, in remove
    self._setCmd(tc.REMOVE, vehID, "b", reason)
  File "/home/edward/anaconda3/envs/smarts/lib/python3.8/site-packages/sumo/tools/traci/domain.py", line 164, in _setCmd
    self._connection._sendCmd(self._cmdSetID, varID, objectID, format, values)
  File "/home/edward/anaconda3/envs/smarts/lib/python3.8/site-packages/sumo/tools/traci/connection.py", line 231, in _sendCmd
    return self._sendExact()
  File "/home/edward/anaconda3/envs/smarts/lib/python3.8/site-packages/sumo/tools/traci/connection.py", line 131, in _sendExact
    result = self._recvExact()
  File "/home/edward/anaconda3/envs/smarts/lib/python3.8/site-packages/sumo/tools/traci/connection.py", line 109, in _recvExact
    t = self._socket.recv(4 - len(result))
KeyboardInterrupt
Epoch 16: G_avg = 14.83 Gc_avg = 21.91: 3%| | 17/500 [65:09:41<1851:21:23,
Exception ignored in: <function SafeBulletClient.__del__ at 0x7fc47ca98430>
Traceback (most recent call last):
  File "/home/edward/anaconda3/envs/smarts/lib/python3.8/site-packages/smarts/core/utils/pybullet.py", line 52, in __del__
  File "/home/edward/anaconda3/envs/smarts/lib/python3.8/site-packages/pybullet_utils/bullet_client.py", line 43, in __del__
TypeError: catching classes that do not inherit from BaseException is not allowed
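The traceback above only appeared after a manual Ctrl+C, so it shows where the process was blocked (a socket read inside the TraCI connection) but only because someone was watching. A minimal sketch of how the same information could be captured automatically is given below, using the standard-library faulthandler module. The helper name reset_with_watchdog and the timeout value are assumptions for illustration and are not part of SMARTS; the only thing assumed about env is that it has a reset() method.

```python
import faulthandler
import sys


def reset_with_watchdog(env, timeout_s=300.0, file=sys.stderr):
    """Call env.reset() with a diagnostic watchdog armed.

    If reset() is still running after `timeout_s` seconds, the tracebacks
    of all threads are written to `file`. The process is NOT killed
    (exit=False), so this only reports a hang; it does not recover from it.
    `file` must be a real file object with a file descriptor.
    """
    faulthandler.dump_traceback_later(timeout_s, exit=False, file=file)
    try:
        return env.reset()
    finally:
        # Disarm the watchdog once reset() returns or raises.
        faulthandler.cancel_dump_traceback_later()
```

In a training script this would replace a bare env.reset() call, for example reset_with_watchdog(env, timeout_s=600.0). The dump is produced by a watchdog thread, so it is written even while the main thread is blocked in a socket read.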

Version

I used v1.2.1

Steps to reproduce the bug

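Running SMARTS for many episodes over a long period reproduces the bug. In the run logged above, the freeze occurred about 65 hours into training (per the progress bar), during an env.reset() call.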
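Because the freeze only shows up after many episodes, it may help to record which episode's reset stalls and how reset times behave before that. The following is a rough sketch of such instrumentation; timed_reset, the logger name, and slow_threshold_s are hypothetical names introduced here, and env stands for any object with a reset() method.

```python
import logging
import time

logger = logging.getLogger("reset_timing")


def timed_reset(env, episode, slow_threshold_s=30.0):
    """Call env.reset() and log how long it took.

    A "starting" line is logged before the call, so if reset() never
    returns, the last log entry identifies the episode that hung.
    Returns (observations, elapsed_seconds).
    """
    logger.info("episode %d: reset starting", episode)
    start = time.monotonic()
    observations = env.reset()
    elapsed = time.monotonic() - start
    # Flag unusually slow resets so a gradual slowdown is visible in the log.
    level = logging.WARNING if elapsed > slow_threshold_s else logging.INFO
    logger.log(level, "episode %d: reset took %.2f s", episode, elapsed)
    return observations, elapsed
```

Calling logging.basicConfig(level=logging.INFO) once at startup and then using timed_reset(env, episode) in the episode loop would give a per-episode timing record to attach to the report.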

System info

Ubuntu 20.04, Python 3.8

Date: 2023-10-01

Error logs and screenshots

Identical to the traceback shown under High Level Description above.

Impact (If known)

This bug will hinder training large models with SMARTS.
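Until the root cause is found, one way to limit the cost of a silent freeze is to turn it into an explicit error, so the training script can save a checkpoint and stop instead of blocking for hours. The sketch below does this with a daemon thread and a join timeout. It is an assumption-laden illustration: reset_or_timeout and ResetTimeoutError are hypothetical names, and whether SMARTS tolerates reset() being called from a non-main thread has not been verified here. After a timeout the environment should be treated as unusable, because the stuck call is still running in the background.

```python
import threading


class ResetTimeoutError(TimeoutError):
    """Raised when env.reset() does not finish within the allowed time."""


def reset_or_timeout(env, timeout_s=600.0):
    """Run env.reset() in a daemon thread and give up after timeout_s seconds."""
    outcome = {}

    def _run():
        try:
            outcome["value"] = env.reset()
        except BaseException as exc:  # forwarded to the caller below
            outcome["error"] = exc

    worker = threading.Thread(target=_run, name="env-reset", daemon=True)
    worker.start()
    worker.join(timeout_s)
    if worker.is_alive():
        # The worker is still blocked; it cannot be killed, only abandoned.
        raise ResetTimeoutError(f"env.reset() did not return within {timeout_s} s")
    if "error" in outcome:
        raise outcome["error"]
    return outcome["value"]
```

A caller could catch ResetTimeoutError, save the model, and exit so that an outer supervisor restarts training from the checkpoint with a fresh environment.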