Closed: JiyuanTHU closed this issue 7 months ago
Hello, when you run environments with multiprocessing you have to use centralized TraCI server management, because the conventional way of acquiring a port from the OS is subject to race conditions. See https://smarts.readthedocs.io/en/latest/ecosystem/sumo.html#centralized-traci-management
The full context for why this is the case is in #2139.
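To make the race concrete, here is a minimal sketch using only the Python standard library. It is not SMARTS or TraCI code, and the helper names `find_free_port` and `try_listen` are hypothetical. It assumes the "conventional means" is the common pattern of binding to port 0, reading the assigned port number, and releasing the socket before the real server starts. Nothing reserves the port in between, so another worker can take it first.

```python
import socket
from typing import Optional


def find_free_port() -> int:
    """Ask the OS for an unused port, then release it immediately.

    Once this returns, nothing reserves the port for the caller.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        probe.bind(("127.0.0.1", 0))  # port 0 lets the OS choose
        return probe.getsockname()[1]


def try_listen(port: int) -> Optional[socket.socket]:
    """Return a listening socket on `port`, or None if the port is taken."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        sock.bind(("127.0.0.1", port))
        sock.listen()
    except OSError:
        sock.close()
        return None
    return sock


# Worker A asks for a port number and plans to start its server on it.
port = find_free_port()

# Before worker A gets there, a second worker (simulated here in the same
# process) claims the same port.
first = try_listen(port)

# Worker A now tries to use the port it was given and fails.
second = try_listen(port)

print(
    f"port {port}: first claim ok={first is not None}, "
    f"second claim ok={second is not None}"
)

if first is not None:
    first.close()
```

With many workers starting at once, the window between "find a port" and "use the port" is hit only occasionally, which would be consistent with an error that shows up sometimes but not always. Handing out ports from a single process removes that window.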
Thanks! This helps.
High Level Description
When I use multiprocessing to create multiple envs, the following error occurs intermittently. The error log is included below. I hope someone can help me with this.
Version
smarts 1.4.0, sumo 1.19.0
Operating System
Ubuntu 20.04
Problems
ERROR:SMARTS:Failed to successfully reset after 1 tries.
Process Process-64:
Traceback (most recent call last):
File "/home/yuan/anaconda3/envs/safe-smarts/lib/python3.8/multiprocessing/process.py", line 315, in _bootstrap
self.run()
File "/home/yuan/anaconda3/envs/safe-smarts/lib/python3.8/multiprocessing/process.py", line 108, in run
self._target(*self._args, **self._kwargs)
File "/home/yuan/Ji_ws/Learning-from-Intervention/tianshou/env/worker/subproc.py", line 96, in _worker
obs, info = env.reset(**data)
File "/home/yuan/Ji_ws/Learning-from-Intervention/utils/env_wrapper.py", line 155, in reset
obs, info = self.env.reset(seed=seed, options=options)
File "/home/yuan/anaconda3/envs/safe-smarts/lib/python3.8/site-packages/smarts/env/gymnasium/wrappers/single_agent.py", line 78, in reset
obs, info = self.env.reset(seed=seed, options=options)
File "/home/yuan/anaconda3/envs/safe-smarts/lib/python3.8/site-packages/gymnasium/wrappers/order_enforcing.py", line 61, in reset
return self.env.reset(**kwargs)
File "/home/yuan/anaconda3/envs/safe-smarts/lib/python3.8/site-packages/smarts/env/gymnasium/hiway_env_v1.py", line 348, in reset
observations = self._smarts.reset(
File "/home/yuan/anaconda3/envs/safe-smarts/lib/python3.8/site-packages/smarts/core/smarts.py", line 471, in reset
raise first_exception
File "/home/yuan/anaconda3/envs/safe-smarts/lib/python3.8/site-packages/smarts/core/smarts.py", line 464, in reset
return self._reset(scenario, start_time)
File "/home/yuan/anaconda3/envs/safe-smarts/lib/python3.8/site-packages/smarts/core/smarts.py", line 501, in _reset
self.setup(scenario)
File "/home/yuan/anaconda3/envs/safe-smarts/lib/python3.8/site-packages/smarts/core/smarts.py", line 539, in setup
provider_state = self._setup_providers(self._scenario)
File "/home/yuan/anaconda3/envs/safe-smarts/lib/python3.8/site-packages/smarts/core/smarts.py", line 1230, in _setup_providers
new_provider_state = self._handle_provider(provider, provider_error)
File "/home/yuan/anaconda3/envs/safe-smarts/lib/python3.8/site-packages/smarts/core/smarts.py", line 1265, in _handle_provider
provider_state, recovered = provider.recover(
File "/home/yuan/anaconda3/envs/safe-smarts/lib/python3.8/site-packages/smarts/core/sumo_traffic_simulation.py", line 463, in recover
raise error
File "/home/yuan/anaconda3/envs/safe-smarts/lib/python3.8/site-packages/smarts/core/smarts.py", line 1228, in _setup_providers
new_provider_state = provider.setup(scenario)
File "/home/yuan/anaconda3/envs/safe-smarts/lib/python3.8/site-packages/smarts/core/sumo_traffic_simulation.py", line 324, in setup
self._initialize_traci_conn()
File "/home/yuan/anaconda3/envs/safe-smarts/lib/python3.8/site-packages/smarts/core/sumo_traffic_simulation.py", line 239, in _initialize_traci_conn
self._traci_conn.setOrder(0)
TypeError: 'NoneType' object is not callable
Traceback (most recent call last):
File "main_smarts_tianshou_safe.py", line 409, in <module>
test_discrete_sac()
File "main_smarts_tianshou_safe.py", line 209, in test_discrete_sac
result = offpolicy_trainer(
File "/home/yuan/Ji_ws/Learning-from-Intervention/tianshou/trainer/offpolicy.py", line 133, in offpolicy_trainer
return OffpolicyTrainer(*args, **kwargs).run()
File "/home/yuan/Ji_ws/Learning-from-Intervention/tianshou/trainer/base.py", line 441, in run
deque(self, maxlen=0) # feed the entire iterator into a zero-length deque
File "/home/yuan/Ji_ws/Learning-from-Intervention/tianshou/trainer/base.py", line 315, in __next__
test_stat, self.stop_fn_flag = self.test_step()
File "/home/yuan/Ji_ws/Learning-from-Intervention/tianshou/trainer/base.py", line 344, in test_step
test_result = test_episode(
File "/home/yuan/Ji_ws/Learning-from-Intervention/tianshou/trainer/utils.py", line 27, in test_episode
result = collector.collect(n_episode=n_episode)
File "/home/yuan/Ji_ws/Learning-from-Intervention/tianshou/data/collector.py", line 344, in collect
self._reset_env_with_ids(
File "/home/yuan/Ji_ws/Learning-from-Intervention/tianshou/data/collector.py", line 174, in _reset_env_with_ids
obs_reset, info = self.env.reset(global_ids, **gym_reset_kwargs)
File "/home/yuan/Ji_ws/Learning-from-Intervention/tianshou/env/venvs.py", line 282, in reset
ret_list = [self.workers[i].recv() for i in id]
File "/home/yuan/Ji_ws/Learning-from-Intervention/tianshou/env/venvs.py", line 282, in <listcomp>
ret_list = [self.workers[i].recv() for i in id]
File "/home/yuan/Ji_ws/Learning-from-Intervention/tianshou/env/worker/subproc.py", line 204, in recv
result = self.parent_remote.recv()
File "/home/yuan/anaconda3/envs/safe-smarts/lib/python3.8/multiprocessing/connection.py", line 250, in recv
buf = self._recv_bytes()
File "/home/yuan/anaconda3/envs/safe-smarts/lib/python3.8/multiprocessing/connection.py", line 414, in _recv_bytes
buf = self._recv(4)
File "/home/yuan/anaconda3/envs/safe-smarts/lib/python3.8/multiprocessing/connection.py", line 383, in _recv
raise EOFError
EOFError