huawei-noah / SMARTS

Scalable Multi-Agent RL Training School for Autonomous Driving
MIT License

Time Error #1672

Closed knightcalvert closed 2 years ago

knightcalvert commented 2 years ago

When I try to run the given example `scl run --envision examples/single_agent.py scenarios/sumo/loop`, the error `smarts.core.remote_agent.RemoteAgentException: Timeout while connecting to remote worker process.` is thrown. The full output looks like this:

`╭────────────────────┬────────────────────┬────────────────────┬────────────────────┬────────────────────┬────────────────────┬────────────────────┬────────────────────╮
│ Episode │ Sim T / Wall T │ Total Steps │ Steps / Sec │ Scenario Map │ Scenario Routes │ Mission (Hash) │ Scores │
├────────────────────┼────────────────────┼────────────────────┼────────────────────┼────────────────────┼────────────────────┼────────────────────┼────────────────────┤
Retrying in 0.05 seconds
Could not connect to TraCI server at localhost:53265 [Errno 111] Connection refused
Retrying in 0.05 seconds
Could not connect to TraCI server at localhost:53265 [Errno 111] Connection refused
Retrying in 0.05 seconds
Could not connect to TraCI server at localhost:53265 [Errno 111] Connection refused
Retrying in 0.05 seconds
Could not connect to TraCI server at localhost:53265 [Errno 111] Connection refused
Retrying in 0.05 seconds
Could not connect to TraCI server at localhost:53265 [Errno 111] Connection refused
Retrying in 0.05 seconds
Could not connect to TraCI server at localhost:53265 [Errno 111] Connection refused
Retrying in 0.05 seconds
Could not connect to TraCI server at localhost:53265 [Errno 111] Connection refused
Retrying in 0.05 seconds
Could not connect to TraCI server at localhost:53265 [Errno 111] Connection refused
Retrying in 0.05 seconds
│ 0/10 │ 0.49 │ 105 │ 4.91 │ loop │ basic.rou.xml │ -55557748305394795 │ 97.65 - SingleAgent │
ERROR:SMARTS:Simulation crashed with exception. Attempting to cleanly shutdown.
ERROR:SMARTS:Failed to acquire remote agent.
Traceback (most recent call last):
  File "/mnt/e/python/marl/carla_project/SMARTS/smarts/core/smarts.py", line 216, in step
    return self._step(agent_actions, time_delta_since_last_step)
  File "/mnt/e/python/marl/carla_project/SMARTS/smarts/core/smarts.py", line 279, in _step
    self._bubble_manager.step(self)
  File "/mnt/e/python/marl/carla_project/SMARTS/smarts/core/bubble_manager.py", line 397, in step
    self._handle_transitions(sim, self._cursors)
  File "/mnt/e/python/marl/carla_project/SMARTS/smarts/core/bubble_manager.py", line 448, in _handle_transitions
    self._airlock_social_vehicle_with_social_agent(
  File "/mnt/e/python/marl/carla_project/SMARTS/smarts/core/bubble_manager.py", line 515, in _airlock_social_vehicle_with_social_agent
    self._start_social_agent(
  File "/mnt/e/python/marl/carla_project/SMARTS/smarts/core/bubble_manager.py", line 613, in _start_social_agent
    sim.agent_manager.start_social_agent(
  File "/mnt/e/python/marl/carla_project/SMARTS/smarts/core/agent_manager.py", line 517, in start_social_agent
    remote_agent = self._remote_agent_buffer.acquire_remote_agent()
  File "/mnt/e/python/marl/carla_project/SMARTS/smarts/core/remote_agent_buffer.py", line 222, in acquire_remote_agent
    raise RemoteAgentException("Failed to acquire remote agent.")
smarts.core.remote_agent.RemoteAgentException: Failed to acquire remote agent.
ERROR:RemoteAgentBuffer:Exception while tearing down buffered remote agent.
RemoteAgentException('Timeout while connecting to remote worker process.')
╰────────────────────┴────────────────────┴────────────────────┴────────────────────┴────────────────────┴────────────────────┴────────────────────┴────────────────────╯
Traceback (most recent call last):
  File "/mnt/e/python/marl/carla_project/SMARTS/smarts/core/remote_agent.py", line 73, in __init__
    grpc.channel_ready_future(self._worker_channel).result(timeout=timeout)
  File "/mnt/e/python/marl/carla_project/SMARTS/.venv/lib/python3.8/site-packages/grpc/_utilities.py", line 140, in result
    self._block(timeout)
  File "/mnt/e/python/marl/carla_project/SMARTS/.venv/lib/python3.8/site-packages/grpc/_utilities.py", line 86, in _block
    raise grpc.FutureTimeoutError()
grpc.FutureTimeoutError

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "examples/single_agent.py", line 81, in <module>
    main(
  File "examples/single_agent.py", line 64, in main
    observation, reward, done, info = env.step(agent_action)
  File "/mnt/e/python/marl/carla_project/SMARTS/smarts/env/wrappers/single_agent.py", line 60, in step
    obs, reward, done, info = self.env.step({self._agent_id: action})
  File "/mnt/e/python/marl/carla_project/SMARTS/smarts/env/hiway_env.py", line 245, in step
    observations, rewards, dones, extras = self._smarts.step(agent_actions)
  File "/mnt/e/python/marl/carla_project/SMARTS/smarts/core/smarts.py", line 227, in step
    self.destroy()
  File "/mnt/e/python/marl/carla_project/SMARTS/smarts/core/smarts.py", line 703, in destroy
    self._agent_manager.destroy()
  File "/mnt/e/python/marl/carla_project/SMARTS/smarts/core/agent_manager.py", line 82, in destroy
    self._remote_agent_buffer.destroy()
  File "/mnt/e/python/marl/carla_project/SMARTS/smarts/core/remote_agent_buffer.py", line 127, in destroy
    raise e
  File "/mnt/e/python/marl/carla_project/SMARTS/smarts/core/remote_agent_buffer.py", line 121, in destroy
    remote_agent = remote_agent_future.result()
  File "/usr/lib/python3.8/concurrent/futures/_base.py", line 444, in result
    return self.get_result()
  File "/usr/lib/python3.8/concurrent/futures/_base.py", line 389, in get_result
    raise self._exception
  File "/usr/lib/python3.8/concurrent/futures/thread.py", line 57, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/mnt/e/python/marl/carla_project/SMARTS/smarts/core/remote_agent_buffer.py", line 158, in _build_remote_agent
    return RemoteAgent(
  File "/mnt/e/python/marl/carla_project/SMARTS/smarts/core/remote_agent.py", line 75, in __init__
    raise RemoteAgentException(
smarts.core.remote_agent.RemoteAgentException: Timeout while connecting to remote worker process.`

I tried some suggestions from other issues, but they didn't work. Also, when I ran `make sanity-test`, it reported that I had not set the `SUMO_HOME` path. I don't know whether this warning is related to the timeout error.

Operating System: WSL2

Gamenot commented 2 years ago

Hello @knightcalvert, could you confirm which version of SMARTS you are using?

knightcalvert commented 2 years ago

> Hello @knightcalvert, could you confirm which version of SMARTS you are using?

Thanks for your reply. I don't know where to find the version, but I am using SMARTS cloned from the main branch.

Gamenot commented 2 years ago

Hello @knightcalvert, this has been a bit of an issue with WSL before. The way that WSL handles acquiring sockets makes setting up those sockets slow, so the connections used to time out. We have fixed this problem in the develop branch (`git checkout develop`), so you could try that for now.
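To illustrate the failure mode, here is a minimal sketch, using only the Python standard library, of polling a local port until it accepts connections or an overall timeout expires. `wait_for_port` and its parameters are hypothetical names for illustration; they are not part of SMARTS, and the actual fix on the develop branch may work differently.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 5.0, interval: float = 0.05) -> bool:
    """Return True once a TCP connection to (host, port) succeeds, False on timeout.

    Hypothetical helper: on a platform where socket setup is slow, a larger
    `timeout` gives the server process more time to start listening.
    """
    deadline = time.monotonic() + timeout
    while True:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            return False
        try:
            # Each attempt is capped by the time left before the deadline.
            with socket.create_connection((host, port), timeout=remaining):
                return True
        except OSError:
            # Connection refused or timed out: pause briefly, then retry.
            time.sleep(min(interval, max(0.0, deadline - time.monotonic())))
```

If the worker needs longer to open its socket than the caller is willing to wait, the caller sees a timeout even though nothing is wrong with the worker itself, which matches the symptom in the traceback above.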

Otherwise, we will be moving a pre-release to the main branch by the end of this week.

knightcalvert commented 2 years ago

Thank you so much. I tried the develop branch, but it didn't seem to work well; maybe I did something wrong in one of the steps. I then set up the project on another machine running Linux in VMware, and everything works there. I'll try the new version on my own computer next week. Have a nice day!

Gamenot commented 2 years ago

@knightcalvert On a different note, are you intending to try communication between SMARTS and CARLA?

That should be possible, since CARLA supports co-simulation with SUMO: https://carla.readthedocs.io/en/latest/adv_sumo/

SUMO has an option to allow multiple clients, so long as setOrder is called for each of the clients before the first step: https://sumo.dlr.de/docs/TraCI.html#multiple_clients
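As a rough sketch of that ordering requirement, assuming the `traci` Python package that ships with SUMO: the helpers `sumo_command` and `connect_client` are hypothetical names, and the config path and port are example values. The `traci` module is passed in as an argument so that the helpers can be exercised without a SUMO installation.

```python
from typing import List


def sumo_command(config_path: str, port: int, num_clients: int) -> List[str]:
    """Build a SUMO command line that waits for `num_clients` TraCI clients."""
    return [
        "sumo",
        "-c", config_path,
        "--remote-port", str(port),
        "--num-clients", str(num_clients),
    ]


def connect_client(traci, port: int, order: int) -> None:
    """Connect one TraCI client and declare its execution order.

    `traci` is the imported traci module (or a compatible object).
    setOrder must be called before this client's first simulation step.
    """
    traci.init(port=port)
    traci.setOrder(order)


# Usage (requires SUMO and its traci package; not run here):
#   import traci
#   connect_client(traci, port=8813, order=1)
```

Each client would use a distinct order value, and SUMO processes the clients in that order within every step.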

This seems to be configurable for CARLA through `--client-order` in their example: https://github.com/carla-simulator/carla/blob/0c41f167cf1b3f33e23e3be6fb5e1c9552ba5969/Co-Simulation/Sumo/run_synchronization.py#L288

SMARTS also does co-simulation and can act as one of multiple clients, although it currently only supports being the party responsible for launching SUMO: https://github.com/huawei-noah/SMARTS/blob/49ece7157bb338570b0e2620a7cd4b376a986821/smarts/core/sumo_traffic_simulation.py#L67-L68