huawei-noah / SMARTS

Scalable Multi-Agent RL Training School for Autonomous Driving
MIT License

The simulation scene terminates before the set termination condition is met #657

Open guanjiayi opened 3 years ago

guanjiayi commented 3 years ago

High Level Description [I want the simulation scene to terminate only after the termination conditions are met]

Desired SMARTS version [e.g. 0.4.12]

Operating System [Ubuntu 18.04]

Problems [1. I set the number of training episodes to episode_num = 1000, but the simulation terminates at an episode between 8 and 17, and the total number of steps at termination is always the same (7763).] [2. If I change and rebuild the loop scenario, it terminates at a different total step count; without any changes to the scenario, it always terminates at the same total step count.]
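To pin down exactly where a run stops, one option is to count steps across all episodes and record the counters when an exception escapes env.step. The sketch below assumes a Gym-style multi-agent env whose reset() returns a dict of observations and whose step() takes a dict of actions and returns (observations, rewards, dones, infos), matching the env.step call in the traceback later in this thread. run_episodes and FakeEnv are hypothetical helpers for illustration, not SMARTS APIs, and "done" is assumed to mean every value in the dones dict is true.

```python
import traceback


def run_episodes(env, policy, agent_id, episode_num):
    """Run up to episode_num episodes and count steps across all of them.

    Assumes a Gym-style multi-agent env: reset() returns a dict of
    observations keyed by agent id, and step() takes a dict of actions and
    returns (observations, rewards, dones, infos) dicts.
    """
    total_steps = 0  # successful env.step calls across all episodes
    for episode in range(episode_num):
        observations = env.reset()
        dones = {agent_id: False}
        episode_steps = 0
        # Assumption: the episode ends when every entry in dones is true.
        while not all(dones.values()):
            action = policy(observations[agent_id])
            try:
                observations, rewards, dones, infos = env.step({agent_id: action})
            except Exception as exc:
                # Record where the run died so it can be compared across runs.
                return {
                    "crashed": True,
                    "episode": episode,
                    "episode_steps": episode_steps,
                    "total_steps": total_steps,
                    "error": repr(exc),
                    "traceback": traceback.format_exc(),
                }
            episode_steps += 1
            total_steps += 1
    return {"crashed": False, "episode": episode_num, "total_steps": total_steps}


class FakeEnv:
    """Hypothetical stand-in env that raises after a fixed number of total steps."""

    def __init__(self, crash_at, episode_len):
        self.crash_at = crash_at
        self.episode_len = episode_len
        self.total = 0
        self.t = 0

    def reset(self):
        self.t = 0
        return {"agent": 0.0}

    def step(self, actions):
        if self.total == self.crash_at:
            raise RuntimeError("connection closed by SUMO")
        self.total += 1
        self.t += 1
        done = self.t >= self.episode_len
        return {"agent": 0.0}, {"agent": 0.0}, {"agent": done, "__all__": done}, {}


if __name__ == "__main__":
    summary = run_episodes(FakeEnv(crash_at=25, episode_len=10), lambda obs: 0.0, "agent", 1000)
    print(summary["episode"], summary["total_steps"])  # 2 25
```

Logging the returned summary for each run makes it easy to check whether the crash always happens at the same total step count for an unchanged scenario.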

Gamenot commented 3 years ago

@guanjiayi Hello, thank you for the question. I need more information so that I can answer you.

Are you training with rllib.py, benchmark, or ultra? Or did you write your own training loop?

Are you seeing any errors when the simulation exits?

If you are using rllib.py, the scenario could end because the training crashed; in that case there will be an error log at "~/ray_results/rllib_example_multi/**/error.txt". If so, please provide that error text.
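Because the log path contains a ** wildcard, a quick way to collect every error.txt under that directory is Python's glob module with recursive=True. This is a minimal sketch; find_error_logs is a hypothetical helper name, and only the directory path comes from the comment above.

```python
import glob
import os


def find_error_logs(root="~/ray_results/rllib_example_multi"):
    """Return paths of every error.txt under root (any depth), newest first."""
    pattern = os.path.join(os.path.expanduser(root), "**", "error.txt")
    # recursive=True lets "**" match zero or more nested directories.
    paths = glob.glob(pattern, recursive=True)
    return sorted(paths, key=os.path.getmtime, reverse=True)


if __name__ == "__main__":
    for path in find_error_logs():
        print("=" * 20, path)
        with open(path) as f:
            print(f.read())
```

Sorting by modification time puts the log from the most recent crashed trial first, which is usually the one worth pasting into the issue.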

guanjiayi commented 3 years ago

@Gamenot Hello, thank you for your reply.

I created my own training script, based on single_agent.py. The error still appears when I run example/single_agent.py with only one change: setting its episodes=1000.

I am using the loop scenario. The error message is: "ERROR:SMARTS:Simulation crashed with exception. Attempting to cleanly shutdown. ERROR:SMARTS:connection closed by SUMO"

guanjiayi commented 3 years ago

The full error output:

    ERROR:SMARTS:Simulation crashed with exception. Attempting to cleanly shutdown.
    ERROR:SMARTS:connection closed by SUMO
    Traceback (most recent call last):
      File "/home/jiayiguan/SMARTS/smarts/core/smarts.py", line 170, in step
        return self._step(agent_actions)
      File "/home/jiayiguan/SMARTS/smarts/core/smarts.py", line 219, in _step
        provider_state = self._step_providers(all_agent_actions, dt)
      File "/home/jiayiguan/SMARTS/smarts/core/smarts.py", line 695, in _step_providers
        provider, actions, dt, self._elapsed_sim_time
      File "/home/jiayiguan/SMARTS/smarts/core/smarts.py", line 734, in _step_provider
        provider_state = provider.step(provider_actions, dt, elapsed_sim_time)
      File "/home/jiayiguan/SMARTS/smarts/core/sumo_traffic_simulation.py", line 310, in step
        self._traci_conn.simulationStep(self._cumulative_sim_seconds)
      File "/usr/share/sumo/tools/traci/connection.py", line 302, in simulationStep
        result = self._sendCmd(tc.CMD_SIMSTEP, None, None, "D", step)
      File "/usr/share/sumo/tools/traci/connection.py", line 180, in _sendCmd
        return self._sendExact()
      File "/usr/share/sumo/tools/traci/connection.py", line 90, in _sendExact
        raise FatalTraCIError("connection closed by SUMO")
    traci.exceptions.FatalTraCIError: connection closed by SUMO

During handling of the above exception, another exception occurred:

    Traceback (most recent call last):
      File "rlagentmodel/sacagent_dis/sacagent_dis20210310.py", line 669, in <module>
        epochs=args.epochs,
      File "rlagentmodel/sacagent_dis/sacagent_dis20210310.py", line 598, in main
        observations, rewards, dones, infos = env.step({AGENT_ID:agent_action})
      File "/home/jiayiguan/SMARTS/smarts/env/hiway_env.py", line 160, in step
        observations, rewards, agent_dones, extras = self._smarts.step(agent_actions)
      File "/home/jiayiguan/SMARTS/smarts/core/smarts.py", line 181, in step
        self.destroy()
      File "/home/jiayiguan/SMARTS/smarts/core/smarts.py", line 460, in destroy
        self._traffic_sim.destroy()
      File "/home/jiayiguan/SMARTS/smarts/core/sumo_traffic_simulation.py", line 110, in destroy
        self._close_traci_and_pipes()
      File "/home/jiayiguan/SMARTS/smarts/core/sumo_traffic_simulation.py", line 270, in _close_traci_and_pipes
        self._traci_conn.close()
      File "/usr/share/sumo/tools/traci/connection.py", line 369, in close
        if self._socket is not None:
    AttributeError: 'Connection' object has no attribute '_socket'
    Assertion failed: !is_empty() at line 2340 of panda/src/pgraph/nodePath.cxx
    Error in atexit._run_exitfuncs:
    Traceback (most recent call last):
      File "/home/jiayiguan/anaconda3/envs/smarts/lib/python3.7/site-packages/direct/showbase/ShowBase.py", line 82, in exitfunc
        builtins.base.destroy()
      File "/home/jiayiguan/SMARTS/smarts/core/smarts.py", line 451, in destroy
        self.teardown()
      File "/home/jiayiguan/SMARTS/smarts/core/smarts.py", line 434, in teardown
        self._root_np.clearLight()
    AssertionError: !is_empty() at line 2340 of panda/src/pgraph/nodePath.cxx
    /home/jiayiguan/anaconda3/envs/smarts/lib/python3.7/tempfile.py:798: ResourceWarning: Implicitly cleaning up <TemporaryDirectory '/tmp/tmpuv7_ey91wandb'>
      _warnings.warn(warn_message, ResourceWarning)
    /home/jiayiguan/anaconda3/envs/smarts/lib/python3.7/tempfile.py:798: ResourceWarning: Implicitly cleaning up <TemporaryDirectory '/tmp/tmp4vccvsv3wandb-media'>
      _warnings.warn(warn_message, ResourceWarning)
    /home/jiayiguan/anaconda3/envs/smarts/lib/python3.7/tempfile.py:798: ResourceWarning: Implicitly cleaning up <TemporaryDirectory '/tmp/tmpdnk9cmvwwandb-media'>
      _warnings.warn(warn_message, ResourceWarning)

Gamenot commented 3 years ago

@guanjiayi We are really sorry about that. This exception comes from a SUMO-related bug that is difficult to reproduce, and we currently have a branch open to pass that crash along to the SUMO team: https://github.com/huawei-noah/SMARTS/pull/619.

For now, I would suggest removing the bubble from the loop scenario or using a different map, since the crash occurs fairly often on the loop map.

Adaickalavan commented 3 years ago

Hi @guanjiayi,

It appears that you encountered the traci.exceptions.FatalTraCIError: connection closed by SUMO error.

Given that, could you try running all your commands and experiments inside a Docker container?

$ docker run --rm -it --network=host huaweinoah/smarts:v0.4.13

Do not mount your local source code with -v $PWD:/src when running the Docker container.

guanjiayi commented 3 years ago

@Adaickalavan Thank you for your reply. I checked the problem in different ways.

Without Docker:

  1. I installed SMARTS without installing any other packages, and the problem did not appear when testing single_agent.py.
  2. Then I installed the OpenAI reinforcement learning package and the wandb package, and the problem appeared.
  3. Now I have changed the installation order: SMARTS is installed after the OpenAI reinforcement learning package, and wandb is not installed. (I am still testing this.)
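Since the crash appears to depend on which packages are installed and in what order, one way to narrow it down is to save pip freeze output before and after installing spinningup and wandb, then compare the two. The sketch below is a hypothetical helper (diff_freeze and the before.txt/after.txt file names are illustrative only); it handles plain name==version lines and skips editable installs and URL pins.

```python
import sys


def parse_freeze(text):
    """Map lowercase package name -> version for simple 'name==version' lines."""
    pins = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blanks, comments, editable installs (-e ...) and URL pins (name @ url).
        if not line or line.startswith(("#", "-")) or "==" not in line:
            continue
        name, version = line.split("==", 1)
        pins[name.strip().lower()] = version.strip()
    return pins


def diff_freeze(before_text, after_text):
    """Return (added, removed, changed) packages between two freeze outputs."""
    before, after = parse_freeze(before_text), parse_freeze(after_text)
    added = {n: after[n] for n in after.keys() - before.keys()}
    removed = {n: before[n] for n in before.keys() - after.keys()}
    changed = {
        n: (before[n], after[n])
        for n in before.keys() & after.keys()
        if before[n] != after[n]
    }
    return added, removed, changed


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: pip freeze > before.txt; install packages; pip freeze > after.txt
    #        python diff_freeze.py before.txt after.txt
    with open(sys.argv[1]) as f1, open(sys.argv[2]) as f2:
        added, removed, changed = diff_freeze(f1.read(), f2.read())
    for name, (old, new) in sorted(changed.items()):
        print(f"changed {name}: {old} -> {new}")
    for name, ver in sorted(added.items()):
        print(f"added   {name}=={ver}")
    for name, ver in sorted(removed.items()):
        print(f"removed {name}=={ver}")
```

Packages listed under "changed" are the most likely candidates, because another install silently upgraded or downgraded something SMARTS already depended on.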

Use Docker:

  1. sumo-gui cannot be opened inside the Docker container.

guanjiayi commented 3 years ago

OpenAI reinforcement learning package (Spinning Up): https://spinningup.openai.com/en/latest/user/installation.html

Gamenot commented 3 years ago

@guanjiayi OK, I will take a look today.

guanjiayi commented 3 years ago

@Gamenot Thank you. The "connection closed by SUMO" error did not appear when running the example in the Docker container after installing the spinningup package.

guanjiayi commented 3 years ago

@Gamenot Thank you for your help!

Gamenot commented 3 years ago

As a progress update: we have passed the bug along to SUMO via a reproducible crash that we can generate in #619.