rl-2 opened this issue 2 years ago.
Hi Rodger, please try the new version and let me know if the issue persists. Thanks.
Hi Cheng, it seems the issue is still there. Here is a full log:
Error in client-server communication: [Errno 111] Connection refused
Process ForkServerProcess-20:
Traceback (most recent call last):
  File "/home/ubuntu/anaconda3/envs/pytorch_p37/lib/python3.7/multiprocessing/process.py", line 297, in _bootstrap
    self.run()
  File "/home/ubuntu/anaconda3/envs/pytorch_p37/lib/python3.7/multiprocessing/process.py", line 99, in run
    self._target(*self._args, **self._kwargs)
  File "/home/ubuntu/anaconda3/envs/pytorch_p37/lib/python3.7/site-packages/stable_baselines3/common/vec_env/subproc_vec_env.py", line 24, in _worker
    env = env_fn_wrapper.var()
  File "/home/ubuntu/RL-AngryBirds/sciencebirdsagents/Utils/utils.py", line 64, in _init
    max_attempts_per_level=max_attempts_per_level)
  File "/home/ubuntu/RL-AngryBirds/sciencebirdsagents/SBEnvironment/SBEnvironmentWrapperOpenAI.py", line 78, in __init__
    self.connect_agent_to_server()
  File "/home/ubuntu/RL-AngryBirds/sciencebirdsagents/SBEnvironment/SBEnvironmentWrapperOpenAI.py", line 88, in connect_agent_to_server
    self.ar.configure(self.env_id)
  File "/home/ubuntu/RL-AngryBirds/sciencebirdsagents/Client/agent_client.py", line 171, in configure
    self.playing_mode.value
  File "/home/ubuntu/RL-AngryBirds/sciencebirdsagents/Client/agent_client.py", line 131, in _send_command
    self.server_socket.sendall(msg)
BrokenPipeError: [Errno 32] Broken pipe
Traceback (most recent call last):
  File "OpenAI_StableBaseline_Train.py", line 231, in <module>
    range(c.num_worker)])
  File "/home/ubuntu/anaconda3/envs/pytorch_p37/lib/python3.7/site-packages/stable_baselines3/common/vec_env/subproc_vec_env.py", line 111, in __init__
    observation_space, action_space = self.remotes[0].recv()
  File "/home/ubuntu/anaconda3/envs/pytorch_p37/lib/python3.7/multiprocessing/connection.py", line 250, in recv
    buf = self._recv_bytes()
  File "/home/ubuntu/anaconda3/envs/pytorch_p37/lib/python3.7/multiprocessing/connection.py", line 407, in _recv_bytes
    buf = self._recv(4)
  File "/home/ubuntu/anaconda3/envs/pytorch_p37/lib/python3.7/multiprocessing/connection.py", line 379, in _recv
    chunk = read(handle, remaining)
ConnectionResetError: [Errno 104] Connection reset by peer
To follow up on this issue: I started the game server before running the script and got a similar error:
021-11-30 00:57:35,012 - OpenAI stable baselines Training and Testing - INFO - training step: 0
Server started...
Error in client-server communication: [Errno 111] Connection refused
On the server side, the server process appears to have been killed:
The Science Birds Server is waiting for the first agent to connect
Waiting for agent
Killed
Hi Rodger, the problem is most likely still that the game server is not being initialised successfully. Can you share the exact environment you are using so that we can reproduce the issue? Thanks.
Thanks, Cheng. Below is the environment information:
And the steps I've taken are:
First, I started the server:
java -jar ./game_playing_interface.jar
and the terminal shows:
The Science Birds Server is waiting for the first agent to connect
Waiting for agent
Then I ran:
./TrainAndTestOpenAIStableBaselines.sh within_template
and got the errors shown in this thread.

Hi Luo, I have pushed an updated version. The new version opens a new terminal window to run the server. Please let me know if the problem still exists. Cheers.
Hi Cheng,
Thanks a ton for the update! I got this error when I ran the code:
sh: 1: gnome-terminal: not found
Note that I'm running the code on an AWS instance. Could that be what prevents a new terminal window from being launched?
Hi Rodger, running on AWS is a bit tricky. Although we tested on AWS as well, it only supports 'symbolic' mode at the moment. The initial version runs the server headless; you can activate it by setting self.headless_server = True at line 10 in Server.py.
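<br>
As an illustration of how a launcher could avoid the `gnome-terminal: not found` failure on a machine without a desktop, the sketch below picks a launch command depending on whether `gnome-terminal` is on the PATH. It is a hypothetical helper, not the code in Server.py: the function name `build_launch_command` and the `server_dir` parameter are assumptions. The headless branch reuses the `--headless --dev` flags and the `> out 2>&1` redirection from the `nohup` command suggested in this thread.

```python
import shlex
import shutil
from typing import Callable, List, Optional

JAR = "./game_playing_interface.jar"


def build_launch_command(
    server_dir: str,
    which: Callable[[str], Optional[str]] = shutil.which,
) -> List[str]:
    """Return an argv list that starts the game server (hypothetical helper)."""
    if which("gnome-terminal") is not None:
        # Desktop machine: open a separate terminal window for the server.
        return ["gnome-terminal", f"--working-directory={server_dir}",
                "--", "java", "-jar", JAR]
    # No gnome-terminal (e.g. an AWS instance): run the server headless in the
    # background and write its output to a file called "out" in server_dir.
    inner = (
        f"cd {shlex.quote(server_dir)} && "
        f"nohup java -jar {JAR} --headless --dev > out 2>&1 &"
    )
    return ["bash", "-c", inner]
```

Usage would be something like `subprocess.Popen(build_launch_command("../sciencebirdsgames/Linux"))`. The `which` argument is injectable only so the choice can be tested without changing the machine's PATH.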
Can you please check whether the following command successfully starts the server?
bash -c "cd ../sciencebirdsgames/Linux && nohup java -jar ./game_playing_interface.jar --headless --dev > out 2>&1 &"
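Because the command above sends the server output to the file `out`, one way to verify that the server came up (before launching the training script, which otherwise fails with `[Errno 111] Connection refused`) is to poll that file for the ready message the server prints. This is a minimal sketch, assuming the message text is exactly the line shown earlier in this thread; `wait_for_server_log` is a hypothetical helper, not part of the repository. Reading the log avoids opening a probe connection, which the server might mistake for an agent. It does not detect a server that is killed after printing the message, as happened above, so checking that the java process is still alive is also worthwhile.

```python
import time
from pathlib import Path

# Ready message printed by the Science Birds server (taken from the logs above).
READY_LINE = "The Science Birds Server is waiting for the first agent to connect"


def wait_for_server_log(log_path: str, timeout: float = 60.0,
                        interval: float = 1.0) -> bool:
    """Return True once the log contains READY_LINE, or False after `timeout` s."""
    path = Path(log_path)
    deadline = time.monotonic() + timeout
    while True:
        if path.exists() and READY_LINE in path.read_text(errors="replace"):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

For example, `wait_for_server_log("../sciencebirdsgames/Linux/out")` could be called right after the `bash -c` command and before `./TrainAndTestOpenAIStableBaselines.sh within_template`.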
I also have a question regarding server.py. You use three conditions: self.if_head, self.headless_server, and self.state_repr_type.

The --dev > out 2>&1 option is added in lines 22, 33, 43, and 52 (when self.headless_server == True). Shouldn't this option correspond to self.state_repr_type instead?

The --headless option is added in lines 22, 27, 43, and 47 (when self.if_head == False and self.state_repr_type == 'symbolic', or when self.if_head == 'headless'). This looks like a bug, since the self.state_repr_type condition is not checked in the later branches (i.e. the elif and else).

Also, I don't understand why you added two conditions with similar functions, self.if_head and self.headless_server. Can you explain this?
Hi Hawe,
Apologies for the delay in getting back to you.
Regarding your questions:
The addition of --dev > out 2>&1 corresponds to the use of symbolic states. When the image representation is used, the agent does not read the symbolic states, so adding --dev does not change the result.
When self.state_repr_type == "symbolic", the agent requests the symbolic state representation from the server, and --dev ensures that this information is retrieved correctly. Conversely, when self.state_repr_type != "symbolic", the agent does not use the symbolic representation and requests only the images.
Regarding having both self.headless_server and self.if_head: this redundancy was introduced during our code refactoring. We are planning to integrate the Java server directly into Unity so that it can be used without additional configuration, and we will address these issues and improve code readability in our next release.
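To make the explanation above concrete, here is a hypothetical sketch of how the server command could be derived from one headless flag plus the state representation, instead of from self.if_head and self.headless_server separately. It is not the project's Server.py: the function name `server_args`, the value `"image"` for a non-symbolic representation, and the rule that headless runs must be symbolic (based on the remark that AWS only supports 'symbolic' mode at the moment) are all assumptions. Note that `> out 2>&1` is shell redirection rather than a server option; with `subprocess` you would pass an open file as `stdout` instead.

```python
from typing import List


def server_args(headless: bool, state_repr_type: str) -> List[str]:
    """Build the java command line for the game server (hypothetical sketch)."""
    if headless and state_repr_type != "symbolic":
        # Assumption: headless servers (e.g. on AWS) only support symbolic states.
        raise ValueError("headless mode only supports state_repr_type='symbolic'")
    args = ["java", "-jar", "./game_playing_interface.jar"]
    if headless:
        args.append("--headless")  # run without a game window
    if state_repr_type == "symbolic":
        args.append("--dev")  # needed only when symbolic states are requested
    return args
```

Keeping each flag tied to exactly one setting is the point of this design: --headless depends only on `headless` and --dev only on `state_repr_type`, so no branch can forget to check one of them.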
Please let me know if you have further questions or would like more clarification.
Cheers, Cheng
Hello,
I'm trying to train a PPO agent with Stable Baselines, following the instructions in Sec. 5.2.2. After running
./TrainAndTestOpenAIStableBaselines.sh within_template
I got the following error:
I wonder whether I missed a step to start the Science Birds application. Please let me know.
Thank you!