jr-robotics / robo-gym

An open source toolkit for Distributed Deep Reinforcement Learning on real and simulated robots.
https://sites.google.com/view/robo-gym
MIT License

Failed to connect to Server Manager #37

Closed psFournier closed 2 years ago

psFournier commented 2 years ago

Hello, First, thank you for the awesome work, this lib seems to be exactly what I was looking for :)

As for my problem: I work with Ubuntu 20.04, ROS Noetic, and Python 3.8. I have installed robo-gym-robot-servers with no apparent problem, and

roslaunch mir100_robot_server sim_robot_server.launch gui:=true

seems to work as expected with rviz. I have installed robo-gym-server-modules with no apparent problem either:

start-server-manager && attach-to-server-manager

except for the weird first line:

E0712 17:49:47.489397845 432673 fork_posix.cc:63] Fork support is only compatible with the epoll1 and poll polling strategies
2021-07-12 17:49:47,493 - serverManager - INFO - Server Manager started at 50100

I did sudo sh -c 'printf "127.0.0.1 robot-servers" >> /etc/hosts' as I intend to run robo-gym on the same machine I started the server.

Finally, I have cloned the robo-gym repository, and installed it in a virtual environment with venv/bin/pip install -e robo-gym/.

Now I start a working session with:

start-server-manager && attach-to-server-manager

I then open another terminal window to test in my virtualenv, and both pytest and python docs/examples/random_agent_sim.py return:

Traceback (most recent call last):
  File "docs/examples/random_agent_sim.py", line 8, in <module>
    env = gym.make('NoObstacleNavigationMir100Sim-v0', ip=target_machine_ip, gui=True)
  File "/d/pfournie/.local/lib/python3.8/site-packages/gym/envs/registration.py", line 145, in make
    return registry.make(id, **kwargs)
  File "/d/pfournie/.local/lib/python3.8/site-packages/gym/envs/registration.py", line 90, in make
    env = spec.make(**kwargs)
  File "/d/pfournie/.local/lib/python3.8/site-packages/gym/envs/registration.py", line 60, in make
    env = cls(**_kwargs)
  File "/d/pfournie/Documents/paradis/robo-gym/robo_gym/envs/mir100/mir100.py", line 411, in __init__
    Simulation.__init__(self, self.cmd, ip, lower_bound_port, upper_bound_port, gui, **kwargs)
  File "/d/pfournie/Documents/paradis/robo-gym/robo_gym/envs/simulation_wrapper.py", line 25, in __init__
    self.sm_client = sm_client.Client(ip)
  File "/d/pfournie/.local/lib/python3.8/site-packages/robo_gym_server_modules/server_manager/client.py", line 12, in __init__
    assert self._connect_to_rl_srv_mng(ip,lower_bound_port,upper_bound_port)
  File "/d/pfournie/.local/lib/python3.8/site-packages/robo_gym_server_modules/server_manager/client.py", line 83, in _connect_to_rl_srv_mng
    raise RuntimeError('Failed to connect to Server Manager')
RuntimeError: Failed to connect to Server Manager

Do you have any idea what went wrong in the process? I tried working outside of the virtualenv, just in case, by installing robo-gym globally, but the problem persists. Is it possible that some configuration of my ports prevents the env from communicating with the server manager?

Thank you in advance ! Pierre

matteolucchi commented 2 years ago

Hi @psFournier ! Thank you for your kind words, it is very much appreciated!

Are you starting the Server Manager and running the random_agent_sim.py on the same machine or are you using 2 different computers?

What output do you get from:

ping 127.0.0.1

Cheers, Matteo

psFournier commented 2 years ago

Hi! Thank you for the (very) quick answer :)

I am running the agent and the server manager on the same machine. ping 127.0.0.1 outputs:

PING 127.0.01 (127.0.0.1) 56(84) bytes of data.
64 bytes from 127.0.0.1: icmp_seq=1 ttl=64 time=0.037 ms
64 bytes from 127.0.0.1: icmp_seq=2 ttl=64 time=0.018 ms
64 bytes from 127.0.0.1: icmp_seq=3 ttl=64 time=0.045 ms

Additional context: I ran the installation process on my personal laptop (ubuntu 20, noetic, python 3.8) and everything went fine (agent and server manager on the same machine). The machine on which I encounter the problem has a different configuration in two ways: some ports usage may be constrained, and during the install, I had to switch between my standard user account and an admin account for commands requiring it (I doubt this last part should have an impact...). Regarding ports, maybe there is some info I could check ?

Cheers,

matteolucchi commented 2 years ago

some ports usage may be constrained

Do you have a way of finding out which port ranges are restricted?

I would try to run the basic example directly from the gRPC documentation to check whether it is a problem related to the network setup or something that has to do with robo-gym itself. Please try to run the example and let me know if it works.
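As a complementary check, a plain TCP probe can show whether anything is listening on the Server Manager port at all. This is only a minimal sketch: the helper name port_is_open is hypothetical, and port 50100 is taken from the Server Manager log line earlier in this thread, so adjust it if your log reports a different port.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a plain TCP connection to (host, port) succeeds."""
    try:
        # create_connection performs the TCP handshake and raises OSError on failure
        # (connection refused, timeout, unreachable host, ...).
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # 50100 is the port reported by the Server Manager log earlier in this thread.
    print(port_is_open("127.0.0.1", 50100))
```

Note that a raw socket does not consult any proxy environment variables, so if this probe succeeds while the gRPC client still fails, the cause is probably above the TCP level rather than a blocked port.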

psFournier commented 2 years ago

I have run the full basic example from the grpc doc without any issue ! In the meantime I asked if I could have some info on port restrictions...

matteolucchi commented 2 years ago

Ok, it is good to know that the example works.

Let's try the connection to the Server Manager: start the Server Manager as usual and then try the following in python:

import robo_gym_server_modules.server_manager.client as sm_client
client = sm_client.Client('127.0.0.1')
client._verify_connection()

psFournier commented 2 years ago

client = sm_client.Client('127.0.01') fails with

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File ".../robo-gym/venv/lib/python3.8/site-packages/robo_gym_server_modules/server_manager/client.py", line 12, in __init__
    assert self._connect_to_rl_srv_mng(ip,lower_bound_port,upper_bound_port)
  File ".../robo-gym/venv/lib/python3.8/site-packages/robo_gym_server_modules/server_manager/client.py", line 83, in _connect_to_rl_srv_mng
    raise RuntimeError('Failed to connect to Server Manager')
RuntimeError: Failed to connect to Server Manager

psFournier commented 2 years ago

I am considering a clean reinstall of everything. It's really weird that it doesn't work on this machine when it worked so well on my laptop with the same environment, and I would not like to take too much of your time for something silly... I'll keep you updated in the afternoon!

psFournier commented 2 years ago

Well, instantiating a Client from robo_gym_server_modules.server_manager.client still does not work, and there do not actually seem to be any restrictions on ports...

matteolucchi commented 2 years ago

Can you try to upgrade gRPC to the latest version?

python -m pip install --upgrade grpcio

I found this related gRPC issue that was fixed, so maybe you had already installed an old version of gRPC. This would still not explain why the example worked but it is worth a try.

psFournier commented 2 years ago

I found the culprit !

The rpc_error at line 82 of robo_gym_server_modules.server_manager.client pointed to a proxy-related problem:

debug_error_string = "{"created":"@1626176192.310865438","description":"Failed to create subchannel","file":"src/core/ext/filters/client_channel/client_channel.cc","file_line":2720,"referenced_errors":[{"created":"@1626176192.310859177","description":"Pick Cancelled","file":"src/core/ext/filters/client_channel/lb_policy/pick_first/pick_first.cc","file_line":240,"referenced_errors":[{"created":"@1626176192.310821957","description":"Connect Failed","file":"src/core/ext/filters/client_channel/subchannel.cc","file_line":688,"grpc_status":14,"referenced_errors":[{"created":"@1626176192.310749951","description":"HTTP proxy returned response code 403","file":"src/core/ext/filters/client_channel/http_connect_handshaker.cc","file_line":211}]}]}]}"
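For anyone facing a similarly nested error, the innermost entries of the debug_error_string above can be pulled out programmatically. The sketch below assumes the string is valid JSON with a referenced_errors list, as in the output above; other gRPC versions may format it differently. The helper name root_causes is hypothetical, and the sample string is a shortened copy of the one above with the timestamp and file fields dropped.

```python
import json


def root_causes(error: dict) -> list:
    """Collect the descriptions of the innermost errors in a nested gRPC error dict."""
    children = error.get("referenced_errors", [])
    if not children:
        # No nested errors: this entry is a leaf, i.e. a root cause.
        return [error.get("description", "")]
    causes = []
    for child in children:
        causes.extend(root_causes(child))
    return causes


# Shortened copy of the debug_error_string above (timestamps and file fields dropped).
debug_error_string = (
    '{"description":"Failed to create subchannel","referenced_errors":['
    '{"description":"Pick Cancelled","referenced_errors":['
    '{"description":"Connect Failed","grpc_status":14,"referenced_errors":['
    '{"description":"HTTP proxy returned response code 403"}]}]}]}'
)

print(root_causes(json.loads(debug_error_string)))
```

For this input the only leaf is the "HTTP proxy returned response code 403" entry, which is what points to the proxy.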

I had no_proxy="localhost" set, but it seems that gRPC does not treat localhost and 127.0.0.1 as the same host! Adding 127.0.0.1 to the no_proxy environment variable solved my issue.
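The same change can be sketched in Python for scripts that cannot rely on the shell environment. This is only an illustration: add_to_no_proxy is a hypothetical helper, and it assumes the variable has to be set before the Server Manager client (or any gRPC channel) is created. Setting the variable in the shell is the form that was actually tested here.

```python
import os


def add_to_no_proxy(hosts, environ=os.environ, var="no_proxy"):
    """Append each host to the comma-separated proxy exclusion list, skipping duplicates."""
    current = [h.strip() for h in environ.get(var, "").split(",") if h.strip()]
    for host in hosts:
        if host not in current:
            current.append(host)
    environ[var] = ",".join(current)
    return environ[var]


if __name__ == "__main__":
    # Assumption: this runs before any gRPC channel or robo-gym environment is created.
    print(add_to_no_proxy(["localhost", "127.0.0.1"]))
```

Existing entries are kept in order, so a prior no_proxy="localhost" becomes "localhost,127.0.0.1".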

Hope it can help someone some time... Again, thank you so much for your time and for the lib !

matteolucchi commented 2 years ago

Nice ! I am glad to see that you found the problem and thank you for reporting back the solution! I am sure it will be helpful for someone else!

Thank you and I hope you will find robo-gym useful!

Cheers, Matteo