jr-robotics / robo-gym

An open source toolkit for Distributed Deep Reinforcement Learning on real and simulated robots.
https://sites.google.com/view/robo-gym
MIT License

cannot kill simulation #18

Closed choosungwon closed 3 years ago

choosungwon commented 3 years ago

Thank you for the nice repo. I installed robo-gym with the standard installation on Ubuntu 18.04 and tried running the ur5-sim simulation with random actions, as shown below:

import gym, robo_gym

target_machine_ip = 'localhost' # or other machine 'xxx.xxx.xxx.xxx'

env = gym.make('EndEffectorPositioningUR10DoF5Sim-v0', ip=target_machine_ip, gui=True)

env.reset()

n_steps = 100
for count in range(n_steps):

    action = env.action_space.sample()
    obs, reward, done, info = env.step(action)

env.kill_sim()
env.close()

Everything works fine, but env.kill_sim() doesn't work.

It continuously prints the following:

Killing Robot Server at localhost:33901 | Tentative 1
Killing Robot Server at localhost:33901 | Tentative 2
Killing Robot Server at localhost:33901 | Tentative 3
Killing Robot Server at localhost:33901 | Tentative 4
Killing Robot Server at localhost:33901 | Tentative 5
Killing Robot Server at localhost:33901 | Tentative 6
Killing Robot Server at localhost:33901 | Tentative 7
Killing Robot Server at localhost:33901 | Tentative 8
Killing Robot Server at localhost:33901 | Tentative 9
Killing Robot Server at localhost:33901 | Tentative 10
...

The server manager console also prints:

Failed to kill Robot Server 33901
Failed to kill Robot Server 33901
Failed to kill Robot Server 33901
Failed to kill Robot Server 33901
Failed to kill Robot Server 33901
Failed to kill Robot Server 33901
Failed to kill Robot Server 33901
Failed to kill Robot Server 33901
Failed to kill Robot Server 33901
...

Do you have any ideas about this situation?

matteolucchi commented 3 years ago

Hi @choosungwon and thank you for using robo-gym! Sorry for the late reply!

Unfortunately I was not able to reproduce the error you encounter. I ask you to please verify a couple of things:

Cheers, Matteo

choosungwon commented 3 years ago

Hi @matteolucchi, thanks for your reply.

I used the latest version of the robo-gym-robot-server repository.

Here is my pip show output.

On Python 2.7.17:

❯ pip show robo-gym-server-modules
Name: robo-gym-server-modules
Version: 0.2.2
Summary: Robot Servers and Server Manager code for robo-gym
Home-page: https://github.com/jr-robotics/robo-gym-server-modules
Author: Matteo Lucchi, Friedemann Zindler
Author-email: matteo.lucchi@joanneum.at, friedemann.zindler@joanneum.at
License: UNKNOWN
Location: /home/kitechubuntu/.local/lib/python2.7/site-packages
Requires: libtmux, protobuf, grpcio

On Python 3.6.10 (pyenv):

❯ pip show robo-gym               
Name: robo-gym
Version: 0.2.0
Summary: robo-gym: an open source toolkit for Distributed Deep Reinforcement Learning on real and simulated robots.
Home-page: https://github.com/jr-robotics/robo-gym
Author: Matteo Lucchi, Friedemann Zindler
Author-email: matteo.lucchi@joanneum.at, friedemann.zindler@joanneum.at
License: UNKNOWN
Location: /home/kitechubuntu/.pyenv/versions/3.6.10/envs/robo-gym/lib/python3.6/site-packages
Requires: gym, robo-gym-server-modules, numpy
Required-by: 
matteolucchi commented 3 years ago

Ok, this seems right.

To get to the bottom of this and see what causes the error, we need to temporarily comment out some try blocks in robo-gym-server-modules.

  1. Uninstall the pip package robo-gym-server-modules in both Python environments with pip uninstall robo-gym-server-modules
  2. Clone the robo-gym-server-modules repository somewhere on your PC:
    cd 
    git clone https://github.com/jr-robotics/robo-gym-server-modules.git
  3. Install the pip package in both Python environments. Activate each environment in turn and run:
    cd robo-gym-server-modules
    pip install -e . 
  4. Comment out the try block in KillServer(). Replace

    
    def KillServer(self, request, context):
        try:
            assert request.port
            assert self.srv_mngr.kill_session(repr(request.port))
            print ("Robot Server " + repr(request.port) + " killed")
            return server_manager_pb2.RobotServer(success=1)
    
        except:
            print ("Failed to kill Robot Server " + repr(request.port))
            return server_manager_pb2.RobotServer(success=0)

with

    def KillServer(self, request, context):
        # try:
        assert request.port
        assert self.srv_mngr.kill_session(repr(request.port))
        print ("Robot Server " + repr(request.port) + " killed")
        return server_manager_pb2.RobotServer(success=1)

        # except:
            # print ("Failed to kill Robot Server " + repr(request.port))
            # return server_manager_pb2.RobotServer(success=0)


Please try this out and let me know what error output you get.
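The reason for commenting out the try block is that a bare except hides the real exception. An alternative that keeps the handler but still shows the cause is to print the traceback. The following is only a minimal sketch of that idea, not the actual server code: `kill_session_stub` and `kill_server_logged` are hypothetical names, and the stub stands in for `self.srv_mngr.kill_session()`.

```python
import traceback


def kill_session_stub(session_name):
    # Hypothetical stand-in for self.srv_mngr.kill_session(); it always fails
    # here so that the logging path can be demonstrated.
    raise LookupError("no tmux session named " + session_name)


def kill_server_logged(port, kill_session=kill_session_stub):
    """Return (success, error_text); error_text holds the traceback on failure."""
    try:
        assert port
        assert kill_session(repr(port))
        return True, ""
    except Exception:
        # Keep the original failure visible instead of silently discarding it.
        error_text = traceback.format_exc()
        print("Failed to kill Robot Server " + repr(port))
        print(error_text)
        return False, error_text
```

With this pattern the "Failed to kill Robot Server" line would be followed by the exception that actually caused it.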

I believe this is related to the tmux library; we have also had some issues with it in the past.

A workaround is to run `kill-all-robot-servers` in the ServerManager command shell once you are done using the environment, and to restart the ServerManager when you need to start another environment.
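On the client side, the script can be kept from crashing while the workaround is in use. This is a minimal sketch, assuming only that the environment object has `kill_sim()` and `close()` methods as in the script above; `shutdown_env` is a hypothetical helper, not part of robo-gym.

```python
def shutdown_env(env):
    """Try env.kill_sim(), always call env.close(); return True if the kill succeeded."""
    killed = False
    try:
        env.kill_sim()
        killed = True
    except (RuntimeError, AssertionError) as err:
        # The kill did not go through; the Robot Server has to be removed by hand.
        print("kill_sim() failed: " + str(err))
        print("Run kill-all-robot-servers in the ServerManager command shell.")
    finally:
        # Release client-side resources whether or not the kill succeeded.
        env.close()
    return killed
```

Calling `shutdown_env(env)` in place of the last two lines of the script would still close the environment when the kill fails.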

Cheers, 

Matteo 
choosungwon commented 3 years ago

Here is the error output.

On Python 3.6.10:

Killing Robot Server at localhost:33901 | Tentative 1
Killing Robot Server at localhost:33901 | Tentative 2
Killing Robot Server at localhost:33901 | Tentative 3
Killing Robot Server at localhost:33901 | Tentative 4
...
Killing Robot Server at localhost:45679 | Tentative 997
Killing Robot Server at localhost:45679 | Tentative 998
Killing Robot Server at localhost:45679 | Tentative 999
Killing Robot Server at localhost:45679 | Tentative 1000

Traceback (most recent call last):
  File "/home/mypc/notebook/test_1.py", line 25, in <module>
    env.kill_sim()
  File "/home/mypc/.pyenv/versions/robo-gym/lib/python3.6/site-packages/robo_gym/envs/simulation_wrapper.py", line 39, in kill_sim
    assert self.sm_client.kill_server(self.robot_server_ip)
  File "/home/mypc/robo-gym-server-modules/robo_gym_server_modules/server_manager/client.py", line 63, in kill_server
    raise RuntimeError("Failed 5 tentatives of killing Robot Server")
RuntimeError: Failed 5 tentatives of killing Robot Server

In the server manager command shell, nothing happens:

Server Manager started at 50100
Robot Server started at 37977 successfully
matteolucchi commented 3 years ago

Ok I think this has to do with tmux not being able to find the session.

There is an issue on that here: https://github.com/tmux-python/libtmux/issues/265#issuecomment-639915895

Please do the following:

Please let me know if this works for you.

Cheers,

Matteo

choosungwon commented 3 years ago

Thank you for your reply.

Here is the output of python -c "import libtmux ; print(libtmux.Server().list_sessions())" in the ServerManager tmux session:

❯ python -c "import libtmux ; print(libtmux.Server().list_sessions())" 
[Session($0 0)]

And my default LANG:

❯ echo $LANG

en_US.UTF-8
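The same session check can be done by name without libtmux. This is a rough sketch, assuming that each Robot Server runs in a tmux session named after its port (suggested by the `kill_session(repr(request.port))` call earlier in the thread, but not confirmed); the helper names are hypothetical.

```python
import subprocess


def parse_session_names(tmux_output):
    """Split the output of `tmux list-sessions -F '#{session_name}'` into names."""
    return [line.strip() for line in tmux_output.splitlines() if line.strip()]


def list_tmux_sessions():
    """Return the visible tmux session names, or [] if tmux is missing or has no server."""
    try:
        result = subprocess.run(
            ["tmux", "list-sessions", "-F", "#{session_name}"],
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
            universal_newlines=True,
        )
    except OSError:
        # tmux is not installed or could not be executed.
        return []
    if result.returncode != 0:
        # tmux exits non-zero when no server is running.
        return []
    return parse_session_names(result.stdout)


def has_session(port, session_names):
    # Assumption: the session name is the port number written as a string.
    return str(port) in session_names
```

For example, `has_session(53719, list_tmux_sessions())` run while the simulation is up would show whether a session with that name is visible from the current shell.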
matteolucchi commented 3 years ago

Can you confirm that the output of the script you ran:

import gym, robo_gym

target_machine_ip = 'localhost' # or other machine 'xxx.xxx.xxx.xxx'

env = gym.make('EndEffectorPositioningUR10DoF5Sim-v0', ip=target_machine_ip, gui=True)

env.reset()

n_steps = 100
for count in range(n_steps):

    action = env.action_space.sample()
    obs, reward, done, info = env.step(action)

env.kill_sim()
env.close()

looks something like this:

Starting new Robot Server | Tentative 1
Successfully started Robot Server at localhost:38783
/home/mal/.pyenv/versions/robo-gym/lib/python3.6/site-packages/gym/logger.py:30: UserWarning: WARN: Box bound precision lowered by casting to float32
  warnings.warn(colorize('%s: %s'%('WARN', msg % args), 'yellow'))
Killing Robot Server at localhost:38783 | Tentative 1
choosungwon commented 3 years ago

Yes, here is the full output of the script:

❯ /home/kitechubuntu/.pyenv/versions/robo-gym/bin/python "/home/kitechubuntu/notebook/code test/test_1.py"
Starting new Robot Server | Tentative 1
Successfully started Robot Server at localhost:53719
/home/kitechubuntu/.pyenv/versions/robo-gym/lib/python3.6/site-packages/gym/logger.py:30: UserWarning: WARN: Box bound precision lowered by casting to float32
  warnings.warn(colorize('%s: %s'%('WARN', msg % args), 'yellow'))
Killing Robot Server at localhost:46171 | Tentative 1
Killing Robot Server at localhost:46171 | Tentative 2
Killing Robot Server at localhost:46171 | Tentative 3
Killing Robot Server at localhost:46171 | Tentative 4
...
Killing Robot Server at localhost:46171 | Tentative 995
Killing Robot Server at localhost:46171 | Tentative 996
Killing Robot Server at localhost:46171 | Tentative 997
Killing Robot Server at localhost:46171 | Tentative 998
Killing Robot Server at localhost:46171 | Tentative 999
Killing Robot Server at localhost:46171 | Tentative 1000
Traceback (most recent call last):
  File "/home/kitechubuntu/notebook/code test/test_1.py", line 25, in <module>
    env.kill_sim()
  File "/home/kitechubuntu/.pyenv/versions/robo-gym/lib/python3.6/site-packages/robo_gym/envs/simulation_wrapper.py", line 39, in kill_sim
    assert self.sm_client.kill_server(self.robot_server_ip)
  File "/home/kitechubuntu/robo-gym-server-modules/robo_gym_server_modules/server_manager/client.py", line 63, in kill_server
    raise RuntimeError("Failed 5 tentatives of killing Robot Server")
RuntimeError: Failed 5 tentatives of killing Robot Server
matteolucchi commented 3 years ago

Ok, thank you. I am sorry, but I have run out of ideas and I still cannot reproduce the error you encounter.

The only thing I can suggest is to use the workaround mentioned before.

A workaround is to use kill-all-robot-servers in the ServerManager command shell once you are done with using the environment and starting again the ServerManager when you need to start again another environment.

If at some point we are able to reproduce this or get more information, I will reopen this.

Cheers,

Matteo