benchbot_run fails with internal server error

gmuraleekrishna commented 2 years ago

Hello, BenchBot version: 2.3.1

I was trying to run benchbot_run --robot carter_omni --env office:1 --task semantic_slam:passive:ground_truth, but the command fails with the following error:

Supervisor is now available @ 'http://0.0.0.0:10000' ...

Waiting until a robot controller is found @ 'http://benchbot_robot:10000' ... 
    Found
Sending environment data & robot config to controller ... 
    Ready
Starting the robot controller ... 
    Ready

172.20.0.254 - - [2022-04-20 17:39:57] "GET // HTTP/1.1" 200 146 0.001507
172.20.0.254 - - [2022-04-20 17:40:01] "GET /robot/is_running HTTP/1.1" 200 128 0.517667
172.20.0.254 - - [2022-04-20 17:40:01] "GET /config/robot HTTP/1.1" 200 2637 0.001666
172.20.0.254 - - [2022-04-20 17:40:01] "GET /robot/selected_environment HTTP/1.1" 200 149 0.007950
172.20.0.254 - - [2022-04-20 17:40:01] "GET /robot/is_dirty HTTP/1.1" 200 126 0.026514
172.20.0.254 - - [2022-04-20 17:40:11] "GET /robot/reset HTTP/1.1" 200 131 10.327322
172.20.0.254 - - [2022-04-20 17:40:11] "GET /robot/is_collided HTTP/1.1" 200 130 0.028259
172.20.0.254 - - [2022-04-20 17:40:11] "GET /robot/is_finished HTTP/1.1" 200 130 0.007381
172.20.0.254 - - [2022-04-20 17:40:11] "GET /config/task/observations HTTP/1.1" 200 188 0.000910
172.20.0.254 - - [2022-04-20 17:40:11] "GET /connections/image_depth HTTP/1.1" 200 1656308 0.173702
172.20.0.254 - - [2022-04-20 17:40:11] "GET /connections/image_depth_info HTTP/1.1" 200 874 0.008879
172.20.0.254 - - [2022-04-20 17:40:12] "GET /connections/image_rgb HTTP/1.1" 200 1970020 0.185882
172.20.0.254 - - [2022-04-20 17:40:12] "GET /connections/image_rgb_info HTTP/1.1" 200 874 0.008585
172.20.0.254 - - [2022-04-20 17:40:12] "GET /connections/laser HTTP/1.1" 200 10164 0.021387
ERROR: Supervisor failed on processing connection 'poses' with error:
JSONDecodeError('Expecting value', '<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN">\n<title>500 Internal Server Error</title>\n<h1>Internal Server Error</h1>\n<p>The server encountered an internal error and was unable to complete your request. Either the server is overloaded or there is an error in the application.</p>\n')
172.20.0.254 - - [2022-04-20 17:40:12] "GET /connections/poses HTTP/1.1" 500 426 0.030074
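
For context, the JSONDecodeError in the log above is what Python's json module raises when it is handed an HTML error page instead of a JSON document. The following is a minimal sketch of that failure and of one way to surface the HTTP status first. It is not BenchBot's code: `parse_body` is a hypothetical helper, and `HTML_500` is a shortened copy of the body shown in the log.

```python
import json

# Shortened copy of the HTML error page from the supervisor log above.
HTML_500 = (
    '<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN">\n'
    "<title>500 Internal Server Error</title>\n"
    "<h1>Internal Server Error</h1>\n"
)


def parse_body(status_code, body):
    """Hypothetical helper: check the status code before decoding JSON."""
    if status_code != 200:
        # Report the HTTP failure directly instead of a confusing decode error.
        raise RuntimeError("controller returned HTTP %d" % status_code)
    return json.loads(body)


# Decoding the HTML page directly reproduces the error class seen in the log:
# the body starts with '<', which is not a valid start of a JSON value.
try:
    json.loads(HTML_500)
except json.JSONDecodeError as e:
    decode_error = e
```

Checking the status code first does not fix the underlying 500, but it makes the real failure (the robot controller's internal error on the 'poses' connection) visible instead of the secondary decode error.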

My BenchBot agent script fails with the following error:

Waiting to establish connection to a running supervisor ... Connected!
Waiting to establish connection to a running robot ... Connected!
Dirty robot state detected. Performing reset ... Complete.
Traceback (most recent call last):
  File "/home/user/Projects/benchbot/api/benchbot_api/benchbot.py", line 148, in _query
    raise _UnexpectedResponseError(resp.status_code)
benchbot_api.benchbot._UnexpectedResponseError: Received an unexpected response from BenchBot supervisor (HTTP status code: 500)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "benchbot_ssu_main.py", line 187, in <module>
    bb = BenchBot(agent=agent)
  File "/home/user/Projects/benchbot/api/benchbot_api/benchbot.py", line 84, in __init__
    self.start()
  File "/home/user/Projects/benchbot/api/benchbot_api/benchbot.py", line 398, in start
    self.reset()
  File "/home/user/Projects/benchbot/api/benchbot_api/benchbot.py", line 292, in reset
    return self.step(None)
  File "/home/user/Projects/benchbot/api/benchbot_api/benchbot.py", line 450, in step
    for o in self.observations
  File "/home/user/Projects/benchbot/api/benchbot_api/benchbot.py", line 450, in <dictcomp>
    for o in self.observations
  File "/home/user/Projects/benchbot/api/benchbot_api/benchbot.py", line 153, in _query
    "failed using the route:\n\t%s" % addr)
requests.exceptions.ConnectionError: Communication to BenchBot supervisor failed using the route:
    http://benchbot_supervisor:10000/connections/poses
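
The "During handling of the above exception, another exception occurred" line in the traceback is Python's implicit exception chaining: a second exception raised inside an `except` block keeps the first one in its `__context__` attribute. The sketch below shows how the original HTTP status could be recovered from such a chain. All names here are stand-ins written for illustration (`UnexpectedResponseError`, `RouteError`, `query`, `root_status`); the real `_query` in benchbot_api may be structured differently.

```python
class UnexpectedResponseError(Exception):
    """Stand-in for benchbot_api's _UnexpectedResponseError (illustrative only)."""

    def __init__(self, status_code):
        super().__init__(
            "Received an unexpected response (HTTP status code: %d)" % status_code
        )
        self.status_code = status_code


class RouteError(Exception):
    """Stand-in for requests.exceptions.ConnectionError (illustrative only)."""


def query(addr, status_code):
    """Assumed shape of a query wrapper that re-raises any failure as RouteError."""
    try:
        if status_code != 200:
            raise UnexpectedResponseError(status_code)
        return "ok"
    except Exception:
        # Raising inside an except block: Python stores the first error
        # on the new exception as __context__.
        raise RouteError(
            "Communication to BenchBot supervisor failed using the route:\n\t%s" % addr
        )


def root_status(exc):
    """Walk the exception chain and return the HTTP status code, if one is found."""
    while exc is not None:
        if isinstance(exc, UnexpectedResponseError):
            return exc.status_code
        exc = exc.__cause__ or exc.__context__
    return None
```

With this pattern, an agent script that catches the outer error can still tell a genuine network failure apart from a server-side 500 by inspecting the chained exception.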

btalb commented 2 years ago

Thanks for reporting this, @gmuraleekrishna.

We've been able to reproduce similar issues locally, and hope to get a fix out tomorrow.

Sorry for the inconvenience.

btalb commented 2 years ago

I believe I've found and fixed a cause of this issue, @gmuraleekrishna: our tolerances for detecting a "dirty simulator" state were too low for some environments.

To confirm:

- You shouldn't get the error you were getting if you use environment house:1
- Updating BenchBot to v2.3.2 should remove the error for all environments

Unfortunately the new tolerance value is somewhat ad hoc, so please let us know if you hit the issue in any other environments.
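
To illustrate why a tolerance that is too low can cause trouble, here is a rough sketch of a tolerance-based "dirty" check. It is not BenchBot's implementation, which is not shown in this thread: the `is_dirty` function, the pose representation, and all numeric values are assumptions chosen only for illustration.

```python
import math


def is_dirty(start_xyz, current_xyz, tolerance):
    """Hypothetical check: the state counts as dirty if the robot has moved
    further than `tolerance` (in metres) from its start position."""
    return math.dist(start_xyz, current_xyz) > tolerance


# Assumed values: the robot spawns at the origin and settles 3 cm lower
# once the physics simulation starts, without any action being taken.
start = (0.0, 0.0, 0.0)
settled = (0.0, 0.0, -0.03)

too_strict = is_dirty(start, settled, tolerance=0.01)  # settling exceeds 1 cm
relaxed = is_dirty(start, settled, tolerance=0.05)     # settling is within 5 cm
```

Under these assumed numbers, the strict tolerance flags a freshly reset robot as dirty while the relaxed one does not, which is the kind of environment-dependent behaviour a single hand-picked threshold can produce.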
