ERROR: An exception occurred while executing "benchbot_run"

ffgao commented 2 years ago

Hi,When I installed the program and executed "benchbot_run" command. The following error occurred.

~$ benchbot_run --robot carter_omni --env miniroom:1 --task semantic_slam:passive:ground_truth

################################################################################
################# CHECKING FOR BENCHBOT SOFTWARE STACK UPDATES #################
################################################################################

Checking BenchBot version ...               Up-to-date.
Checking BenchBot API version ...           Up-to-date.
Checking BenchBot Eval version ...          Up-to-date.
Checking BenchBot Supervisor version ...        Up-to-date.
Checking installed BenchBot add-ons are up-to-date ...  Up-to-date.

BenchBot is up-to-date.

################################################################################
###################### CLEANING UP ALL BENCHBOT REMNANTS #######################
################################################################################

Sending stop request to running controller:
{"stop_success":true}

Deleted the following containers:
Total reclaimed space: 0B

Finished cleaning! (use 'benchbot_run -k' for a full clean)

################################################################################
##################### STARTING THE BENCHBOT SOFTWARE STACK #####################
################################################################################

Running the BenchBot system with the following settings:

    Selected task:        semantic_slam:passive:ground_truth
    Task results format:  object_map
    Selected robot:       carter_omni
    Selected environment: miniroom:1
    Scene/s:              miniroom:1, starting @ pose [0.7, 0, 0, -0.7, 1.2, 1.5, 0.3]
                          (map_path = '.sim_data/miniroom_1.usd')
    Simulator required:   Yes (sim_omni)

Creating shared network 'benchbot_network':

Starting persistent container for ROS core:
Skipping (already running)

Starting persistent container for BenchBot Robot Controller (sim_omni):
Skipping (already running)

Starting container for BenchBot Supervisor:
7f1589564ca18af1020a65f1e8e92058aeb3f1d8713aa86eddefaa117818ded4

Starting container for BenchBot Debugging:
d1e6f2eb17d5aa82d63936cf39747199fedc66936cd17844f777473c0eafb263

################################################################################
################### BENCHBOT IS RUNNING (Ctrl^C to exit) ... ###################
################################################################################

Initialising supervisor...

Configuring the supervisor...
Starting a supervisor with the following configuration:

{'environments': [{'_file_path': '/benchbot/addons/benchbot_addons/benchbot-addons/envs_bear_develop_sim_omni/environments/miniroom_1.yaml',
                   'description': 'Mini room environment with all of the base '
                                  'objects in their normal place.\n'
                                  'All of the other mini room environments are '
                                  'based off this with any combination\n'
                                  'of objects, object positions, & lighting '
                                  'changed\n',
                   'map_path': '.sim_data/miniroom_1.usd',
                   'name': 'miniroom',
                   'object_labels': [...],
                   'robots': [...],
                   'start_pose': [...],
                   'trajectory_poses': [...],
                   'type': 'sim_omni',
                   'variant': 1}],
 'results': {'_file_path': '/benchbot/addons/benchbot_addons/benchbot-addons/formats_object_map/formats/object_map.yaml',
             'description': 'The "object map" is a map of the objects in an '
                            'environment. Each object is\n'
                            'represented by a probability distribution '
                            'describing the suggested object\n'
                            "label, and the bounding box's 3D centroid and "
                            'extent. An "object map" also\n'
                            'requires a "class_list", which is used to create '
                            'object label probability\n'
                            'distributions.\n',
             'functions': {'create': 'object_map.create_empty',
                           'create_object': 'object_map.create_empty_object',
                           'validate': 'object_map.validate'},
             'name': 'object_map'},
 'robot': {'_file_path': '/benchbot/addons/benchbot_addons/benchbot-addons/robots_sim_omni/robots/carter_omni.yaml',
           'address': 'http://benchbot_robot:10000',
           'connections': {'image_depth': {...},
                           'image_depth_info': {...},
                           'image_rgb': {...},
                           'image_rgb_info': {...},
                           'laser': {...},
                           'move_angle': {...},
                           'move_distance': {...},
                           'move_next': {...},
                           'poses': {...}},
           'global_frame': 'map',
           'name': 'carter_omni',
           'persistent_cmds': ['/benchbot/benchbot_simulator/run -P 10001 & '
                               'x=$! &&  /isaac-sim/start_nucleus.sh ; sleep 5 '
                               '&& curl -X POST http://localhost:10001/start '
                               '&& wait $x\n',
                               'rosrun benchbot_robot_controller noisify_odom '
                               '\\\n'
                               '  noise_linear:=0.2 noise_angular:=0.1\n'],
           'persistent_status': 'curl -s localhost:10001/started | grep -q '
                                "'true'\n",
           'poses': ['base_link', 'initial_pose', 'camera_left', 'lidar'],
           'robot_frame': 'base_link',
           'run_cmd': 'rostopic pub -1 /odom_start_pose std_msgs/String "data: '
                      '\'$START_POSE\'" && curl -s -o /dev/null '
                      'localhost:10001/open_environment \\\n'
                      '  -H "Content-Type: application/json" \\\n'
                      '  -d \'{"environment": "$ENVS_PATH/$MAP_PATH"}\' &&\n'
                      'curl -s -o /dev/null localhost:10001/place_robot \\\n'
                      '  -H "Content-Type: application/json" \\\n'
                      '  -d \'{"robot": "$ROBOT_PATH/.robot_data/carter.usd", '
                      '"start_pose": "$START_POSE"}\'\n',
           'stop_cmd': 'curl -s -o /dev/null -X POST '
                       'localhost:10001/stop_sim\n',
           'type': 'sim_omni'},
 'task': {'_file_path': '/benchbot/addons/benchbot_addons/benchbot-addons/tasks_ssu/tasks/sslam_pgt.yaml',
          'actions': ['move_next'],
          'description': 'Use a Semantic SLAM algorithm to construct an object '
                         'map of the environment. An object map describes each '
                         'object with a probabilistic label suggestion, '
                         'spatial location, and optional probabilistic state '
                         'change suggestion. This task provides passive robot '
                         'control, and sensor observations with ground truth '
                         'robot pose.\n',
          'localisation': 'ground_truth',
          'name': 'semantic_slam:passive:ground_truth',
          'observations': ['image_depth',
                           'image_depth_info',
                           'image_rgb',
                           'image_rgb_info',
                           'laser',
                           'poses'],
          'results_format': 'object_map',
          'scene_count': 1,
          'type': 'sim_unreal'}}

Supervisor is now available @ 'http://0.0.0.0:10000' ...

Waiting until a robot controller is found @ 'http://benchbot_robot:10000' ... 
    Found
Sending environment data & robot config to controller ... 
    Ready
Starting the robot controller ... 
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/requests/models.py", line 910, in json
    return complexjson.loads(self.text, **kwargs)
  File "/usr/lib/python3.6/json/__init__.py", line 354, in loads
    return _default_decoder.decode(s)
  File "/usr/lib/python3.6/json/decoder.py", line 339, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/usr/lib/python3.6/json/decoder.py", line 357, in raw_decode
    raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/usr/local/lib/python3.6/dist-packages/benchbot_supervisor/__main__.py", line 28, in <module>
    s.run()
  File "/usr/local/lib/python3.6/dist-packages/benchbot_supervisor/benchbot_supervisor.py", line 238, in run
    self._robot('/prepare')
  File "/usr/local/lib/python3.6/dist-packages/benchbot_supervisor/benchbot_supervisor.py", line 71, in _robot
    'json': data
  File "/usr/local/lib/python3.6/dist-packages/requests/models.py", line 917, in json
    raise RequestsJSONDecodeError(e.msg, e.doc, e.pos)
requests.exceptions.JSONDecodeError: [Errno Expecting value] <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN">
<title>500 Internal Server Error</title>
<h1>Internal Server Error</h1>
<p>The server encountered an internal error and was unable to complete your request. Either the server is overloaded or there is an error in the application.</p>
: 0

################################################################################
###################### CLEANING UP ALL BENCHBOT REMNANTS #######################
################################################################################

Sending stop request to running controller:
{"stop_success":true}

Stopped the following containers:
d1e6f2eb17d5

Deleted the following containers:
Deleted Containers:
d1e6f2eb17d5aa82d63936cf39747199fedc66936cd17844f777473c0eafb263
7f1589564ca18af1020a65f1e8e92058aeb3f1d8713aa86eddefaa117818ded4

Total reclaimed space: 189.2kB

Finished cleaning! (use 'benchbot_run -k' for a full clean)

btalb commented 2 years ago

Thanks for reporting @ffgao .

Have you got more context of what was going on when this happened:

had the simulator window popped up?
was this the first run since you installed? Or had previous runs worked on your system?
does this error occur every time you do benchbot_run?
anything interesting about your system setup? Are you running remotely? SSH'ed in? Or sitting directly at the machine?

Struggling to reproduce this locally, so any extra information you may have is helpful.

ffgao commented 2 years ago

Thank you for your help @btalb More information is below:

no simulator window.
first install and do this every time.
Running locally.
I installed isaac sim 2022 in my computer.

Below is a running record: Peek 2022-08-12 11-18

Error occurred a few minutes later:

btalb commented 2 years ago

Hmmm.... sorry about the delay @ffgao , I'm struggling to immediately see what could be causing this (and can't reproduce locally).

I might need a bit deeper in terms of logging:

Can you uncomment this line in your install so benchbot_run doesn't exit immediately after the error: https://github.com/qcr/benchbot/blob/12daa9335e391b10170d372cdfad24df12d0e320/bin/benchbot_run#L398
then send me the entire output of docker logs benchbot_robot_controller

You may have to do a benchbot_run -k to clean up after this.

ffgao commented 2 years ago

No benchbot_robot_controller in the list of container.@btalb The following is my operation.

ffgao:~$ docker ps
CONTAINER ID   IMAGE                         COMMAND                  CREATED          STATUS          PORTS                                                                                                                                                                                              NAMES
372aaf8feeb4   benchbot/backend:base         "/bin/bash"              18 minutes ago   Up 18 minutes                                                                                                                                                                                                      benchbot_debug
c555e44fb73c   benchbot/simulator:sim_omni   "/bin/bash -c 'sourc…"   19 minutes ago   Up 19 minutes   3007/tcp, 3009/tcp, 3080/tcp, 3085/tcp, 3333/tcp, 8011-8012/tcp, 8080/tcp, 8211/tcp, 8888/tcp, 8891/tcp, 8899/tcp, 47995-48012/tcp, 47995-48012/udp, 49000-49007/tcp, 49100/tcp, 49000-49007/udp   benchbot_robot
40f0281b23a5   benchbot/backend:base         "/bin/bash -c 'sourc…"   19 minutes ago   Up 19 minutes                                                                                                                                                                                                      benchbot_ros
ffgao:~$ docker logs benchbot_ro
benchbot_robot  benchbot_ros    
ffgao:~$ docker logs benchbot_robot 

Robot controller is now available @ 'http://0.0.0.0:10000' ...
172.20.0.102 - - [2022-08-17 09:06:33] "GET // HTTP/1.1" 200 152 0.000461
172.20.0.102 - - [2022-08-17 09:06:34] "POST //configure HTTP/1.1" 200 137 0.170677
Preparing the requested controller ... 
FAILED TO PREPARE INSTANCE AFTER 120s. STATUS CHECK COMMANDS WAS:
    curl -s localhost:10001/started | grep -q 'true'

DUMPING LOGS FOR ALL COMMANDS... 
COMMAND:
    /benchbot/benchbot_simulator/run -P 10001 & x=$! &&  /isaac-sim/start_nucleus.sh ; sleep 5 && curl -X POST http://localhost:10001/start && wait $x

OUTPUT:
Traceback (most recent call last):
  File "<string>", line 585, in __prepare
  File "<string>", line 436, in prepare
  File "<string>", line 207, in prepare
IOError: [Errno 20] Not a directory: '/tmp/benchbot_logs/0/r'

172.20.0.102 - - [2022-08-17 09:08:34] "GET //prepare HTTP/1.1" 500 426 120.086528

david2611 commented 1 year ago

Apologies that we have been so slow in fixing problems.

I have also come across this when trying to verify a fix for #82.

If you perform a benchbot_install -b develop operation and run again you should get more meaningful output from your benchbot_robot log. Would you be able to send through what you receive so I can confirm it is the same as what I have been getting?

david2611 commented 1 year ago

While current solution is not pretty I have put a hotfix in place that seems to solve this issue for me. A fresh benchbot_install on the main branch should sort this out.

Worse case scenario I have found with the hotfix is that the first benchbot_run seems to fail but all subsequent ones succeed.

I will work to fix this bug but will open a new issue regarding this.

Can you confirm that this fixes the issues you originally encountered?

david2611 commented 1 year ago

Local testing seemed to show this issue as fixed. Closing due to inactivity

qcr / benchbot

ERROR: An exception occurred while executing "benchbot_run" #81