osrf / srcsim

Space Robotics Challenge

Gazebo real time factor falls to 0.0 in srcsim 0.6 #206

Open osrf-migration opened 7 years ago

osrf-migration commented 7 years ago

Original report (archived issue) by Vinayak Jagtap (Bitbucket: vinayak_jagtap).


I saw this error twice today. After the error occurs, the real time factor drops to 0.0. In both cases, the robot was fixing either pitch or yaw in task1. Is there a specific log I should look at to provide more details?

[ERROR] [1496438954.141746404, 279.640000000]: Exception in thread "SynchronousMultiThreadedRobotController-thread-1" java.lang.RuntimeException: java.lang.IndexOutOfBoundsException: Index: 29, Size: 0                                       
[ERROR] [1496438954.141839560, 279.640000000]:  at us.ihmc.wholeBodyController.DRCControllerThread.run(DRCControllerThread.java:360)                                                                                                            
[ERROR] [1496438954.141897458, 279.640000000]:  at us.ihmc.wholeBodyController.concurrent.SynchronousMultiThreadedRobotController$ReentrantLockedControlElementRunner.run(SynchronousMultiThreadedRobotController.java:89)                      
[ERROR] [1496438954.141947921, 279.640000000]:  at java.lang.Thread.run(Thread.java:745)
[ERROR] [1496438954.142116091, 279.640000000]: Caused by: java.lang.IndexOutOfBoundsException: Index: 29, Size: 0
[ERROR] [1496438954.142157559, 279.640000000]:  at us.ihmc.robotics.lists.RecyclingArrayList.rangeCheck(RecyclingArrayList.java:340)                                                                                                            
[ERROR] [1496438954.142204445, 279.640000000]:  at us.ihmc.robotics.lists.RecyclingArrayList.get(RecyclingArrayList.java:130)                                                                                                                   
[ERROR] [1496438954.142251275, 279.640000000]:  at us.ihmc.robotics.math.trajectories.waypoints.FrameTrajectoryPointList.getTrajectoryPoint(FrameTrajectoryPointList.java:109)                                                                  
[ERROR] [1496438954.142298453, 279.640000000]:  at us.ihmc.commonWalkingControlModules.controlModules.chest.TaskspaceChestControlState.queueExceedingTrajectoryPointsIfNeeded(TaskspaceChestControlState.java:260)                              
[ERROR] [1496438954.142344671, 279.640000000]:  at us.ihmc.commonWalkingControlModules.controlModules.chest.TaskspaceChestControlState.initializeTrajectoryGenerator(TaskspaceChestControlState.java:230)                                       
[ERROR] [1496438954.142392687, 279.640000000]:  at us.ihmc.commonWalkingControlModules.controlModules.chest.TaskspaceChestControlState.doAction(TaskspaceChestControlState.java:112)                                                            
[ERROR] [1496438954.142446146, 279.640000000]:  at us.ihmc.robotics.stateMachines.conditionBasedStateMachine.GenericStateMachine.doAction(GenericStateMachine.java:152)                                                                         
[ERROR] [1496438954.142495996, 279.640000000]:  at us.ihmc.commonWalkingControlModules.controlModules.chest.ChestOrientationManager.compute(ChestOrientationManager.java:126)                                                                   
[ERROR] [1496438954.142544149, 279.640000000]:  at us.ihmc.commonWalkingControlModules.highLevelHumanoidControl.highLevelStates.WalkingHighLevelHumanoidController.updateManagers(WalkingHighLevelHumanoidController.java:595)
[ERROR] [1496438954.142599356, 279.640000000]:  at us.ihmc.commonWalkingControlModules.highLevelHumanoidControl.highLevelStates.WalkingHighLevelHumanoidController.doAction(WalkingHighLevelHumanoidController.java:518)
[ERROR] [1496438954.142662317, 279.640000000]:  at us.ihmc.robotics.stateMachines.conditionBasedStateMachine.GenericStateMachine.doAction(GenericStateMachine.java:152)
[ERROR] [1496438954.142701400, 279.640000000]:  at us.ihmc.commonWalkingControlModules.highLevelHumanoidControl.HighLevelHumanoidControllerManager.doControl(HighLevelHumanoidControllerManager.java:178)
[ERROR] [1496438954.142738952, 279.640000000]:  at us.ihmc.robotics.robotController.ModularRobotController.doControl(ModularRobotController.java:20)
[ERROR] [1496438954.142775067, 279.640000000]:  at us.ihmc.wholeBodyController.DRCControllerThread.run(DRCControllerThread.java:352)
[ERROR] [1496438954.142815993, 279.640000000]:  ... 2 more
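
To pull the root cause out of a trace like the one above, a small script can strip the rosconsole prefix and keep the last `Caused by:` line. This is a minimal sketch that assumes the `[LEVEL] [wall_time, sim_time]: message` layout shown in this log; `parse_line` and `root_cause` are hypothetical helpers, not part of srcsim or the IHMC controller.

```python
import re

# Assumed line layout: "[ERROR] [<wall time>, <sim time>]: <message>"
LINE_RE = re.compile(r"^\[(\w+)\] \[(\d+\.\d+), (\d+\.\d+)\]:\s*(.*?)\s*$")


def parse_line(line):
    """Split one log line into (level, wall_time, sim_time, message).

    Returns None when the line does not match the assumed layout.
    """
    match = LINE_RE.match(line)
    if match is None:
        return None
    level, wall, sim, message = match.groups()
    return level, float(wall), float(sim), message


def root_cause(lines):
    """Return the message of the last 'Caused by:' line, or None."""
    cause = None
    for line in lines:
        parsed = parse_line(line)
        if parsed and parsed[3].startswith("Caused by:"):
            cause = parsed[3][len("Caused by:"):].strip()
    return cause
```

Feeding it the lines of the trace above would return the `IndexOutOfBoundsException` message, which is the part most worth quoting when reporting the failure.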
osrf-migration commented 7 years ago

Original comment by Vinayak Jagtap (Bitbucket: vinayak_jagtap).


osrf-migration commented 7 years ago

Original comment by Louise Poubel (Bitbucket: chapulina, GitHub: chapulina).


osrf-migration commented 7 years ago

Original comment by Vinayak Jagtap (Bitbucket: vinayak_jagtap).


Update: This error does not occur when running a single task, so we are continuing our testing on unique_task1.world only.

On a side note, srcsim seems to work almost fine when running one task at a time. If it cannot be made more reliable, why not load task-specific worlds in the finals? We would lose the point cloud gathered from different angles while completing the previous task, but spending a little extra time in the start box is a very small price to pay if the loaded worlds are more robust.

osrf-migration commented 7 years ago

Original comment by Vinayak Jagtap (Bitbucket: vinayak_jagtap).


I'm unable to get into gdb after this failure. Is there anything more to be done apart from adding debug:=true? The failure occurs every single time I finish setting both handles in task1 in the complete world.

osrf-migration commented 7 years ago

Original comment by Vinayak Jagtap (Bitbucket: vinayak_jagtap).


osrf-migration commented 7 years ago

Original comment by Nate Koenig (Bitbucket: Nathan Koenig).


Does this error happen only on cloudsim?

Changing srcsim to run each task individually would change the structure of the competition at a fairly late stage.

Let us know if/when you experience this problem during the dry run and we'll perform a reset. After a reset, you can skip to the appropriate checkpoint. We'll keep track of this reset so that you won't lose points.

This process is a bit painful, but changing the competition structure will have a different set of pain points.

osrf-migration commented 7 years ago

Original comment by Vinayak Jagtap (Bitbucket: vinayak_jagtap).


Thanks for the comments. I'm running on a local setup here and plan to test on cloudsim tonight. Unfortunately, there is no way for me to read that state on the OCU to know that the real time factor is 0.
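
One possible way to notice a stalled simulation from the operator side is to estimate the real time factor from pairs of wall-clock and simulation timestamps, such as the two bracketed numbers in the log lines above. The following is a minimal sketch under that assumption, not something srcsim provides; the function names and the 0.01 threshold are hypothetical.

```python
def real_time_factor(samples):
    """Estimate the real time factor from (wall_time, sim_time) pairs.

    `samples` is ordered oldest first; the estimate is the simulated time
    elapsed divided by the wall-clock time elapsed over the whole window.
    """
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    (wall_start, sim_start), (wall_end, sim_end) = samples[0], samples[-1]
    wall_elapsed = wall_end - wall_start
    if wall_elapsed <= 0:
        raise ValueError("wall-clock time must advance between samples")
    return (sim_end - sim_start) / wall_elapsed


def is_stalled(samples, threshold=0.01):
    """Report a stall when the estimated factor falls below `threshold`."""
    return real_time_factor(samples) < threshold
```

With samples collected a few seconds apart, `is_stalled` would return `True` once simulation time stops advancing, which matches the symptom described in this issue.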

I'm trying to get debug info to understand the error better, but it looks like gdb is not attached to the process. I see the following lines when I launch srcsim with the debug flag:

process[gazebo-6]: started with pid [27639]
/opt/ros/indigo/lib/gazebo_ros/debug: 5: [: Linux: unexpected operator
osrf-migration commented 7 years ago

Original comment by Vinayak Jagtap (Bitbucket: vinayak_jagtap).


@nkoenig We are facing a similar issue when we open the door handle in task3. How can that be reset to a point where the door handle is open? This error occurs more than 80% of the time. We can help provide debug logs if required, but we will need some support in generating those logs as well.

osrf-migration commented 7 years ago

Original comment by Nate Koenig (Bitbucket: Nathan Koenig).


It's unlikely that we will be able to resolve this issue since the Java controllers are a black box to us, and time is short. I can only offer the suggestion of trying an alternate approach. For example, if you're grasping the wheel, then maybe try an approach that doesn't create as many contact constraints.

osrf-migration commented 7 years ago

Original comment by Vinayak Jagtap (Bitbucket: vinayak_jagtap).


It is too late for us to change those grasps or trajectories now (we'll try our best to modify them, if possible). Given that we are using the APIs as documented and this issue is outside our control, could we or any other team get a reset with no score or time penalty if this occurs?

osrf-migration commented 7 years ago

Original comment by Nate Koenig (Bitbucket: Nathan Koenig).


Changing your grasps and trajectories is your call. I'm not trying to convince you to change your code.

Since this problem is out of your control, we can reset you without penalty. What could impact you is the overall duration of the final event. We have a hard stop on Friday of next week.