Open marc-hanheide opened 8 years ago
OK, so this is the brief summary of what we planned to do. No it's time to get going on this. Can ask representatives of the sites (maybe @bfalacerda @Pandoro @Jailander @PDuckworth and @nilsbore) to check the parameters they are using, post them (i.e. link them) from here and write one paragraph summarising your observations.
Then, we should create the setup for the test cases, e.g. have a shared topological map which we only slightly adapt to our own environments, but have the names consistent. I suggest, UOL (@jailander, myself, @cdondrup) create a first prototype of the test setup.
Then we should have a hangout to discuss the next steps. Here's a doodle I ask you all to fill to arrange this.
Many thanks
forgot @hawesie... sorry
I forgot we can't check what params we were using on Bob during the deployment because his hard drive died. I'm pretty sure we were using the current strands_movebase ones though.
However, @kunzel did some changes to the dwa params to make it follow the global path more closely. It was only added to scitos_2d_nav, because it makes navigation in simulation work much better. The differences are
cost =
path_distance_bias * (distance to path from the endpoint of the trajectory in meters)
+ goal_distance_bias * (distance to local goal from the endpoint of the trajectory in meters)
+ occdist_scale * (maximum obstacle cost along the trajectory in obstacle cost (0-254))
Maybe @cdondrup has some insight on these params, since he has been looking at dwa?
Actually setting path_distance_bias
to 1 and goal_distance_bias
to 0.5 should have had the opposite effect and allow the robot to diverge from the global path a bit more. However, if the DWA only uses the params described above (I remember there was goal align and path align somewhere as well but that might have been in a different implementations of it) then this shouldn't have changed much. The ratio is key here so with 32 and 24 it should have been the same as 1.0 and 0.75... so by changing it to 1.0 and 0.5 we only just gave less weight to the goal_distance_bias
.
The occdist_scale
is the one responsible for obstacle avoidance and the default here should be 0.01 which means it only takes lethal obstacles into account and doesn't have effect on the other two params.
yes, from what i understand, the short sim time is what makes a big difference here: since both those values only look at the endpoint of the simulated trajectories, if we increase their size we increase the number of trajectories that don't go along the global plan, but have an endpoint close to it
Isn't that what you want for obstacle avoidance?
depends. if we increase the rate of the global planner it might be better to keep the sim time smaller? with this version we had the robot rotating a lot more in place, at least in simulation.
We are indeed using some custom parameters. We had our old hiwi run around and optimize the parameters in such a way that the behaviour would be 'optimal' and we have the following in our "site parameter"-file:
local_costmap:
width: 4
height: 4
resolution: 0.02
# Improves navigation in tight corridors tremendously.
inflation_layer:
inflation_radius: 0.5
DWAPlannerROS:
max_vel_y: 0.0
min_vel_y: 0.0
I can't really say what the idea was behind these changes, just that it felt a bit more stable after he played around with the parameters. In general navigation seems okayish, apart from of course the driving towards obstacles and the slightly awkward behaviour in our kitchen that I described during the GA. According to our launch file we are supposedly running the human aware navigation, but he doesn't stare at people so maybe something went going wrong there. I'll check more accurately on our next run with Karl.
Could you please submit a link to the "whole" set of files, as this already assumes there would be a standard you deviate from ;-)
You can check out the strands karl repo. This should have everything we use, including the above file. We took most of the stuff from what we had before in the marathon system and updated it with everything new and relevant from the review system.
@Pandoro @Jailander @cdondrup Can I ask you to fill the doodle poll at https://doodle.com/poll/56f7c5ydtqny8p78 as well, please?
For the record, I link the current versions of params here:
missing
@bfalacerda do you remember why we move to our own NavFn? https://github.com/strands-project/strands_movebase/commit/561836295f0418981240e0b1eb5db40f90d7c0ba
yes, so we could reconfigure the default_tolerance param a runtime. This makes the robot not go to the exact pose of the intermediary waypoints.
right, I remember...
A git diff to the branch we are using reveals no changes to the parameters.
@marc-hanheide in Linda we use the standard parameters namely the ones at https://github.com/strands-project/strands_movebase/tree/indigo-devel/strands_movebase/strands_movebase_params
simulation env in https://github.com/strands-project/strands_morse/pull/138
Then I thought well, I am going to read the source code, which helped a lot. and thanks for good coding + comments. In dwa_local_planner page, the forward_point_distance is defined as an additional scoring point in front of the robot. I frankly didn't understand what it meant. However, in the source code, I found out that this value is being used as a cost value for robot alignment to the path. By setting it to zero, the trajectory will not be penalized if the robot is not aligned on the path. This is a very important factor for differential drive robots whose centre of rotation is not the in the middle of the robot. Apparently, a lot of valid trajectories were being dropped because of this. As soon as I set this to zero, the robot was able to perform much better. However, improvements could be done so I continued.
This makes me want to investigate how the robot behaves if we set forward_point_distance: 0.0
. The point afterwards about visualizing the evaluated trajectories also sounds like a good point.
very useful @nilsbore !
Quick summary of today's hangout:
I have forked navigation into strands-project and added my debug out changes to the "strands-testing" branch. These changes are not permanent as they make the code less efficient, so this branch should only be used for debugging etc. Therefore I don't think a "release" would be a good idea.
To use it, download and build the repo in you workspace then restart navigation, and run the (ugly) python script dwa_planner/scripts/log.py. This will open a window with six images, each showing the score for the critics used in the code. The axis for each graph are: y=angular velocity, x=linear velocity, centre is zero and the values are scaled by 10. so (x=10,y=30) represents (lv=0, wv=0), (x=15,y=20) represents (lv=0.5, wv=-0.1). The graphs are updated in real time as the robot drives, but some timesteps will be missed due to the speed of the redraw. If this is a problem then you can log the topic debug topic and replay it slower...
@cburbridge agreed, not to be released. I just thought we'll eventually fix code in navigation and then we'd have our own. But for now, let's leave it as a debug tool.
People will (probably) like to hear that Jenkins can now run simulation-enabled test (more precisely, it can run test in a virtual OpenGL/X environment, hence allowing us to write unit test that use the simulator and/or other GUIs. https://github.com/strands-project/strands_navigation/pull/270 is a first example (though a stupid one as it only starts the simulator but doesn't do anything with it, but it's a test). One outcome of any tests we now run on jenkins (devel or PR) is now a highly compressed (1 fps) video (see https://lcas.lincoln.ac.uk/jenkins/view/Pull%20Requests/job/pr-indigo-strands_navigation/98/artifact/Xvnc.mp4 for the corresponding example) showing the virtual desktop (jump to minute 6 in said video to see the simulator appearing). This video should help to identify causes of failure in unit tests. As this is facilitated by a virtualised compute server GPU, you cannot expect the performance of a real GPU here, but it's enough to run morse in wireframe (and maybe also small/simple environments).
Having written all this, I should probably write the same to the mailing list...
Anyway, so the infrastructure is there to actually write more elaborate tests... now somebody needs to do it ;-)
I talked to my people and this is the (probably not as satisfying as hoped for) gist of it:
1. The SCITOSDriver specifies a whole lot of maximum/minimum velocities and maximum positive/negative accelerations. These values are passed down to the MCU (Main Control Unit) and will be used as soft limits in the motor controller. Most of the time these constraints are met, but apparently it cannot be guaranteed (only in extreme situations).
This means that the "drive parameters" of all STRANDS robots are currently specified here. More specifically, these parameters govern the way soft limits:
<MaxForwardVelocity>1.0</MaxForwardVelocity>
<MaxBackwardVelocity>-0.4</MaxBackwardVelocity>
<MaxRotVelocity>120</MaxRotVelocity>
<TransEpsilonVelocity>0.02</TransEpsilonVelocity>
<RotEpsilonVelocity>1</RotEpsilonVelocity>
<MaxTransAcceleration>0.8</MaxTransAcceleration>
<MaxTransDeceleration>0.9</MaxTransDeceleration>
<MaxRotAcceleration>180</MaxRotAcceleration>
<MaxRotDeceleration>260</MaxRotDeceleration>
<MaxEmergencyDeceleration>1.6</MaxEmergencyDeceleration>
Rotations given in degrees / {second,second^2} and translations given in meters / {second, second^2}.
The changes @bfalacerda made with respect to the DWA parameters should reflect those parameters. The parameters can also be tweaked to achieve softer acceleration, more abrupt braking, etc.
2. (paraphrasing Stefan, as I don't know a lot about electronics): The force basically imposes a threshold for the motor current, which is encoded as a 8 bit PWM (hence the values 0-255 cover the complete analogue spectrum). From a motor controlling perspective it would be ideal to always drive at full force, but due to wheel slippage and things like relying on the robot to get stopped by the charging station's threshold, we compromise and use a value of typically around 80-120. Therefore lowering the force too much will increase the stopping distance. Also, with a force of around 100 the acceleration/deceleration limits set (see above) aren't necessarily reached, so increasing the force will enable the robot to get closer to the set limits if they are "quite high".
There is no real formula or table I can give you regarding the force and the accelerations. But setting lower values for accelerations should mitigate the influence of changing the force on the acceleration/deceleration behaviour.
given the values are defined in the Mira config file would it make sense to expose them as rosparams and have DWA configured from those params? obviously we would expose them as rad/s(^2) where needed?
Certainly possible, they can be queried in MIRA and therefore exposed as rosparams. Though note that they are not dynamically reconfigurable, they are written to the MCU at startup and then can't be changed during runtime. But still, it would probably beat having to maintain two sets of files (robot config file and dwa params file) and all the potential trouble that comes with that. I can get together with @cburbridge to help expose the properties.
I have done some test in the scitos_mira
code, and I think there is a ~400ms delay between commanding a velocity and the odometry reporting that the robot is moving at that speed. This is just the delay on the MIRA side, it does not include any delay between the DWA selecting a velocity and the MIRA code sending that velocity to the firmware. @creuther would you expect this kind of timing?
I made some improvements to how the clouds are processed by movebase, which should gains us some more CPU on the main computer: https://github.com/strands-project/strands_movebase/pull/64 .
In response to https://github.com/strands-project/strands_movebase/issues/62#issuecomment-152651469: @creuther can you comment? A delay of almost half a second can indeed explain a lot of stuttering I guess. Is there a way to reduce this or is there a way to make DWA aware of such a delay (asking very silly questions)...
This seems very peculiar to me. I wanted to do a test on Betty but no one's here yet so I'm hesitant to stop the running ROS stuff.
From our side, we expect some delay in setting and passing the command to the motors ( as always, giving an exact number is not possible, unfortunately... :( ) but certainly not 400ms. I'll have a look and report back as soon as someone turns up at the office ;)
Tsk, lazy Brummies.
I'm away this week, but will look at it again on Monday.
Small update: Commanding a velocity from MIRA directly doesn't result in any significant delay, while the delay is definitely there if you command the velocity from ROS. I had a look through the scitos_mira code but can't really see any obvious points of concern.
The only thing that I am not too sure about is the following: Here we define that ScitosG5, i.e. pretty much the entire ROS-MIRA-bridge, should spin at 5Hz. Does the following sleep mean that the entire ROS-MIRA-bridge will be blocked for the remaining (200 - x) ms? I tried to set it to spin at 25Hz as well as 1Hz on Betty but didn't really make a difference. Then again @cburbridge 's debug timestamps show that the odometry is definitely received with 20Hz, so it really doesn't make a difference / block it as far as I can tell. Strange.
The 5Hz spin loop is a separate thread to the odometry message receiving, so having this at 5Hz should be ok I think. However, there used to be horrible bug in this respect a while back: https://github.com/strands-project/scitos_drivers/pull/53 so this needs more checking. Since the only component that registers functions to be called in that loop is the head, it would be interesting to test without that module. If you have time could you try running the scitos launch again, but remove Head from https://github.com/strands-project/scitos_drivers/blob/indigo-devel/scitos_mira/launch/scitos_mira.launch#L4. If not no problem, I'll test it out on Monday.
Since the tests are almost finished, I have a few questions:
Recovery Behaviours: Which recovery behaviours, if any, do we want during the test runs? Especially things like backtrack make the whole test if DWA can turn when it faces a wall kind of pointless as the robot will backtrack and then turn. Since we want to tune DWA parameters, I think we have to be careful which recoveries to run. This brings me to another question: Why are there two different locations where the recovery behaviours are defined @bfalacerda ? I.e. in the config file and a python file.
Results: What feedback would like to get in addition to that it failed or succeeded. Currently, the service starting the test run will report if navigation to the goal was successful, if not, if the robot was able do die gracefully by at least being able to return to the starting location, the travelled distance and time. Once I took care of the human, I will also add the minimum distance kept to the human.
Timeouts: Currently, the navigation times out after 40 seconds and then tries to go back to the starting location for another 40 seconds. Meaning that if the goal is not reached in that time it will take the same time to try and go back. In total that means each test has a maximum of 80 seconds to at least die gracefully. Do we want a timeout at all or do we want to rely on navigation failing on its own? This might mean that the tests will run for quite a while or we have to reconfigure the sleep and retry argument (if we want that recovery behaviour at all) to make it fail quicker.
Maybe @marc-hanheide and/or @hawesie have an opinion on those.
Just briefly talked to @marc-hanheide and this is what we thought would be best:
Recovery Behaviours: None. This is not what we want to test here. Follow-up question for @bfalacerda by not giving the mon nav a config file it does not run the stuff in the config file obviously. For the ones in the python file I would have to disable them via the parameter?! Or can I disable them all together from the start. Maybe not even running monitored navigation for now?
Timeouts: No time outs. Navigation should fail on its own.
Any other thoughts on that?
The config file tells which recovery behaviours python file to load, you need to run monitored navigation, but if you don't give any config file it will simply not load any (which is what we want in this case).
I agree with @marc-hanheide, and @Jailander already answered the question about the config file :) You should also see a warning being printed telling you that there are no recovery behaviours added.
Regarding the config vs python, config defines the RecoverStateMachine to load. The SM itself is defined as a python class, since it's using smach. We need the python code to define how the SM moves between states.
We discovered a different problem regarding the timeout that I guess @bfalacerda solved somewhere in monitored nav that I can't find. Navigation never fails. Somewhere there has to be a timeout. Can you point me to that @bfalacerda ?
I don't get the question. What do you mean navigation never fails?
There are no timeouts in monitored nav, it waits for move_base (or whatever action it calls) to give a result
I think the question, is how to set a timeout in monitored navigation because move_base doesn't fail by itself
it does, dwa eventually gives up. you can also set the planner_patience
and controller_patience
params:
http://wiki.ros.org/move_base?distro=jade#Parameters
The problem is that if the robot starts in front of a wall (10cm distance) it just prints:
[ WARN] [1446723231.414625106]: DWA planner failed to produce path.
[ INFO] [1446723231.513456115]: Got new plan
until the end of times. The problem is that the robot is not inside an obstacle so navigation does not fail but is also not able to find a path. I let it run for a few minutes but it didn't give up. I'll have a look the link. Thanks!
OK, I don't think this will help much:
~planner_patience (double, default: 5.0)
How long the planner will wait in seconds in an attempt to find a valid plan before space-clearing operations are performed.
~controller_patience (double, default: 15.0)
How long the controller will wait in seconds without receiving a valid control before space-clearing operations are performed.
It does perform the clearing but then simply keeps on trying to no avail.
are you sure it's not failing and being called over and over again? Are you using topo nav policy execution? I think it tries several times. monitored nav will always report back to topo after a failure so it can decide what to do
the behaviour that move_base usually has after dwa failing for controller_patience
seconds is clearing the costmaps and trying again. If valid controls are still not found, it should output aborted. I've never seen it getting stuck with dwa failing forever
Just to see if it does anything I ran it for 5 minutes (8 minutes now after finishing to write this) and the only thing it outputs is that DWA is unable to produce a path. This is the same when the robot, e.g. is surrounded by people, technically, there is nothing wrong because the robot is in free space and not inside an obstacle but the DWA can still not find a path. Hence, navigation will never fail and mon nav will never know.
It did clear the costmaps but then keeps on running till forever. I'm using the default move_base params:
cdondrup@calculon:~/ros_ws
$ rosparam get /move_base/
DWAPlannerROS: {acc_lim_th: 2.0, acc_lim_theta: 3.2, acc_lim_x: 1.0, acc_lim_y: 0.0,
acc_limit_trans: 0.1, angular_sim_granularity: 0.1, forward_point_distance: 0.325,
goal_distance_bias: 24.0, holonomic_robot: false, latch_xy_goal_tolerance: true,
max_rot_vel: 1.0, max_scaling_factor: 0.2, max_trans_vel: 0.55, max_vel_x: 0.55,
max_vel_y: 0.0, min_rot_vel: 0.4, min_trans_vel: 0.1, min_vel_x: 0.0, min_vel_y: 0.0,
occdist_scale: 0.01, oscillation_reset_angle: 0.2, oscillation_reset_dist: 0.05,
path_distance_bias: 32.0, prune_plan: true, restore_defaults: false, rot_stopped_vel: 0.1,
scaling_speed: 0.25, sim_granularity: 0.025, sim_time: 2.1, stop_time_buffer: 0.2,
trans_stopped_vel: 0.1, use_dwa: true, vth_samples: 20, vx_samples: 3, vy_samples: 10,
xy_goal_tolerance: 0.3, yaw_goal_tolerance: 0.1}
NavfnROS: {allow_unknown: false}
aggressive_reset:
layer_names: [obstacle_layer]
reset_distance: 0.0
base_global_planner: navfn/NavfnROS
base_local_planner: dwa_local_planner/DWAPlannerROS
clearing_rotation_allowed: true
conservative_reset:
layer_names: [obstacle_layer]
reset_distance: 2.0
conservative_reset_dist: 3.0
controller_frequency: 10.0
controller_patience: 15.0
global_costmap:
footprint: '[[-0.3,-0.32],[-0.52,-0.11],[-0.52,0.11],[-0.3,0.32],[-0.05,0.31],[0.05,0.25],[0.15,0.13],[0.2,0],[0.15,-0.13],[0.05,-0.25],[-0.05,-0.31]]'
footprint_padding: 0.0001
global_frame: /map
height: 10
inflation_layer: {cost_scaling_factor: 5.0, enabled: true, inflation_radius: 0.8}
map_layer: {enabled: true, map_topic: /map}
no_go_layer: {enabled: true, map_topic: /no_go_map}
obstacle_layer:
combination_method: 1
enabled: true
laser: {clearing: true, data_type: LaserScan, marking: true, sensor_frame: base_laser_link,
topic: scan}
mark_threshold: 0
max_obstacle_height: 2.0
observation_sources: laser
obstacle_range: 2.5
origin_z: 0.0
raytrace_range: 3.0
track_unknown_space: true
unknown_threshold: 6
z_resolution: 0.2
z_voxels: 6
obstacle_layer_footprint: {enabled: true}
origin_x: 0.0
origin_y: 0.0
plugins:
- {name: map_layer, type: 'costmap_2d::StaticLayer'}
- {name: no_go_layer, type: 'costmap_2d::StaticLayer'}
- {name: obstacle_layer, type: 'costmap_2d::VoxelLayer'}
- {name: inflation_layer, type: 'costmap_2d::InflationLayer'}
publish_frequency: 0.5
resolution: 0.05
robot_base_frame: /base_link
robot_radius: 0.46
rolling_window: false
static_map: true
transform_tolerance: 0.3
update_frequency: 5.0
width: 10
local_costmap:
footprint: '[[-0.3,-0.32],[-0.52,-0.11],[-0.52,0.11],[-0.3,0.32],[-0.05,0.31],[0.05,0.25],[0.15,0.13],[0.2,0],[0.15,-0.13],[0.05,-0.25],[-0.05,-0.31]]'
footprint_padding: 0.0001
global_frame: /map
height: 4
inflation_layer: {cost_scaling_factor: 5.0, inflation_radius: 0.8}
map_layer: {map_topic: /map}
no_go_layer: {map_topic: /no_go_map}
obstacle_layer:
combination_method: 1
enabled: true
laser: {clearing: true, data_type: LaserScan, marking: true, sensor_frame: base_laser_link,
topic: scan}
mark_threshold: 0
max_obstacle_height: 2.0
observation_sources: laser
obstacle_range: 2.5
origin_z: 0.0
raytrace_range: 3.0
track_unknown_space: true
unknown_threshold: 6
z_resolution: 0.2
z_voxels: 6
obstacle_layer_footprint: {enabled: true}
origin_x: 0.0
origin_y: 0.0
plugins:
- {name: obstacle_layer, type: 'costmap_2d::VoxelLayer'}
publish_frequency: 2.0
resolution: 0.05
robot_base_frame: /base_link
robot_radius: 0.46
rolling_window: true
static_map: false
transform_tolerance: 0.3
update_frequency: 5.0
width: 4
oscillation_distance: 0.5
oscillation_timeout: 0.0
planner_frequency: 0.5
planner_patience: 5.0
recovery_behavior_enabled: true
recovery_behaviors:
- {name: conservative_reset, type: clear_costmap_recovery/ClearCostmapRecovery}
- {name: aggressive_reset, type: clear_costmap_recovery/ClearCostmapRecovery}
restore_defaults: false
shutdown_costmaps: false
Yes, I am using policy execution from topo nav.
The easiest fix would be to set a timeout for the policy execution client but this feels somehow artificial.
Also, the goal is in free space. If the goal would be blocked, this would fail.
I just tried calling move base directly via rviz and i get this:
[ WARN] [1446725936.535797813]: DWA planner failed to produce path.
[ INFO] [1446725936.633590127]: Got new plan
[ WARN] [1446725936.636505293]: DWA planner failed to produce path.
[ INFO] [1446725936.733484535]: Got new plan
[ WARN] [1446725936.735877074]: DWA planner failed to produce path.
[ WARN] [1446725936.833553834]: Clearing costmap to unstuck robot (2.000000m).
[ INFO] [1446725937.033437890]: Got new plan
[ WARN] [1446725937.035099355]: DWA planner failed to produce path.
[ WARN] [1446725937.133271232]: Clearing costmap to unstuck robot (0.000000m).
[ INFO] [1446725937.333269376]: Got new plan
[ WARN] [1446725937.336529415]: DWA planner failed to produce path.
[ERROR] [1446725937.433302700]: Aborting because a valid control could not be found. Even after executing all recovery behaviors
And then it stops. Can you try an rviz goal also? Do you see this error message in your case? If so, it aborted, and then was called again
OK, I tested a bit more. This seems to only happen in one of the test cases where the robot after clearing the costmap is bumping back a and forth by ~1cm all the time. Therefore it moves and DWA does not fail. In every other case, I also observe the behaviour you described. It's test map mb_test1 if you want to try.
team members @cburbridge @bfalacerda @cdondrup @Pandoro @Jailander @denisehe @TobKoer @PDuckworth
Identified issues
Test cases
static tests
dynamic tests
Thinks to do
To prevent regressions:
Further ideas: