Closed njhetherington closed 5 years ago
Good question. Currently we are not using this topic, so I don't have any code to provide. If you are operating in an open area, you can likely ignore the safe_actions topic/concept.
The idea was to have one set of possible actions (speed-heading pairs), which are first checked by an external static obstacle collision checking script (e.g. using ROS costmap_2d), and then the "safe" ones are sent to be evaluated by the policy with respect to dynamic obstacles. This is a rough way to handle static obstacles (walls) using a policy only trained on dynamic+round obstacles.
In this implementation, the learned policy is essentially a delta function (~99% on the best action, <1% on other actions), so if the best action according to the policy is deemed "unsafe" by the static obstacle checker, it's not clear that the 2nd best action according to the policy has much meaning. Of course, if the static obstacle checker says to stop, you should stop regardless of what the policy says about that action.
An alternative approach might be to look at the values of the states after taking an action (e.g. V(s + a*dt)), or to learn a policy that's not as extreme (e.g. by adding more entropy weight).
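To make the filtering idea above concrete, here is a minimal sketch, assuming a discrete set of (speed, heading-change) actions and a toy occupancy check standing in for a real static obstacle checker (one built on ROS costmap_2d, for example). The `forward_project`, `filter_safe_actions`, and `wall_check` helpers are hypothetical illustrations, not code from this repo:

```python
import numpy as np

def forward_project(pose, speed, heading_change, dt=1.0):
    """Project the robot pose (x, y, theta) forward one step for an action."""
    x, y, theta = pose
    theta_new = theta + heading_change
    return (x + speed * dt * np.cos(theta_new),
            y + speed * dt * np.sin(theta_new),
            theta_new)

def filter_safe_actions(pose, actions, is_occupied, dt=1.0):
    """Keep only (speed, heading_change) actions whose projected pose is free."""
    return [a for a in actions
            if not is_occupied(*forward_project(pose, a[0], a[1], dt)[:2])]

# Toy static obstacle check standing in for a costmap lookup: a wall at x >= 2.0
def wall_check(x, y):
    return x >= 2.0

actions = [(1.0, 0.0), (1.0, np.pi / 2), (0.5, 0.0), (0.0, 0.0)]
safe = filter_safe_actions((1.5, 0.0, 0.0), actions, wall_check)
# Driving straight hits the wall; turning or stopping remains "safe"
```

The surviving `safe` list is what would then be published on the safe_actions topic for the policy to choose among.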
Thanks for getting back to me.
We do have static obstacles in our environment, so I'm trying to write a collision checking script as you've described.
I'm new to reinforcement learning - could you please explain how I would check the state values for a given action?
Take a look at the python notebook in this repo to see how to query the policy (probability of selecting each action) given a state vector (observation of other agents).
If you add the following line to the network.py script (at line 47), it should also compute the value function:
# Cost: v
self.logits_v = tf.squeeze(tf.layers.dense(inputs=self.fc1, units=1, use_bias=True, activation=None, name='logits_v'), axis=[1])
The network.py script is where the conversion from state to policy is defined. This additional line helps define the value for the current state, so in order to compare values of the various actions under consideration, you'll have to do something to propagate forward the current state based on each action. In RL, there is an alternative form of value function that takes in a state and action, but we didn't use that approach here (e.g. Q-learning).
I haven't checked the quality of the learned value function, so I'm honestly not sure how good it'll be. With the particular learning algorithm we used in this paper, the value function was learned sorta as an auxiliary measurement to help train what we really wanted: the policy (see actor-critic methods).
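As a rough illustration of the "propagate forward, then compare values" idea, the sketch below ranks candidate actions by the value of the forward-propagated state. The `value_fn` and `propagate` helpers here are toy stand-ins (not the repo's network or dynamics); in practice you would query the learned value head on the propagated observation instead:

```python
import numpy as np

# Toy value function standing in for the network's value output;
# here value simply improves as distance-to-goal (state[0]) shrinks.
def value_fn(state):
    return -state[0]

# Toy forward model: moving at `speed` for dt seconds reduces distance-to-goal.
def propagate(state, action, dt=0.2):
    speed, _heading_change = action
    new_state = state.copy()
    new_state[0] = max(0.0, state[0] - speed * dt)
    return new_state

def best_action_by_value(state, actions, dt=0.2):
    """Evaluate V(s + a*dt) for each candidate action and pick the best."""
    values = [value_fn(propagate(state, a, dt)) for a in actions]
    return actions[int(np.argmax(values))]

state = np.array([5.0, 0.0])          # [dist_to_goal, ...] (illustrative layout)
actions = [(1.2, 0.0), (0.6, 0.0), (0.0, 0.0)]
best = best_action_by_value(state, actions)
```

With this toy value function the fastest goal-directed action wins; with the real learned value head, actions that cut toward dynamic obstacles would score lower.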
Hi Michael,
I also have a question concerning the safe_actions publisher. I am currently trying to get cadrl_node running for testing purposes, and for now I have matched all the input/output topics with my own setup.
The problem, as it seems, is with empty feasible_actions, which, if I understood the code well, should be set by a publisher on the ~safe_actions topic using the cbNNActions callback:
def cbNNActions(self, msg):
    self.feasible_actions = msg
Since it is empty, calling cbComputeActionGA3C fails with an Invalid Feasible Actions message.
I presume a publisher that uses the NN should be written, but currently I am not sure how I would do that.
Any help is appreciated.
Thanks, Karlo
Hi Karlo - I see what's going on. You should be able to run the ROS node completely without feasible_actions, but I forgot to fully remove it. Can you check out the no_feasible_actions branch I just added, and let me know if that removes the error?
Hey Michael,
Thanks for being prompt! Switching to the no_feasible_actions branch did fix the Invalid Feasible Actions error.
However, another thing emerged. When a new goal is given, via rviz for instance, cadrl_node outputs the following:
Not in NN mode
2
I managed to overcome the error by changing the cbGlobalGoal method at line 112 from:
self.operation_mode.mode = self.operation_mode.SPIN_IN_PLACE
to:
self.operation_mode.mode = self.operation_mode.NN
Although no further error occurred, nn_cmd_vel publishes only zero velocity values:
---
linear:
  x: 0.0
  y: 0.0
  z: 0.0
angular:
  x: 0.0
  y: 0.0
  z: 0.0
---
and no markers (apart from simulated agents) are published either.
Am I doing something wrong?
Thanks, Karlo
Hmm, could we try to localize whether the issue is with the ROS code or the network query? On line 496, predictions = self.nn.predict_p(obs, None)[0] is the actual network query. Could you see what obs and predictions are? The next few lines should turn predictions (a policy pdf over actions) into a more readable (speed, heading).
The SPIN_IN_PLACE code was supposed to make the robot, upon receiving a new goal, stop and spin until it's roughly pointing toward the goal, then switch into NN mode and use the network queries. This was useful when testing the robot going back and forth across a room, but it may not be needed in your scenario.
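For reference, the conversion from predictions to a command looks roughly like the sketch below. The ACTIONS table here is hypothetical: it only mirrors the 11-action shape, the (speed fraction, heading change) form, and the pref_speed scaling; the real discrete action set is defined in the repo's network code:

```python
import numpy as np

# Hypothetical discrete action set: (speed fraction, heading change in rad).
# Illustrative only -- the real set lives in the repo's network code.
ACTIONS = np.array([
    [1.0, -np.pi / 6], [1.0, 0.0], [1.0, np.pi / 6],
    [0.5, -np.pi / 6], [0.5, 0.0], [0.5, np.pi / 6],
    [0.5, -np.pi / 12], [0.5, np.pi / 12],
    [1.0, -np.pi / 12], [1.0, np.pi / 12],
    [0.0, 0.0],
])

def predictions_to_action(predictions, pref_speed):
    """Pick the most probable action, then scale its speed by pref_speed."""
    idx = int(np.argmax(predictions))
    raw_action = ACTIONS[idx]
    return np.array([raw_action[0] * pref_speed, raw_action[1]])

# Policy pdf over 11 discrete actions (shape matches the node's debug output)
predictions = np.array([9.18e-1, 5.90e-2, 1.00e-4, 1.00e-4, 1.00e-4,
                        1.48e-2, 1.16e-3, 1.05e-4, 6.13e-3, 5.96e-4, 2.16e-4])
action = predictions_to_action(predictions, pref_speed=1.2)
# Most probable action is index 0: full preferred speed with a right turn
```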
Here is the output:
obs: [[0. 5.59617619 1.53353307 1.2 0.5 0.
0. 0. 0. 0. 0. 0.
0. 0. 0. 0. 0. 0.
0. 0. 0. 0. 0. 0.
0. 0. 0. 0. 0. 0.
0. 0. 0. 0. 0. 0.
0. 0. 0. 0. 0. 0.
0. 0. 0. 0. 0. 0.
0. 0. 0. 0. 0. 0.
0. 0. 0. 0. 0. 0.
0. 0. 0. 0. 0. 0.
0. 0. 0. 0. 0. 0.
0. 0. 0. ]]
predictions: [9.1763228e-01 5.9054375e-02 1.0029992e-04 9.9890160e-05 9.9890131e-05
1.4806822e-02 1.1613100e-03 1.0508312e-04 6.1284434e-03 5.9589912e-04
2.1576905e-04]
best action index: 0
raw_action: [ 1. -0.52359878]
action: [ 1.2 -0.52359878]
chosen action (rel angle) 1.2 -0.5235987755982988
It looks like the if self.goal.header.stamp == rospy.Time(0) condition in the cbControl method was causing the zero twist. For whatever reason, the goal header stamp was always zero.
After removing that condition, I got it to work pretty decently. The only thing left to do is adjust the nn_cmd_vel output for my drive type (which is causing the robot to wiggle, as can be seen below).
Using the Jackal simulator for testing seems to give the expected results (no wiggle):
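If the wiggle comes from how the (speed, heading change) action is mapped onto the drive's velocity command, one thing to experiment with is a clamped proportional mapping like the sketch below. The turn_gain and max_yaw_rate values are assumptions to tune per robot, and a plain dict stands in for a geometry_msgs/Twist so the sketch runs standalone:

```python
def action_to_cmd_vel(speed, heading_change, turn_gain=1.0, max_yaw_rate=1.0):
    """Map a (speed, heading_change) action to a differential-drive command.

    turn_gain and max_yaw_rate are hypothetical tuning knobs: a lower gain
    or tighter clamp trades turn responsiveness for less oscillation.
    """
    yaw_rate = max(-max_yaw_rate, min(max_yaw_rate, turn_gain * heading_change))
    return {"linear_x": speed, "angular_z": yaw_rate}

# Using the action from the debug output above: 1.2 m/s, -0.5236 rad
cmd = action_to_cmd_vel(1.2, -0.5236)
```

In a real node you would copy these two fields into a Twist and publish it on nn_cmd_vel; smoothing the yaw rate across consecutive commands is another common fix for wiggle.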
Cool, thanks for checking that. Your observation is coming in, and the network is selecting a 1.2 m/s speed and a -0.52 rad heading change, so it must be an issue with the method that actually publishes the action to the ROS topic (update_action, I think).
I updated my previous post with what I have done to fix it. Thanks for helping. It seems to be working now without problems.
Very cool @rLoka !! Thanks for sharing those videos. In the second clip, it still looks a bit jumpy (stop & go) which seems sorta weird.
I suspect this trained policy won't excel in those static obstacle fields, since it was mainly trained on dynamic agents (with a small percentage of static/non-cooperative agents around as well).
Yeah, it is still not perfect, but at least I have it set up and running. I will investigate further whether the actual drive is causing these stop-and-go motions or whether it is due to the trained policy.
@mfe7 @rLoka Hi - I still don't understand how to solve this problem. It's:
The problem, as it seems, is with empty feasible_actions, which, if I understood the code well, should be set by a publisher on the ~safe_actions topic using the cbNNActions callback:
def cbNNActions(self, msg):
    self.feasible_actions = msg
Since it is empty, calling cbComputeActionGA3C fails with an Invalid Feasible Actions message.
Could you tell me how to solve it in detail? I would appreciate it very much if you could give me some guidance.
Thanks, Zhangdi
@DDBarBar did you try switching branches as noted above? I tried to get rid of everything related to feasible_actions in that branch.
Hello Michael,
Would you be willing to share the code for the publisher of the "safe_actions" topic ?
Thanks in advance.
Nick Hetherington