Closed njhetherington closed 5 years ago
Good question. Currently we are not using this topic, so I don't have any code to provide. If you are operating in an open area, you can likely ignore the safe_actions topic/concept.
The idea was to have one set of possible actions (speed-heading pairs), which are first checked by an external static obstacle collision checking script (e.g. using ROS costmap_2d), and then the "safe" ones are sent to be evaluated by the policy with respect to dynamic obstacles. This is a rough way to handle static obstacles (walls) using a policy only trained on dynamic+round obstacles.
In this implementation, the learned policy is essentially a delta function (~99% on the best action, <1% on other actions), so if the best action according to the policy is deemed "unsafe" by the static obstacle checker, it's not clear that the 2nd best action according to the policy has much meaning. Of course, if the static obstacle checker says to stop, you should stop regardless of what the policy says about that action.
An alternative approach might be to look at the values of the states after taking an action (e.g. V(s + a*dt)), or to learn a policy that's not as extreme (e.g. by adding more entropy weight).
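To make the filtering idea above concrete, here is a minimal sketch, assuming a discrete set of (speed, heading-change) actions and a toy occupancy check standing in for a real static obstacle checker (one built on ROS costmap_2d, for example). The `forward_project`, `filter_safe_actions`, and `wall_check` helpers are hypothetical illustrations, not code from this repo:

```python
import numpy as np

def forward_project(pose, speed, heading_change, dt=1.0):
    """Project the robot pose (x, y, theta) forward one step for an action."""
    x, y, theta = pose
    theta_new = theta + heading_change
    return (x + speed * dt * np.cos(theta_new),
            y + speed * dt * np.sin(theta_new),
            theta_new)

def filter_safe_actions(pose, actions, is_occupied, dt=1.0):
    """Keep only (speed, heading_change) actions whose projected pose is free."""
    return [a for a in actions
            if not is_occupied(*forward_project(pose, a[0], a[1], dt)[:2])]

# Toy static obstacle check standing in for a costmap lookup: a wall at x >= 2.0
def wall_check(x, y):
    return x >= 2.0

actions = [(1.0, 0.0), (1.0, np.pi / 2), (0.5, 0.0), (0.0, 0.0)]
safe = filter_safe_actions((1.5, 0.0, 0.0), actions, wall_check)
# Driving straight hits the wall; turning or stopping remains "safe"
```

The surviving `safe` list is what would then be published on the safe_actions topic for the policy to choose among.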
Thanks for getting back to me.
We do have static obstacles in our environment, so I'm trying to write a collision checking script as you've described.
I'm new to reinforcement learning - could you please explain how I would check the state values for a given action?
Take a look at the python notebook in this repo to see how to query the policy (probability of selecting each action) given a state vector (observation of other agents).
If you add the following line to the network.py script (at line 47), it should also compute the value function:
# Cost: v
self.logits_v = tf.squeeze(tf.layers.dense(inputs=self.fc1, units=1, use_bias=True, activation=None, name='logits_v'), axis=[1])
The network.py script is where the conversion from state to policy is defined. This additional line helps define the value for the current state, so in order to compare values of the various actions under consideration, you'll have to do something to propagate forward the current state based on each action. In RL, there is an alternative form of value function that takes in a state and action, but we didn't use that approach here (e.g. Q-learning).
I haven't checked the quality of the learned value function, so I'm honestly not sure how good it'll be. With the particular learning algorithm we used in this paper, the value function was learned sorta as an auxiliary measurement to help train what we really wanted: the policy (see actor-critic methods).
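As a rough illustration of the "propagate forward, then compare values" idea, the sketch below ranks candidate actions by the value of the forward-propagated state. The `value_fn` and `propagate` helpers here are toy stand-ins (not the repo's network or dynamics); in practice you would query the learned value head on the propagated observation instead:

```python
import numpy as np

# Toy value function standing in for the network's value output;
# here value simply improves as distance-to-goal (state[0]) shrinks.
def value_fn(state):
    return -state[0]

# Toy forward model: moving at `speed` for dt seconds reduces distance-to-goal.
def propagate(state, action, dt=0.2):
    speed, _heading_change = action
    new_state = state.copy()
    new_state[0] = max(0.0, state[0] - speed * dt)
    return new_state

def best_action_by_value(state, actions, dt=0.2):
    """Evaluate V(s + a*dt) for each candidate action and pick the best."""
    values = [value_fn(propagate(state, a, dt)) for a in actions]
    return actions[int(np.argmax(values))]

state = np.array([5.0, 0.0])          # [dist_to_goal, ...] (illustrative layout)
actions = [(1.2, 0.0), (0.6, 0.0), (0.0, 0.0)]
best = best_action_by_value(state, actions)
```

With this toy value function the fastest goal-directed action wins; with the real learned value head, actions that cut toward dynamic obstacles would score lower.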
Hi Michael,
I also have a question concerning the safe_actions publisher. I am currently trying to get cadrl_node running for testing purposes, and for now I have matched all the input/output topics with my own setup.
The problem, as it seems, is with empty feasible_actions, which, if I understood the code well, should be set by a publisher on the ~safe_actions topic using the cbNNActions callback:
def cbNNActions(self, msg):
    self.feasible_actions = msg
Since it is empty, calling cbComputeActionGA3C fails with an Invalid Feasible Actions message.
I presume a publisher that uses the NN should be written, but currently I am not sure how I would do that.
Any help is appreciated.
Thanks, Karlo
Hi Karlo - I see what's going on. You should be able to run the ROS node completely without feasible_actions, but I forgot to fully remove it. Can you check out the no_feasible_actions branch I just added, and let me know if that removes the error?
Hey Michael,
Thanks for being prompt! Switching to the no_feasible_actions branch did fix the Invalid Feasible Actions error.
However, another thing emerged. When a new goal is given, via rviz for instance, cadrl_node outputs the following:
Not in NN mode
2
I managed to overcome the error by changing the cbGlobalGoal method at line 112 from:
self.operation_mode.mode = self.operation_mode.SPIN_IN_PLACE
to:
self.operation_mode.mode = self.operation_mode.NN
Although no further error occurred, nn_cmd_vel publishes only zero velocity values:
---
linear:
  x: 0.0
  y: 0.0
  z: 0.0
angular:
  x: 0.0
  y: 0.0
  z: 0.0
---
and no markers (apart from simulated agents) are published either.
Am I doing something wrong?
Thanks, Karlo
Hmm, could we try to localize whether the issue is with the ROS code or the network query? On line 496, predictions = self.nn.predict_p(obs, None)[0] is the actual network query. Could you see what obs and predictions are? The next few lines should turn predictions (a policy pdf over actions) into a more readable (speed, heading).
The SPIN_IN_PLACE code was supposed to make the robot, upon receiving a new goal, stop and spin until it's roughly pointing toward the goal, then switch into NN mode and use the network queries. This was useful when testing the robot going back and forth across a room, but it may not be needed in your scenario.
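For reference, the conversion from predictions to a command looks roughly like the sketch below. The ACTIONS table here is hypothetical: it only mirrors the 11-action shape, the (speed fraction, heading change) form, and the pref_speed scaling; the real discrete action set is defined in the repo's network code:

```python
import numpy as np

# Hypothetical discrete action set: (speed fraction, heading change in rad).
# Illustrative only -- the real set lives in the repo's network code.
ACTIONS = np.array([
    [1.0, -np.pi / 6], [1.0, 0.0], [1.0, np.pi / 6],
    [0.5, -np.pi / 6], [0.5, 0.0], [0.5, np.pi / 6],
    [0.5, -np.pi / 12], [0.5, np.pi / 12],
    [1.0, -np.pi / 12], [1.0, np.pi / 12],
    [0.0, 0.0],
])

def predictions_to_action(predictions, pref_speed):
    """Pick the most probable action, then scale its speed by pref_speed."""
    idx = int(np.argmax(predictions))
    raw_action = ACTIONS[idx]
    return np.array([raw_action[0] * pref_speed, raw_action[1]])

# Policy pdf over 11 discrete actions (shape matches the node's debug output)
predictions = np.array([9.18e-1, 5.90e-2, 1.00e-4, 1.00e-4, 1.00e-4,
                        1.48e-2, 1.16e-3, 1.05e-4, 6.13e-3, 5.96e-4, 2.16e-4])
action = predictions_to_action(predictions, pref_speed=1.2)
# Most probable action is index 0: full preferred speed with a right turn
```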
Here is the output:
obs: [[0. 5.59617619 1.53353307 1.2 0.5 0.
0. 0. 0. 0. 0. 0.
0. 0. 0. 0. 0. 0.
0. 0. 0. 0. 0. 0.
0. 0. 0. 0. 0. 0.
0. 0. 0. 0. 0. 0.
0. 0. 0. 0. 0. 0.
0. 0. 0. 0. 0. 0.
0. 0. 0. 0. 0. 0.
0. 0. 0. 0. 0. 0.
0. 0. 0. 0. 0. 0.
0. 0. 0. 0. 0. 0.
0. 0. 0. ]]
predictions: [9.1763228e-01 5.9054375e-02 1.0029992e-04 9.9890160e-05 9.9890131e-05
1.4806822e-02 1.1613100e-03 1.0508312e-04 6.1284434e-03 5.9589912e-04
2.1576905e-04]
best action index: 0
raw_action: [ 1. -0.52359878]
action: [ 1.2 -0.52359878]
chosen action (rel angle) 1.2 -0.5235987755982988
It looks like the if self.goal.header.stamp == rospy.Time(0) condition in the cbControl method was causing the zero twist. For whatever reason, the goal header stamp was always zero.
After removing that condition, I got it to work pretty decently. The only thing left to do is adjust the nn_cmd_vel output for my drive type (which is causing the robot to wiggle, as can be seen below).
Using the Jackal simulator for testing seems to give the expected results (no wiggle):
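If the wiggle comes from how the (speed, heading change) action is mapped onto the drive's velocity command, one thing to experiment with is a clamped proportional mapping like the sketch below. The turn_gain and max_yaw_rate values are assumptions to tune per robot, and a plain dict stands in for a geometry_msgs/Twist so the sketch runs standalone:

```python
def action_to_cmd_vel(speed, heading_change, turn_gain=1.0, max_yaw_rate=1.0):
    """Map a (speed, heading_change) action to a differential-drive command.

    turn_gain and max_yaw_rate are hypothetical tuning knobs: a lower gain
    or tighter clamp trades turn responsiveness for less oscillation.
    """
    yaw_rate = max(-max_yaw_rate, min(max_yaw_rate, turn_gain * heading_change))
    return {"linear_x": speed, "angular_z": yaw_rate}

# Using the action from the debug output above: 1.2 m/s, -0.5236 rad
cmd = action_to_cmd_vel(1.2, -0.5236)
```

In a real node you would copy these two fields into a Twist and publish it on nn_cmd_vel; smoothing the yaw rate across consecutive commands is another common fix for wiggle.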
Cool, thanks for checking that. Your observation is coming in, and the network is selecting a 1.2 m/s speed and a -0.52 rad heading change, so it must be an issue with the method that actually publishes the action to the ROS topic (update_action, I think).
I updated my previous post with what I have done to fix it. Thanks for helping. It seems to be working now without problems.
Very cool @rLoka !! Thanks for sharing those videos. In the second clip, it still looks a bit jumpy (stop & go) which seems sorta weird.
I suspect this trained policy won't excel in those static obstacle fields, since it was mainly trained on dynamic agents (with a small percentage of static/non-cooperative agents around as well).
Yeah, it is still not perfect, but at least I have it set up and running. I will investigate further whether the actual drive is causing these stop-and-go motions or whether it is due to the trained policy.
@mfe7 @rLoka Hi - I still don't understand how to solve this problem. It's:
The problem, as it seems, is with empty feasible_actions, which, if I understood the code well, should be set by a publisher on the ~safe_actions topic using the cbNNActions callback:
def cbNNActions(self, msg):
    self.feasible_actions = msg
Since it is empty, calling cbComputeActionGA3C fails with an Invalid Feasible Actions message.
Could you tell me how to solve it in detail? I would appreciate it very much if you could give me some guidance.
Thanks, Zhangdi
@DDBarBar did you try switching branches as noted above? I tried to get rid of everything related to feasible_actions in that branch.
Hello Michael,
Would you be willing to share the code for the publisher of the "safe_actions" topic ?
Thanks in advance.
Nick Hetherington