ros / roslisp_common

Clone of the old roslisp_common SVN repo from code.ros.org.
8 stars 14 forks source link

Fixes issue with goals with losing goals #25

Closed jannikb closed 8 years ago

jannikb commented 9 years ago

When executing many actions in a quick succession goals could get incorrectly marked as lost. This causes send-goal-and-wait and wait-for-return to return instantly without waiting for the goal to finish.

To replicate this issue run rosrun actionlib_tutorials fibonacci_server and in a repl

(ros-load:load-system "actionlib_lisp" "actionlib-test")
(in-package :actionlib-lisp)
(start-ros-node "foo")
(defparameter *client* (make-simple-action-client "/fibonacci" "actionlib_tutorials/FibonacciAction"))
(let ((time (get-internal-real-time)))
                  (loop for i from 0 to 100 do
                    (send-goal-and-wait *client* (make-message "actionlib_tutorials/FibonacciGoal" order 0) 2 2)                    
                    (unless (and (eql (state *client*) :SUCCEEDED)
                                 (result *client*))
                      (format t "state: ~a~%" (state *client*))
                      (format t "result: ~a~%" (result *client*))))
                  (- (get-internal-real-time) time))
airballking commented 9 years ago

I confirm that the above example code provokes errors with the existing (pre-pull request) implementation.

jannikb commented 9 years ago

Should be fixed now.

fairlight1337 commented 9 years ago

So, can we get this in? I see serious trouble with using high-frequency TF2 transform lookups:

`Action call did not succeed.'

That comes up a lot when using CRAM with cl-tf2 at the moment.

airballking commented 9 years ago

I am working on it, right now...

fairlight1337 commented 9 years ago

Just tried it from Jannik's fork, works nicely.

airballking commented 9 years ago

@jannikb I just tried your tests on our robot and it runs into 1~2 lost goals pretty often. Do you also have this?

jannikb commented 9 years ago

@airballking In my last tests I didn't have that problem anymore. Is there a way I could reproduce it?

airballking commented 9 years ago

I realized I was not testing with the newest version of roslisp which has the fix for ros-time.

After pulling the amount of lost goals is drastically reduced. Still, I get this:

(let ((time (get-internal-real-time)))
  (loop for i from 0 to 5000 do
    (send-goal-and-wait *client* (make-message "actionlib_tutorials/FibonacciGoal" order 0) 2 2)                    
    (unless (and (eql (state *client*) :SUCCEEDED)
                         (result *client*))
      (format t "i=~a~%" i)
      (format t "state: ~a~%" (state *client*))
      (format t "result: ~a~%" (result *client*))))
  (- (get-internal-real-time) time))

[(ACTIONLIB) INFO] 1429266651.608: Reached execute timeout.
i=899
state: LOST
result: [FIBONACCIRESULT
           SEQUENCE:
             #(0 1)]
[(ACTIONLIB) INFO] 1429266664.448: Reached execute timeout.
i=3180
state: LOST
result: [FIBONACCIRESULT
           SEQUENCE:
             #(0 1)]

Maybe we can look at this when we meet this afternoon?

airballking commented 9 years ago

I started playing with the threshold in the code that declares goals as lost:

https://github.com/jannikb/roslisp_common/blob/master/actionlib_lisp/src/new_implementation/goal-manager.lisp#L143

By increasing this number to 30 I could get rid of the lost goals. Decrementing it to 15, 10, 5, 2 gets me to see a lot of lost goals.

So, I guess that threshold is a critical parameter. Could we expose it as a member of the action-client?

fairlight1337 commented 9 years ago

Okay, apparently I've been too optimistic.

[(PR2-NAV PROCESS-MODULE) INFO] 1429274644.050: Using nav-pcontroller.
[(PR2-NAV PROCESS-MODULE) INFO] 1429274644.084: Nav action finished.
[(PR2-NAV PROCESS-MODULE) INFO] 1429274644.086: Navigation finished.
[(ACTIONLIB) INFO] 1429274669.974: Reached execute timeout.
[(ACTIONLIB) INFO] 1429274678.390: Reached execute timeout.

This happens every time I run navigation (via (at-location ...), specifically). I'd be very interested in the exposure of the parameter @airballking talked about.

airballking commented 9 years ago

Note to myself: I started using Jannik's fix for the RoboHow demo on Apr24th. So far, all fine.

airballking commented 8 years ago

Just tested this again and it worked for me. I am merging this now in the hope that we ironed out all issues.