ros / roslisp_common

Clone of the old roslisp_common SVN repo from code.ros.org.
8 stars 14 forks source link

TF2 refactoring #24

Closed airballking closed 8 years ago

airballking commented 9 years ago

This is the announced re-factoring of cl-tf2. Main ideas:

@fairlight1337 @gheorghelisca @bbrieber @yazdani @gaya- @moesenle feel free to review and comment.

airballking commented 9 years ago

Fixes ros/roslisp_common#16 and ros/roslisp_common#19

gaya- commented 9 years ago

This is an interesting implementation of the library, I really like it that it even comes with a little tutorial. However, I can see one flaw in that implementation. I have been trying to point to it previously but it seems that my comments have been misunderstood. So, here is my last try to make myself clear. I had to make sure that the explanation is understandable that's why it turned out so long, sorry about that. Here it goes:

In robotic systems that describe 3D worlds there are usually 2 types of datastructures:

For some algorithms in a robotic system, it does not matter if the coordinates are represented as a pose or poseStamped. E.g., assume we have an algorithm fromAToB which tells you if there exists a path from A to B in general. fromAToB doesn't care about timestamps, it also doesn't want to be bothered with coordinate transformations, so A and B should be specficied in the same frame. Therefore, the signature will be fromAToB(pose A, pose B) and not fromAToB(poseStamped A, poseStamped B). Another example: I need a function that given a certain object will give me poses inside of that object, that function will not want to deal with temporal reasoning either, so the signature will be pose[] posesInsideObject(object obj).

Now assume, we have a whole system and not just a function that works in a 3D world, doesn't care about timestamps and all its datatypes are pose. This system can reason about one frame of robot's belief state and tell if an object is visible for the robot right now, where the robot can stand if it wants to reach an object etc.

Now we want to be able to use this system in a changing world that also has an extra dimension of time and where all the datatypes are poseStamped (call it the 4D world). The simplest way would be to convert all the data from poseStamped to pose each time you enter the 3D world and back into poseStamped when you exit. If you have 100 such entry/exit points you will have to add 100 conversions.

Thankfully, computer science has a paradigm of polymorphism. If we assume that poseStamped is a child of pose and simply extends it with extra functionality we will still be able to pass poseStamped to the 3D world system and it will just ignore the extra functionality. It will still be able to query the same fields of poseStamped that there are in pose (origin and orientation), it will still be able to use exactly the same functions that work with pose even if the actual variable is poseStamped, it can ask if the variable is of type pose and get a positive answer and be happy.

The current implementation of stamped datatypes doesn't support polymorphism of pose and pose-stamped. I can only speak from the side of our system (CRAM) and what I say might not be relevant for others, but in CRAM we utilize the polymorphism of stamped and not stamped datatypes extensively and it is a crucial feature in most of our code (except the core, I trust it that cram_core was written to be usable with any kind of representation of locations / transforms).

Please think about supporting polymorphism of datastructures between cl-transforms and your implementation of cl-tf2 as well.

airballking commented 9 years ago

Thank you for your long comment and the time you have taken to express your use-case!

I walk away with one question: Why does the CRAM spatial reasoning system have so many contact points to TF? Without such an ubiquitous interaction the pain of refactoring would be smaller...

fairlight1337 commented 9 years ago

So, all you're saying is that you want polymorphism because of

If you have 100 such entry/exit points you will have to add 100 conversions.

?

I fail to see how that impacts performance too much (with O(n) and all, see http://bigocheatsheet.com/) :). You won't even have to do them manually, as you can just add a wrapper method to the existing methods that want a pose instead of a pose-stamped.

I like the architecture of the interface. Its clean and allows for extension. We have two contradicting paradigms here (which not only, but also computer science thankfully have given us ;) ):

Since the whole point of the discussion (and hence the refactoring) was to generalize the interface more in order to (and correct me here if I'm wrong) clean up the interface(s) of TF and TF2, not intermixing the two to make code work again that tends to use both, I'm clearly in favor of the second choice. We agreed on creating an abstract interface. Introducing polymorphism for the datatypes of two specializations of this single abstract generalized interface sounds like undermining the initial intention, because it would (definitely) create dependencies between the two, which is simply not what we want.

So I'd say, if there is need to transform between two specialized implementations of this generalized scheme, something else is wrong.

We use pose->pose-stamped all over the place (and I can see that pose-stamped->pose can't be far off, being even simpler in functionality), so why not go for that? And just because we were talking about complexity in the beginning: According to https://github.com/ros/roslisp_common/pull/24/files#diff-69918a620a832f1af50a766d40717bd6R97, you don't even have to convert from pose-stamped to pose, but just read it's pose slot.

Try to be positive about this, it's a big step forward :). Maybe bullet can make use of TF frames itself? Since the infrastructure is there, and you can express all frame-less poses in /map (which you implicitly do anyway), there would be

Also, since you're converting all poses to /map anyway before asserting them into the bullet world, you don't have to do anything you're not doing already.

I think that would serve both, the generality of the overall TF system, and a closer integration between bullet and the rest of the system :). We'd be less likely to run into (partially) breaking changes again as well.

gaya- commented 9 years ago

@airballking

Why does the CRAM spatial reasoning system have so many contact points to TF?

Because it's a robotic system that works a lot with coordinates, namely, the geometric reasoning system works a lot with coordinates and transforms in 3D and the executive part of the system works with coordinates and transforms in 3D and in time. It just uses already existing datatypes of TF to represent stamped transforms and poses instead of defining its own datatypes. That's why I like the idea of having a package cl_tf_datatypes which only contains stamped datatype definitions.

In that sense, CRAM has a lot of contact points to the datatypes that TF uses / defines and not TF itself. Maybe we should just define the stamped datatypes in Common Lisp in the package cl-transforms or an extra package stamped-datatypes. It does not matter than much where the definitions come from (although a separate package would be preferrable) as long as the polymorphism feature is supported. My previous comment had no mention of TF functionality whatsoever, it was only about the datatypes.

@fairlight1337

not multiple versions of TF rummaging around the system

I am not using multiple versions of TF in my code, it's only TF2 at the moment.

I fail to see how that impacts performance too much (with O(n) and all, see http://bigocheatsheet.com/) :). You won't even have to do them manually, as you can just add a wrapper method to the existing methods that want a pose instead of a pose-stamped.

I am already curious about your future pull request. If you will need testers let me know.

fairlight1337 commented 9 years ago

@gaya-, I'm happy with the current one :). If you're not, propose an alternative.

According to what we discussed earlier, this fixes the issues that arose. Your points are new and very specific to the use-case you're presenting. So, if you have an alternative approach, everybody's happy to discuss that as well.

And if you need testers, I bet there are people enough for looking into it.

airballking commented 9 years ago

In offline conversations some of you asked when this will be merged. For the beginning, I'd like to assemble user experiences before going ahead with a merge.

So, my hope for everyone would be:

gheorghelisca commented 9 years ago

I tested the new API of cl-tf2 and worked flawlessly for me. In my opinion you can merge this pull request.

fairlight1337 commented 9 years ago

It compiles fine and seems to be replacing the current functionality just nice (at least for me). I got separate branches for all the CRAM packages that formerly used the ensure-... calls; so if that turns out to work for people, we can merge that in.

I'll test it on the robot tomorrow to verify that it actually does what it is supposed to do.

gaya- commented 9 years ago

I withdraw from this discussion. Which also means that you don't have to consider my objections anymore.

fairlight1337 commented 9 years ago

I tested the implementation and it looks legit so far. API is usable, although I had to add a couple of things for supporting (de-)serialization of geometry_msgs/PoseStamped (see https://github.com/airballking/roslisp_common/pull/1).

What's bugging (haha) me right now is an indirectly related (actionlib) problem, which is already being addressed in https://github.com/ros/roslisp_common/pull/25. The use-case here is perception: Many results with multiple poses get transferred over a topic (in a Designator); those poses need to be transformed into the correct (/map) frame. At some point, this breaks with an error: CL-TF2:TF2-BUFFER-CLIENT-ERROR (Action call did not succeed.).

I tried this:

(defun do-transform-safe (buffer object target-frame &key (timeout 0.0))
  (cpl:with-failure-handling
      ((CL-TF2:TF2-BUFFER-CLIENT-ERROR (f)
         (declare (ignore f))
         (sleep 0.25)
         (cpl:retry)))
    (cl-tf2:do-transform buffer object target-frame :timeout timeout)))

instead of calling cl-tf2:do-transform directly, only to find out that it results in

When attempting to read the slot's value (slot-value), the slot
ERROR is missing from the object NIL.
   [Condition of type SIMPLE-ERROR]

So, somewhere NIL is returned where there should be a value. Sometimes this works and actually produces the transformed poses in question, but only rarely. But that hopefully gets resolved with the pull request mentioned above.

airballking commented 8 years ago

Decided to go with the interface suggested by in https://github.com/ros/roslisp_common/pull/31 because large pieces of internal code already relied on that interface.