ros2 / ros2_documentation

ROS 2 docs repository
https://docs.ros.org/en/rolling
Creative Commons Attribution 4.0 International
545 stars 1.06k forks source link

tf2 timeout python tutorial references code that doesn't exist #2935

Closed CodinGeoR closed 5 months ago

CodinGeoR commented 2 years ago

Hi.

I've been doing and reviewing all the ros2 tutorials for a while and now I've came across with the tf2 ones. In first place, the Listener tutorial, specifically on the "Writing a listener node" where all the main code relies, I noticed some weird stuff that VSCode later confirmed me:

error As you can see, there is an import to a part of tf2_ros module that doesn't actually exists. Then, there is calls to a variable that is defined later on the code (i.e. first it's called, then defined. Weird). At the end, I'm not sure if all of this makes any difference because the node works as expected and you can do the examples normally.

On second place (and the more absurd one), on the timeout tutorial nothing inside this tutorial makes any sense considering it is using the listener tutorial as a reference. It asks to change a code that is not found on the listener one and to modify things that already are missing in the original one.

This is not the first time I discover weird things in the ros2_documentation and I would like to suggest a review on this topic, specially in the tutorials part, because is where the common user goes first to find answers or goes to learn things in order to develop new projects using ROS.

Hope this can be solved.

clalancette commented 2 years ago

As you can see, there is an import to a part of tf2_ros module that doesn't actually exists. Then, there is calls to a variable that is defined later on the code (i.e. first it's called, then defined. Weird). At the end, I'm not sure if all of this makes any difference because the node works as expected and you can do the examples normally.

As far as I can tell, that part of the module exists just fine. Running locally:

$ python3 -c 'from tf2_ros import TransformException'

We do some slightly non-standard things in the tf2_ros python module, so it may be the case that it is confusing VSCode. So I'm going to ignore this one, as this works as far as I can tell.

On second place (and the more absurd one), on the timeout tutorial nothing inside this tutorial makes any sense considering it is using the listener tutorial as a reference. It asks to change a code that is not found on the listener one and to modify things that already are missing in the original one.

Yes, agreed. That tutorial needs a review and a rewrite. I'm going to update the title of this issue to reflect that. If you have time to look into it and open a PR, it is much appreciated.

This is not the first time I discover weird things in the ros2_documentation and I would like to suggest a review on this topic, specially in the tutorials part, because is where the common user goes first to find answers or goes to learn things in order to develop new projects using ROS.

Sorry, this comment isn't very helpful. If you have specific issues with the tutorials, please feel free to open up issues. But a general "I found weird stuff" doesn't really help us solve anything.

CodinGeoR commented 2 years ago

Oh, hey @clalancette. Thanks for taking the time to review this.

Yes, agreed. That tutorial needs a review and a rewrite. I'm going to update the title of this issue to reflect that. If you have time to look into it and open a PR, it is much appreciated.

Well, I would love to but I'm not sure how. I mean, I've been doing tests on the turtle_tf2_listener.py locally and nothing is working so far. The only relevant thing that I discovered so far is that there's a difference between using the line where variable now is defined as now = rclpy.time.Time() and now = self.get_clock().now() (as here is suggested). The difference that I'm talking about is on the clock_type that is sent by the listener itself: With the first option the output is clock_type=SYSTEM_TIME and with the other one is clock_type=ROS_TIME.

Also, the duration of the timeout (when included) seems to be taken into consideration only when the ROS_TIME is used. In that scenario, it never works no matter how many or how little time you put, resulting in the listener incapable of transforming turtle2 to turtle1 saying "Lookup would require extrapolation into the future".

Not sure how to proceed from there.

Sorry, this comment isn't very helpful. If you have specific issues with the tutorials, please feel free to open up issues. But a general "I found weird stuff" doesn't really help us solve anything.

I'm sorry, that was me trying to conclude the issue with a complementary opinion. I've been doing separate issue threads with the things I've been founding in here though. But yeah, excuse me again for the commentary.

clalancette commented 2 years ago

Also, the duration of the timeout (when included) seems to be taken into consideration only when the ROS_TIME is used. In that scenario, it never works no matter how many or how little time you put, resulting in the listener incapable of transforming turtle2 to turtle1 saying "Lookup would require extrapolation into the future".

Not sure how to proceed from there.

Yeah, that's fine. When I reviewed it a couple of months ago, that's where I got to as well. There should be a way to make it work, I just haven't had time to come back and figure out what the tutorial is actually trying to accomplish.

I'm sorry, that was me trying to conclude the issue with a complementary opinion. I've been doing separate issue threads with the things I've been founding in here though. But yeah, excuse me again for the commentary.

No worries, it happens. We rely a lot on the community to find and fix issues in the documentation, so please do feel free to keep opening issues as you find them. Thanks for this report.

CodinGeoR commented 2 years ago

Yeah, that's fine. When I reviewed it a couple of months ago, that's where I got to as well. There should be a way to make it work, I just haven't had time to come back and figure out what the tutorial is actually trying to accomplish.

As far as I understand, the tutorial is trying to teach how the timeout on lookup_transform works and why is useful. In this case, making something like a delay in order to give the buffer some time to reach the proper response for the listener, in order to take the broadcasted messages as quick a possible and act. Almost like trying to sync up those two I think. I also think the same as you. There might be a way to make it work, mostly because in cpp there's no issue with it. We'll figure it out eventually. Thanks for the time! ^^

CodinGeoR commented 2 years ago

Update: @clalancette, I just finished studying different outputs on the console log with some experiments I've been doing and now I want to share the results I found so far:

The intention of this whole thing is to use ROS_TIME as the main clock_type for the time argument for the lookup_transform. So, it is needed to declare the variable like this:

now = self.get_clock().now()

From there, we can call the buffer.lookup_transform like this (without a timeout):

trans = self.tf_buffer.lookup_transform(to_frame_rel, from_frame_rel, now)

This will lead to the exception we were talking about. That exception gives some timestamps we can study and this is why I'm writing this now. The first thing I did was to subtract both times for several exceptions looking for a pattern. Some of the results were:

1660301168.779097−1660301168.775413 = 0.0037
1660301169.779227−1660301169.767416 = 0.0118
1660301170.779079−1660301170.775408 = 0.0037
1660301171.779207−1660301171.767603 = 0.0116
...

As you can see, the results oscillate from two values and I don't have an explanation to that (yet). From the exception we can also understand that those values correspond to the time the user asks on the lookup_transform and the timestamp where the last value available is located, respectively. This means there's a de-sync on the listener, of course.

The next thing I did was to include the timeout with 50ms like so:

trans = self.tf_buffer.lookup_transform(to_frame_rel, from_frame_rel, now, timeout=rclpy.duration.Duration(seconds=0.05) )

As since the exception was still there, I did the same thing as before:

1660301790.540302−1660301790.531833 = 0.0085
1660301791.540418−1660301791.523916 = 0.0165
1660301792.540555−1660301792.531792 = 0.0088
...

It is clear that the oscillation continues but now the values are different. If you do the math, this is because of those extra ms I put in the timeout. Now if we go back where I explain what those values means, it then makes no sense to add those ms to the time the user is asking for, does it? So I changed the code like this:

now = self.get_clock().now() - rclpy.time.Duration(seconds=0.05)
trans = self.tf_buffer.lookup_transform(to_frame_rel, from_frame_rel, now)

And surprise! It works like a charm now. So, I don't wanna create a commit to it because I'm not sure where the error is in here (maybe here with that '+' sign or something?), but with this I think I can assure that the issue is somewhere in the lookup_transform and/or the timeout is not working.

Hope this help! Please let me know.

clalancette commented 5 months ago

We "fixed" this by removing the tutorial in https://github.com/ros2/ros2_documentation/pull/4384, so closing.