How to sync message timestamps between LGSVL sensors and Autoware

pedroexenberger commented 4 years ago

Hi all. I'm facing a problem while running LGSVL (20.01) with Autoware (1.12.0), which I describe below. I would really appreciate any kind of help here.

Summary of the problem: the timestamp of the lidar (Velodyne) messages published on the /points_raw topic falls increasingly behind ROS time. This breaks some of the modules I want to use.

Question: Is there a way to synchronize the timestamp generated through the LGSVL simulated LiDAR (that publishes on /points_raw) with the actual ROS time?

Details:

I'm trying to run the Autoware tracking module imm_ukf_pda_tracker. The module needs a /tf transform between world and velodyne, so the tf tree must contain the path world->map->base_link->velodyne.

One of the transforms along this path (map->base_link) depends on /ndt_matching. ndt_matching depends on voxel_grid_filter, which in turn depends on points_raw. In other words, data from points_raw reaches ndt_matching through a chain of subscriptions and callbacks.

The problem is that points_raw publishes messages whose timestamps are behind ROS time. Worse, the gap keeps growing as the simulation goes on. I confirmed this by comparing the header timestamp of points_raw, as shown by topic_monitor, with the ROS time displayed in RViz. If I wait long enough, the difference reaches hundreds of seconds.
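To quantify the drift, compare each message's header stamp with the receiver's current time and check whether the difference keeps growing. A growing difference points to a backlog somewhere in the pipeline rather than a fixed clock offset.

The C# sketch below is a standalone illustration and assumes ROS1-style stamps (`secs`, `nsecs`). The helpers `ToSeconds` and `LagSeconds` and the sample values are hypothetical, not part of LGSVL or Autoware.

```csharp
using System;

// Convert a ROS1-style stamp (secs, nsecs) to seconds.
double ToSeconds(uint secs, uint nsecs) => secs + nsecs * 1e-9;

// Lag = receiver's current time minus the message header stamp.
// A positive value means the message is older than "now".
double LagSeconds(uint stampSecs, uint stampNsecs, double nowSeconds) =>
    nowSeconds - ToSeconds(stampSecs, stampNsecs);

// Hypothetical samples: (stamp secs, stamp nsecs, receiver time in seconds).
var samples = new (uint Secs, uint Nsecs, double Now)[]
{
    (1582640000, 0,         1582640001.0),
    (1582640010, 500000000, 1582640021.0),
    (1582640020, 0,         1582640041.0),
};

double? previousLag = null;
foreach (var s in samples)
{
    double lag = LagSeconds(s.Secs, s.Nsecs, s.Now);
    // A lag that keeps increasing suggests a backlog, not a constant offset.
    string trend = previousLag is double p && lag > p ? "growing" : "-";
    Console.WriteLine($"lag = {lag:F3} s ({trend})");
    previousLag = lag;
}
```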

This drift leads to the following warning when running roswtf:

```
WARNING Received out-of-date/future transforms:
 * receiving transform from [/ndt_matching] that differed from ROS time by -105.362743684s
```

It also makes the lookup of the transform between world and velodyne (needed by imm_ukf_pda_tracker) fail, with errors like the following:

```
[ERROR] [1582640028.598171054]: Lookup would require extrapolation into the past.
Requested time 1582640001.163042816 but the earliest data is at time 1582640118.607235113,
when looking up transform from frame [velodyne] to frame [world]
```
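The error says the requested stamp is older than anything still held in the transform buffer. tf keeps only a bounded history (tf2's default cache length is 10 s), so a point cloud stamped about two minutes in the past can never be transformed.

The sketch below reproduces only that range check, using the times from the log above. `CheckLookup` is a hypothetical helper, and the 10 s window placed after the earliest sample is an assumption.

```csharp
using System;

// Hypothetical model of a transform buffer that only covers [earliest, latest].
// It reproduces the range check behind tf's extrapolation errors, nothing more.
string CheckLookup(double requested, double earliest, double latest)
{
    if (requested < earliest)
        return FormattableString.Invariant($"extrapolation into the past by {earliest - requested:F3} s");
    if (requested > latest)
        return FormattableString.Invariant($"extrapolation into the future by {requested - latest:F3} s");
    return "ok";
}

// Times taken from the error message above; the latest time is assumed.
double requestedTime = 1582640001.163042816;
double earliestData = 1582640118.607235113;
double latestData = earliestData + 10.0; // assumed 10 s buffer window

Console.WriteLine(CheckLookup(requestedTime, earliestData, latestData));
```

With these numbers the point cloud is about 117.4 s older than the oldest buffered transform, so no cache setting short of minutes would help. The stamps themselves have to stay close to ROS time.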

Since the points_raw timestamp comes from the LGSVL simulated lidar, do you know how I can make LGSVL and ROS keep the same pace, so that their timestamps stay close to each other? That would be really helpful!

By the way, ROS and LGSVL run on the same machine, so they should share the system time. ROS actually runs inside a Docker container on that machine, but I don't know whether that can affect timing. At least, running date '+%s' on both the host and in the container shows that their clocks are in sync.

I thank you very much in advance and I can provide more details if necessary. Cheers.

pedroexenberger commented 4 years ago

I think I solved it.

Apparently the problem was rosbridge: it could not send data at the required rate.
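A simple queueing model shows why a bridge that is too slow produces a lag that grows over time instead of staying constant. If messages are published faster than they can be forwarded, each message waits behind all earlier ones. The latency then increases by (1/bridgeHz - 1/pubHz) per message.

The C# sketch below assumes a FIFO bridge with a fixed per-message cost. `Latencies` is a hypothetical helper, and the 10 Hz and 8 Hz figures are illustrative, not measurements.

```csharp
using System;

// Hypothetical FIFO model: the simulator publishes at pubHz, but the bridge
// can only forward bridgeHz messages per second. Returns per-message latency.
double[] Latencies(double pubHz, double bridgeHz, int count)
{
    var latency = new double[count];
    double bridgeFreeAt = 0.0; // time at which the bridge finishes its current message
    for (int i = 0; i < count; i++)
    {
        double published = i / pubHz;
        double start = Math.Max(published, bridgeFreeAt); // wait if the bridge is busy
        bridgeFreeAt = start + 1.0 / bridgeHz;
        latency[i] = bridgeFreeAt - published; // how old the stamp is on arrival
    }
    return latency;
}

// A 10 Hz lidar over a bridge that only manages 8 messages/s (assumed numbers).
var lat = Latencies(10.0, 8.0, 600);
Console.WriteLine($"after 600 messages (60 s of data): {lat[^1]:F2} s behind");
```

Under these assumptions the lag grows by 25 ms per message, reaching about 15 s after one minute. If the bridge keeps up, the lag stays at a single transfer time.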

I then tried changing the Bridge Type of the vehicle from ROS to ROS2, in the Vehicles tab of the LGSVL web UI. Surprisingly, the bridge still worked and stayed perfectly in sync with ROS time. So I'm sticking with this configuration even though I'm using ROS (not ROS2).

Cheers.

pedroexenberger commented 4 years ago

OK, so I was naive. The ROS2 bridge can send data to Autoware, which lets me run tracking, but I had only tested that with the car stopped.

Today I continued my tests and noticed that the car does not follow the path when I try to drive along the vector map. If I switch back to the ROS bridge (instead of ROS2), it drives.

I noticed that the message type of the /vehicle_cmd topic (which the LGSVL simulator subscribes to) differs between bridge versions, which is expected. Autoware publishes autoware_msgs/VehicleCmd, while the ROS2 bridge expects autoware_auto_msgs/RawControlCommand.
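The mismatch comes down to a single branch on the bridge version. The sketch below is a simplified, standalone C# mirror of the VehicleControlData case in LGSVL's RosBridge.cs, with the Apollo case left out. It returns ROS message type names as strings purely for illustration. `ControlMessageType` is a hypothetical helper, not simulator code.

```csharp
using System;

// Simplified mirror of the bridge's control-message selection (Apollo omitted):
// the bridge version alone decides which type /vehicle_cmd is read as.
string ControlMessageType(int bridgeVersion) =>
    bridgeVersion == 2
        ? "autoware_auto_msgs/RawControlCommand" // Autoware.Auto, ROS2 bridge
        : "autoware_msgs/VehicleCmd";            // Autoware.ai, ROS bridge

// Autoware.ai publishes autoware_msgs/VehicleCmd on /vehicle_cmd.
const string published = "autoware_msgs/VehicleCmd";

foreach (int version in new[] { 1, 2 })
{
    string expected = ControlMessageType(version);
    string result = expected == published ? "types match" : "type mismatch";
    Console.WriteLine($"bridge version {version}: expects {expected} -> {result}");
}
```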

So now I am left with two options:

1) Use the ROS version of the bridge, which delays the messages and prevents me from using the whole Autoware stack (such as tracking, as explained in the first post of this issue).
2) Use the ROS2 version of the bridge, which provides data fast enough for tracking but uses different message types for some topics. This also prevents me from using the whole Autoware stack (motion, in this case).

Given that, is there a workaround to improve the performance of the ROS (version 1) bridge between LGSVL and Autoware? Or a way to adapt the ROS2 bridge so that it has the same interface as the ROS1 version?

Thank you!

EricBoiseLGSVL commented 4 years ago

@pedroexenberger we are investigating an issue with ROS, and I think you are hitting something similar. We will continue to debug this; please post any findings here.

YiruLyu commented 4 years ago

Hi, I am facing the same problem, but with Autoware v1.13. One option is to edit the Autoware source code and remove the lines related to the "LookUpTransform" call, so that the node keeps running. So far this works for me. However, I have only tried it with ray_ground_filter, which performs this timestamp check in Autoware v1.13 but not in v1.12. I suspect this sync issue could cause more problems for other Autoware functions. Looking forward to a better solution.

pedroexenberger commented 4 years ago

Thank you @EricBoiseLGSVL and @YiruLyu for your input. It is a relief to know I'm not the only one facing this.

I'm looking at the LGSVL simulator source code for the differences between the ROS and ROS2 bridges. Since publishing the point cloud over ROS2 worked with Autoware, this seemed a good starting point for understanding why the bridge performs well in ROS2 mode but not in ROS mode. Apart from the different conversion functions, the difference seems to lie in the message format. Is that it? Could ROS2 simply use a protocol or format that reduces the amount of data exchanged?

Anyway, I'll try to set up the build environment for the LGSVL simulator so I can try out some alternatives. I'll let you know if I find anything useful.

@YiruLyu I'm not sure that ignoring the LookUpTransform is the best solution for me, because as time goes by the ndt_matching results would degrade (at least in my case). Still, thanks for sharing your approach.

YiruLyu commented 4 years ago

@pedroexenberger I found a solution for my system! Please try this:

1) In the LGSVL web UI, add a sensor to the vehicle's JSON configuration that publishes a "/clock" topic. See https://www.lgsvlsimulator.com/docs/sensor-json-options/ .
2) Set the parameter "/use_sim_time" to true for Autoware. I did this in a terminal by running "rosparam set /use_sim_time true". After that, the ROS time in RViz follows the /clock topic, which uses the same time base as the sensors. Hope this also solves your issue.
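Here is why this works. With /use_sim_time set, ROS nodes take "now" from the latest /clock message instead of the wall clock, so sensor stamps and node time come from the same source.

The C# sketch below models that with hypothetical names (`simNow`, `OnClock`) and assumed numbers. It assumes sim time starts aligned with wall time and that /clock and /points_raw arrive with the same 30 s backlog.

```csharp
using System;

// Hypothetical stand-in for a node's clock with /use_sim_time enabled:
// "now" is the last value received on /clock, not the wall clock.
double simNow = 0.0;
void OnClock(double clockSeconds) => simNow = clockSeconds;

// Assumed scenario: both topics come through the same delayed bridge.
double wallNow = 100.0;     // wall time when the messages arrive
double bridgeDelay = 30.0;  // accumulated bridge delay
double lidarStamp = wallNow - bridgeDelay;
OnClock(wallNow - bridgeDelay); // /clock sample delayed by the same amount

Console.WriteLine($"lag vs wall clock: {wallNow - lidarStamp} s");
Console.WriteLine($"lag vs sim clock:  {simNow - lidarStamp} s");
```

The stamp-vs-now gap disappears because both values are delayed together. The delay itself is still there, just no longer visible to the nodes.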

pedroexenberger commented 4 years ago

@YiruLyu, thank you very much for sharing this solution!

Do you know if, in this case, autoware is still running in 'real-time'?

If the LGSVL /clock is sent (and also delayed) by rosbridge, Autoware would be receiving and processing messages at a slower pace, following a slower clock, right? I mean, two messages published 1 s apart in real life would not arrive at Autoware 1 s apart, because of the bridge slowdown.

I wonder whether that would relax the timing constraints that the Autoware nodes have to meet. One of my goals is to profile the software stack, and I don't want to distort the measured computational effort by slowing down the message rate. Do you have any thoughts on that?
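One way to answer this empirically is to record pairs of (wall time, /clock value) on the receiving side and compute a real-time factor: the sim-time interval divided by the wall-time interval. A constant delivery delay keeps the factor at 1. A delay that keeps growing pushes it below 1, meaning Autoware sees time pass more slowly than in reality.

The sketch below assumes the sample pairs have already been collected. `RealTimeFactor` and the numbers are hypothetical.

```csharp
using System;

// Real-time factor between two (wall time, /clock time) observations taken at
// the receiver. RTF < 1 means sim time, as seen by the node, runs slower than wall time.
double RealTimeFactor((double Wall, double Sim) a, (double Wall, double Sim) b) =>
    (b.Sim - a.Sim) / (b.Wall - a.Wall);

// Constant 2 s delivery delay: intervals are preserved.
Console.WriteLine(RealTimeFactor((10.0, 8.0), (20.0, 18.0)));
// Delay growing from 2 s to 4.5 s over 10 s of wall time (assumed numbers).
Console.WriteLine(RealTimeFactor((10.0, 8.0), (20.0, 15.5)));
```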

Meanwhile, I had some issues setting up the build environment, but it's working now. So I'll start debugging the code to see if I can understand this better.

pedroexenberger commented 4 years ago

Ok, an update from my side (perhaps a solution).

I focused on adapting the ROS2 interface in the LGSVL code so that it recognizes the vehicle_cmd topic (and drives the car), as explained in https://github.com/lgsvl/simulator/issues/653#issuecomment-593384836. I did this because using ROS2 as the bridge of the simulated car seemed to deliver lidar data to autoware.ai at the required rate (even though ROS should be the appropriate version).

I changed a few lines in the LGSVL simulator code, in the file Assets/Scripts/Bridge/Ros/RosBridge.cs.

Basically, I made two changes:

1) In the AddReader function (which, I guess, creates the subscriber), I ignored Version 2, so that it reads the control data in the ROS1 format published by autoware.ai:

```csharp
public void AddReader<T>(string topic, Action<T> callback) where T : class
    ...
    else if (type == typeof(VehicleControlData))
    {
        if (Apollo)
        {
            type = typeof(Apollo.control_command);
            converter = (JSONNode json) => Conversions.ConvertTo((Apollo.control_command)Unserialize(json, type));
        }
        /* >>>>> COMPLETELY REMOVED THIS ELSE IF <<<<<<
        else if (Version == 2)
        {
            // Since there is no mapping acceleration to throttle, VehicleControlCommand is not supported for now.
            // After supporting it, VehicleControlCommand will replace RawControlCommand.
            // type = typeof(Autoware.VehicleControlCommand);
            // converter = (JSONNode json) => Conversions.ConvertTo((Autoware.VehicleControlCommand)Unserialize(json, type));

            type = typeof(Autoware.RawControlCommand);
            converter = (JSONNode json) => Conversions.ConvertTo((Autoware.RawControlCommand)Unserialize(json, type));
        }
        */
        // With the Version == 2 branch removed, every non-Apollo bridge reads the Autoware.ai VehicleCmd type.
        else
        {
            type = typeof(Autoware.VehicleCmd);
            converter = (JSONNode json) => Conversions.ConvertTo((Autoware.VehicleCmd)Unserialize(json, type));
        }
    }
    ...
```

2) In the UnserializeInternal function, I always parse time values using the ROS1 format:

```csharp
else if (type == typeof(Time))
{
    var nodeObj = node.AsObject;
    var obj = new Time();
    /*   >>>>> IGNORE VERSION, ALWAYS USE VERSION 1 <<<<<<
    if (Version == 1)
    {
        obj.secs = uint.Parse(nodeObj["secs"].Value);
        obj.nsecs = uint.Parse(nodeObj["nsecs"].Value);
    }
    else
    {
        obj.secs = int.Parse(nodeObj["sec"].Value);
        obj.nsecs = uint.Parse(nodeObj["nanosec"].Value);
    }*/
    // Using version 1 data acquisition even though I'm setting ROS2 in the car bridge when running LGSVL
    obj.secs = uint.Parse(nodeObj["secs"].Value);
    obj.nsecs = uint.Parse(nodeObj["nsecs"].Value);

    return obj;
}
```
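Instead of forcing the ROS1 layout, the time parser could choose the layout based on the fields actually present. ROS1 time messages carry `secs`/`nsecs`, while ROS2 (`builtin_interfaces/Time`) uses `sec`/`nanosec`.

The sketch below illustrates that idea as a standalone program. It uses `System.Text.Json` rather than the SimpleJSON `JSONNode` API used in the simulator code. `ParseTime` is a hypothetical helper that has not been tried inside the simulator.

```csharp
using System;
using System.Text.Json;

// Hypothetical alternative to hard-coding version 1: detect the layout from the
// field names. ROS1 time uses secs/nsecs; ROS2 uses sec/nanosec.
(long Secs, uint Nsecs) ParseTime(string json)
{
    using JsonDocument doc = JsonDocument.Parse(json);
    JsonElement node = doc.RootElement;
    if (node.TryGetProperty("secs", out JsonElement secs))
        return (secs.GetInt64(), node.GetProperty("nsecs").GetUInt32()); // ROS1 layout
    return (node.GetProperty("sec").GetInt64(), node.GetProperty("nanosec").GetUInt32()); // ROS2 layout
}

Console.WriteLine(ParseTime("{\"secs\": 1582640001, \"nsecs\": 163042816}"));
Console.WriteLine(ParseTime("{\"sec\": 1582640001, \"nanosec\": 163042816}"));
```

Detecting by field names would keep the time fields working for both ROS1 and ROS2 peers. The control message type would still need its own handling.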

Then I built the project and started the simulator with the bridge type set to ROS2 on the simulated car (even though, after my modifications, the subscriptions handle messages as ROS1). With that, the car could both send lidar data at the required frequency and receive driving commands from Autoware, which mostly solves the problem.

However, some problems remain:

1) Obviously, it breaks compatibility for anyone who wants to use LGSVL with both autoware.ai and autoware.auto. Also, I still don't know why this adapted version of the ROS2 bridge works better than the ROS one.
2) ndt_matching seems less stable: sometimes the car gets lost while driving. This happened before too, but less often. Perhaps the higher message rate after the fix makes my PC struggle to run all nodes at once (I'm running LGSVL and Autoware on the same machine). Or the bridge might not be working 100% correctly because I'm using ROS2 instead of ROS1, with side effects I'm not aware of.

I hope this helps you investigate the problem. I'm struggling a bit to debug the application, since it is multithreaded and VSCode gets lost. Also, you have much more expertise with the project.

If you have any ideas on how to address the issues I mentioned, or any other feedback, I'd really appreciate it.

lgsvl / simulator, issue #653