ouster-lidar / ouster-ros

Official ROS drivers for Ouster sensors (OS0, OS1, OS2, OSDome)
https://ouster.com
Other
125 stars 149 forks source link

Non-monotonic time results in damaged `t` field of pointclouds #46

Open peci1 opened 1 year ago

peci1 commented 1 year ago

This code and some other similar instances

https://github.com/ouster-lidar/ouster-ros/blob/b5393ca9ceb12823ddc78828c0326f0d32af1c51/src/os_ros.cpp#L151-L157

makes an assumption about monotonically increasing timestamps in the lidar packets. However, that is not guaranteed.

For example, we discipline Ouster clock via PTP automotive profile which can move the clock pretty fast, including discrete time jumps in both directions.

What happens in the driver code is that the scan timestamp is taken as the first timestamp in the scan. However, subsequent timestamps may be smaller, the subtraction yields negative numbers, and the static_cast<uint32_t> returns values close to the max representable uint32_t as it underflows, which is somewhere around 4e9. That would mean some points in the scan were captured about 4 seconds after the time in header.stamp, which is of course nonsense.

Here is an example plot of the timestamps in a single scan that was affected by this bug:

Snímek z 2023-01-13 12-15-45

There are several possibilities what can be done with such scans:

1) The simplest thing would be computing the scan timestamp as the minimum value of all timestamps - that would ensure that the subtraction result is always non-negative, however the timestamps would remain non-continuous and non-monotonic. 2) Another possibility is discarding the whole scan as damaged. 3) The best option could be trying to repair the timestamps. This is what I have implemented in our custom driver - figure out if there is only a single discrete jump in the timestamps, and if there is, offset the first n invalid values by the jump size, which would make the timestamps again continuous. The scan start time has to be recomputed afterwards.

I can imagine different use-cases will have different preferences about what to do in this case.

If you'd like, I can share the fix I did in our custom driver. However, first it would be good to figure out which approaches should be implemented in this driver.

Samahu commented 1 year ago

The way I understood the issue is that it only happens when the sensor is connected to a PTP clock and the clock is making some adjustments? Right? I don't think I have seen a case in our packets where the time of valid packets on a standalone sensor (i.e. without a PTP master) having timestamps that is not constantly increasing.

I do have some experience with PTP clocks and I have observed adjustments taking place but I didn't think it could - substantially - affect the timestamps within a single frame to the degree you report.

peci1 commented 1 year ago

It apparently can happen :) And yes, we use PTP automotive profile. I'll try to record a bag file the next time it happens.

It might be caused by our a bit complicated setup where the time master of Ouster syncs itself against another grand master. I suspect the time sync might get into some oscillations in some cases which then cause this behavior.

PhilippSchmaelzle commented 1 year ago

If it is ok I would like to join in this discussion :)

Line 185-186 in the current master does some std::min() action. But as the time within one scan increases it should output always the time of scan_ts <- first column.

Shouldn't t within the point field contain the delta from time of the first valid column until the current column?

e.g.:

Samahu commented 1 year ago
  • if I do a full 360° swipe ts = std::chrono::nanoseconds(timestamp[v] - timestamp[0])
  • and if I do 90° to 180° ts = std::chrono::nanoseconds(timestamp[v] - timestamp[256]) //but only for range 256 to 768 as all other columns do not contain valid stamp?

I agree I thought that would be the case, but I think our firmware doesn't behave this way and still fills the time and sends packets for all the ~frames~ columns. I would say though that this is a different topic from non-monotonic time increase discussed here.

I saw your PR which addresses this problem and I do intend to take a look at it but definitely not before having #67 merged first.

PhilippSchmaelzle commented 1 year ago

Sry, I rushed into this topic as my timestamps (PTP) looked off and I linked it to the uint64 to uint32 casting action. But now everything seems to be ok again. So indeed unrelated to this issue.

Nevertheless, my testing with FW2.4. showed, that if one has configured some azimuth FoV the timestamps outside of this range are 0.0. Which is fine for me but it is different than you stated.