ros-industrial / ros_canopen

CANopen driver framework for ROS (http://wiki.ros.org/ros_canopen)
GNU Lesser General Public License v3.0
343 stars 274 forks source link

ros socketcan_interface misses messages that candump can see. #448

Open faultfactory opened 3 years ago

faultfactory commented 3 years ago

Hello,

18.04 melodic, installed via apt, verified we're up to date as of writing .

I am currently using the socketcan_interface to re-assemble object lists and other data groupings from a vehicle can network to pass to ROS as a prototyping platform for autonomous vehicle work. Some of this comes from vehicle can networks, some of this comes directly from Adas sensors but the struggle here is dropped messages at the ROS layer.

As an example, let's say the format is as follows with each message being unique:

<header> -> Defines the number of objects being transmitted. 
<object 1 info A> 
<object 2 info A>
..
<object n info A>
<object 1 info B> 
<object 2 info B>
..
<object n info B>

They get assembled into an object list with all the parts and published to ros.

Or we're taking various vehicle state signals and packaging them into a single rosmsg and publishing them on a topic. End result is we're collecting a specific # of can msgs, converting to engineering units from hex/binary and assembling a ros message from them. In many (nearly all) cases the receipt order and the completeness of the full package matters.

As a template for building these nodes we started with the socketcan_to_topic node (and the corresponding .cpp) which uses the can::ThreadedSocketCANInterface. When running this, we would often can messages that were dropped.

In order to get this running in a more reliable manner I had to convert to using the candump.cpp file as a template where we are using can::SocketCanInterface (which i assume is a non-threaded version) and the driver.run() command which forces the program to be shutdown via ctrl+c. This does capture nearly everything and runs significantly more reliable but it does consume a significantly higher amount of resources. I have looked high and low for people with the same issue but I'm not seeing it. I've done some tests with this and I've found the following:

  1. It is not dependent on the hardware interface. This has occurred using PEAK-USB can adapter on a , Raspberry pi can hat AND canplayer to vcan interface.
  2. All of the traffic I need to have a complete message is making it to the interface which i can verify via candump (can-utils, not the candump in this repo).
  3. If the testing in the debugger holds, potentially this can happen on the very first message (meaning no queue or overflow yet). The first message that reaches enqueue is not the first message in the candump log i am playing back to the interface the node is attached to.

If anyone has an idea how i can reduce the dropped messages coming into this node, it would be very helpful. I don't believe any of our data conversion code is specifically high consumption so i would be very surprised if this was it. With the driver.run() method we do have an issue with throughput if we have more processes running on that same piece of hardware. As a result I think our next step is to move the can content off of the main PC and onto some other smaller machiens running ROS but this just seems like it's not such a high throughput that this should be necessary.

mathias-luedtke commented 2 years ago

but the struggle here is dropped messages at the ROS layer.

It is the first time that such an issue has been reported. At which frequency are your messaged coming in?

Do you see that loss as well, if you try it with socketcan_bridge?

All of the traffic I need to have a complete message is making it to the interface which i can verify via candump (can-utils, not the candump in this repo).

Did you try it with candump in this repo?

where we are using can::SocketCanInterface [...] This does capture nearly everything and runs significantly more reliable but it does consume a significantly higher amount of resources.

Performance-wise there should not be much of a difference between the threaded and the non-threaded version. The first just runs the latter in a background thread. In both version there is one additional background thread for Boost ASIO anyway.

End result is we're collecting a specific # of can msgs, converting to engineering units from hex/binary and assembling a ros message from them. In many (nearly all) cases the receipt order and the completeness of the full package matters.

I believe that this might be the root cause of your troubles. Please make sure that you never block any of the listener callbacks (especially, if messages are coming in fast). socketcan_interface does not do any input buffering at all. However, the stack can read one message from the bus while the previous message is being processed. This helps to deal with sporadic blocks.

You could use BufferedReader to decouple your user code properly.

faultfactory commented 2 years ago

It is the first time that such an issue has been reported. At which frequency are your messaged coming in?

It's all automotive can so it's 500kbps and the messaging frequency varies.. and the hardware does it's own buffering i believe at the USB interface. I don't know the exact timing but if i look at 1 second of the log i'm getting 2600 msgs on one bus but i'm taking action on a specific subset of that via the FilteredFrameListener. I actually am getting decent performance with the SocketCanInterface and driver.run() approach. The compute load just seems high. Only occasional missed frames.

Do you see that loss as well, if you try it with socketcan_bridge? SocketCanBridge does not appear to have that issue which is what caused me to use that as my framework initially.

Did you try it with candump in this repo?

Yes, that does work. Which is why I then took the driver.run() approach from that with the SocketCanInterface vs ThreadedSocketCanInterface.

Performance-wise there should not be much of a difference between the threaded and the non-threaded version. The first just runs the latter in a background thread. In both version there is one additional background thread for Boost ASIO anyway.

Interesting, I thought that Boost ASIO would have been the culprit. Sounds like it's not the case.

End result is we're collecting a specific # of can msgs, converting to engineering units from hex/binary and assembling a ros message from them. In many (nearly all) cases the receipt order and the completeness of the full package matters.

I believe that this might be the root cause of your troubles. Please make sure that you never block any of the listener callbacks (especially, if messages are coming in fast). socketcan_interface does not do any input buffering at all. However, the stack can read one message from the bus while the previous message is being processed. This helps to deal with sporadic blocks.

Our callbacks are quite small. When profiling the code, the conversion library code functions made up 10% of the load vs the Boost ASIO and interface stack took up the remaining majority. I suppose the publish could be the culprit since we are not using ros::spin() but i have not looked at that. We are using the Vector DBC library from Toby Lorenz to do our conversions but the library is very efficient. I would be surprised if this was the issue.

You could use BufferedReader to decouple your user code properly.

This looks like exactly what I need.

Do you know if this buffered reader would respect the CAN message order (FIFO)?

Do you have any examples of this being used that I could take a look at? Would I be using in place of the SocketCanInterface or alongside? Would it matter if i used the threaded interface or not?

mathias-luedtke commented 2 years ago

Our callbacks are quite small.

This does not mean that they cannot block ;) But the issue is more the grouping of the CAN frames.

Do you know if this buffered reader would respect the CAN message order (FIFO)?

Yes. Actually, this is guaranteed for all callbacks.

Do you have any examples of this being used that I could take a look at?

https://github.com/ros-industrial/ros_canopen/blob/eaaf0a17b928db679215ddc9c3da1daaa6aa656c/canopen_master/src/master_plugin.cpp#L95-L115

Would I be using in place of the SocketCanInterface or alongside? Would it matter if i used the threaded interface or not?

This is independent from the actual CAN interface. It replaces one FrameListenener.

but i'm taking action on a specific subset of that via the FilteredFrameListener.

FilteredFrameListener is not optimized for performance! Every FilteredFrameListener will process all messages from the bus, pass the matches and then just drop the rest.

https://github.com/ros-industrial/ros_canopen/blob/eaaf0a17b928db679215ddc9c3da1daaa6aa656c/socketcan_interface/include/socketcan_interface/filter.h#L49-L66

How many are you using?

It might be faster, if you just write your own filter in one callback (or using a BufferedReader). Just run the can::Frame trough your can::FrameFilter instances and take the corresponding action.

mathias-luedtke commented 2 years ago

It might be faster, if you just write your own filter in one callback (or using a BufferedReader). Just run the can::Frame trough your can::FrameFilter instances and take the corresponding action.

Or just create a FrameListenerfor every single frame. You could use the same callback for each certain group. The FrameDispatcher uses a std::unordered_map, which will be way faster than the FilteredFrameListener.

https://github.com/ros-industrial/ros_canopen/blob/1acd49381d114f4444122732ba53370af621761b/socketcan_interface/include/socketcan_interface/dispatcher.h#L88

faultfactory commented 2 years ago

How many are you using?

Sorry, this question isn't clear. What are you referring to?

Or just create a FrameListenerfor every single frame. You could use the same callback for each certain group. The FrameDispatcher uses a std::unordered_map, which will be way faster than the FilteredFrameListener.

Would the key in this case be can ID?

I will do some experimentation today and see if there's an improvement. Thanks for the help. If you need to close the issue, feel free but it would be good to continue documenting for others if that's okay with you.

mathias-luedtke commented 2 years ago

Sorry, this question isn't clear. What are you referring to?

How many instances of FilteredFrameListener are you using in your code?

Would the key in this case be can ID?

Yes. It is the CAN ID including the high bits for RTR and extended CAN IDs.

faultfactory commented 2 years ago

How many instances of FilteredFrameListener are you using in your code?

Only one. All the can traffic flows through that.

mathias-luedtke commented 2 years ago

Only one. All the can traffic flows through that.

Then the performance should be okay.

faultfactory commented 2 years ago

Edited: Putting more than one question in.

I'm new to multi-process and there's a lot of templating here. I thought I was decent at C++ but I'm being humbled by this library. I might need a little more spoon feeding.

I looked over the BufferedReader approach. I don't see any callback assignment like the other FrameListener classes so if I'm understanding correctly, the call to readUntil sits in your while loop condition and writes to a pointer you supply until the bool returns false. So I would on occasion need to manually trigger this while loop to consume the buffered messages with a ros::Timer event? Does this make sense?

I've looked at the multiple FrameListener approaches but I think that may end up being more complicated as not all of the devices I need to ingest data from fall into the format I listed. It varies a bit. For each one of these I make a single callback message that takes the can::Frame and does a switch/case on the ID. Not the best approach but it's close enough to working correctly that the decoupling alone might be enough to keep me working correctly and allow me to have a uniform format for these nodes that I have to create. New content comes in and I have to change them so the simpler the better.

I see this being used a lot but can't figure out it's purpose. What does the class_loader do for this package?

faultfactory commented 2 years ago

I'm finding that any solution I attempt using the ThreadedSocketCanInteface is performing significantly worse than the non-threaded version. I don't know how I am able to use the non-threaded version with with the BufferedReader

My next step is to examine the build scheme for differences between the base install and mine.

mathias-luedtke commented 2 years ago

looked over the BufferedReader approach. I don't see any callback assignment like the other FrameListener classes so if I'm understanding correctly, the call to readUntil sits in your while loop condition and writes to a pointer you supply until the bool returns false.

Exactly.

So I would on occasion need to manually trigger this while loop to consume the buffered messages with a ros::Timer event? Does this make sense?

I would not use a timer, but just use read all messages in a loop and process them. Actually, you might not even need BufferedReader in this case.

I'm finding that any solution I attempt using the ThreadedSocketCanInteface is performing significantly worse than the non-threaded version.

Again, there is not much of a difference, except for one additional thread.

https://github.com/ros-industrial/ros_canopen/blob/1acd49381d114f4444122732ba53370af621761b/socketcan_interface/include/socketcan_interface/threading.h#L39-L43

I still believe that your user code does something, which might not work well with that many threads (or the way they are used in ros_canopen). Of course, there could be a bug in ThreadedInterface, but I doubt that for now. It is has been used for years in socketcan_bridge and a similar technique is used in the other canopen packages.

I see this being used a lot but can't figure out it's purpose. What does the class_loader do for this package?

It is part of the (optional) plugin system: http://wiki.ros.org/class_loader

I've looked at the multiple FrameListener approaches but I think that may end up being more complicated as not all of the devices I need to ingest data from fall into the format I listed. It varies a bit. For each one of these I make a single callback message that takes the can::Frame and does a switch/case on the ID. Not the best approach but it's close enough to working correctly that the decoupling alone might be enough to keep me working correctly and allow me to have a uniform format for these nodes that I have to create. New content comes in and I have to change them so the simpler the better.

Since you have to do the switch/case anyway, I would suggest to only use a single callback. The current listener interface supports single ID matches only and if you need other selection criteria (ranges, masks) you have to implement your custom solution.