ros-drivers / flir_camera_driver

163 stars 154 forks source link

[ROS2] blackfly S inconsistent startup, crashes #149

Closed buckleytoby closed 3 months ago

buckleytoby commented 8 months ago

Using ros2 Iron, Ubuntu 22.04 I've found that the spinnaker_camera_driver > camera_driver_node won't be able to startup for various reasons. Re-running the same launch file multiple times without any hardware change results in different outcomes.

The following was observed after confirming that both cameras were recognized by the OS by checking lsusb

Failure modes: 1) Ctrl+C'ed out of camera_driver_node, re-run camera_driver_node

[camera_driver_node-2] [INFO] [1706141848.665010541] [rightcam]: using spinnaker lib version: 3.1.0.79
[camera_driver_node-2] [INFO] [1706141848.665354033] [rightcam]: found camera with serial number: 22131605
[camera_driver_node-2] terminate called after throwing an instance of 'Spinnaker::Exception'
[camera_driver_node-2]   what():  Spinnaker: Could not read the XML URL [-1008]
[ERROR] [camera_driver_node-2]: process has died [pid 43657, exit code -6, cmd '/xxxxxxxxxxx/ws_flir/install/spinnaker_camera_driver/lib/spinnaker_camera_driver/camera_driver_node --ros-args -r __node:=rightcam --params-file /tmp/launch_params_n7h8jj9b --params-file /tmp/launch_params_r6l4oe_s --params-file /tmp/launch_params_cctft8hc'].

2) Ctrl+C'ed out of camera_driver_node, re-run camera_driver_node

[camera_driver_node-2] [WARN] [1706141891.574435237] [rightcam]: no camera found with serial: 22131605 on try # 4
[camera_driver_node-2] [WARN] [1706141891.574478764] [rightcam]:  found cameras: 22129738
[camera_driver_node-2] [WARN] [1706141891.574483294] [rightcam]:  found cameras: unknown
[camera_driver_node-2] [WARN] [1706141892.574801001] [rightcam]: no camera found with serial: 22131605 on try # 5
[camera_driver_node-2] [WARN] [1706141892.574851164] [rightcam]:  found cameras: 22129738
[camera_driver_node-2] [WARN] [1706141892.574855501] [rightcam]:  found cameras: unknown
[camera_driver_node-2] [ERROR] [1706141893.574941976] [rightcam]: giving up, camera 22131605 not found!
[camera_driver_node-2] [ERROR] [1706141893.575007708] [rightcam]: startup failed!
[camera_driver_node-2] terminate called after throwing an instance of 'std::runtime_error'
[camera_driver_node-2]   what():  startup of CameraDriver node failed!
[ERROR] [camera_driver_node-2]: process has died [pid 43804, exit code -6, cmd ...

I can always kill the node and rerun it, or unplug and plug in the camera, but I was hoping there was a more reliable startup method for the flir cameras.

Thanks.

berndpfrommer commented 8 months ago

OK, so just to clarify: you run the ros driver, you ctrl-c out of it, and immediately (or after some time?) try to start it up again and it fails with above error messages? If that is so, then: 1) It cannot be bad state in the ros driver, since the ros driver has exited 2) it must be that the camera is left in a bad state once the driver exits. Do you agree?

buckleytoby commented 8 months ago

OK, so just to clarify: you run the ros driver, you ctrl-c out of it, and immediately (or after some time?) try to start it up again and it fails with above error messages? If that is so, then:

1. It cannot be bad state in the ros driver, since the ros driver has exited

2. it must be that the camera is left in a bad state once the driver exits.
   Do you agree?

Yes, agreed. I'll rerun it only a few seconds after ctrl+c'ing out

berndpfrommer commented 8 months ago

OK, we are probably not shutting down the camera as nicely as we should. And the Spinnaker SDK probably also doesn't. Did you keep the log messages on exit? What is on the console when you exit the driver before you see the startup problems? Can you predict from the console messages if the driver will start up next time?

buckleytoby commented 8 months ago

here's the last few lines of a run right before the failure:

[camera_driver_node-2] [INFO] [1706141828.063117249] [rightcam]: rate [Hz] in  30.00 out  30.00 drop   0%
[camera_driver_node-1] [INFO] [1706141828.091431317] [leftcam]: rate [Hz] in  30.00 out  30.00 drop   0%
^C[WARNING] [launch]: user interrupted with ctrl-c (SIGINT)
[camera_driver_node-2] [INFO] [1706141828.690111214] [rclcpp]: signal_handler(signum=2)
[camera_driver_node-1] [INFO] [1706141828.690110471] [rclcpp]: signal_handler(signum=2)
[ffmpeg_transform_bundler-3] [INFO] [1706141828.690111918] [rclcpp]: signal_handler(signum=2)
[INFO] [ffmpeg_transform_bundler-3]: process has finished cleanly [pid 43401]
[INFO] [camera_driver_node-2]: process has finished cleanly [pid 43399]
[INFO] [camera_driver_node-1]: process has finished cleanly [pid 43397]

and this was the subsequent failure:

[camera_driver_node-2] terminate called after throwing an instance of 'Spinnaker::Exception'
[camera_driver_node-2]   what():  Spinnaker: Could not read the XML URL [-1008]

the very next run I got the camera not found / found cameras: unknown error

There's doesn't seem to be a pattern as to when the error will occur. Prior to the above error, I ran the node 3 times in succession without issue (maybe 30 seconds between running).

I checked the ROS log file but there's no new info.

berndpfrommer commented 8 months ago

I just tried to reproduce this problem with my USB-based Blackfly S stereo camera, so very similar setup to yours. (They are externally triggered though). I started and stopped them 20 times, worked flawlessly. It's hard to troubleshoot an error that I can't reproduce here. Can we simplify the setup, i.e. do you also see this problem when running just a single camera?

Without wanting to push back, the error you are getting about not being able to read the XML URL feels like it's a problem between camera and SDK. Do you see any similar warning messages when using SpinView?

Another thing: the cameras are finicky about the USB3 vision cables. I went through several sets of cheap Amazon USB3 vision cables and they gave errors. The only ones working for me where the ones that FLIR sells ($$$) and the ones I got from LCom.

buckleytoby commented 8 months ago

We got our USB3 vision cables second hand so maybe that's the issue. Will purchase new ones and see if it improves.

Thanks!

berndpfrommer commented 7 months ago

@buckleytoby have you been able to resolve this?

berndpfrommer commented 6 months ago

@buckleytoby I'd like to see this issue resolved one way or the other since it gives the (potentially misleading) impression that there's something wrong with the driver that prevents USB3 blackfly_s from working correctly.

berndpfrommer commented 6 months ago

Closing this issue for lack of updates. Please reopen issue if there are signs that this is indeed a software issue related to the ROS2 driver.

TillBeemelmanns commented 4 months ago

We could observe the same problem with four cameras connected to one computer. We can observe inconsistent start-ups.

If the time between shutting down the camera drivers and restarting the drivers is very short we experience randomly some of the following problems.

giving up, camera XXXXXXX not found!
param XXXXX error writing to Device XXXXXX at register address XXXXXXX. Please try to reconnecting the device. 
terminate called after throwing an instance of 'Spinnaker::Exception'
what(): Spinnaker: Camera is not communicating in U3V protocol.
terminate called after throwing an instance of 'Spinnaker::Exception'
what():  Spinnaker: Could not read the XML URL [-1008]
node XXXXX not writable. 

Usually, if we do a fresh reboot on the host machine and wait several minutes, all cameras start up correctly. But after that we observe severe problems. BTW: We make sure that the driver gets killed with SIGINT and the driver usually finishes "cleanly".

Any advice ? Is there any possibility to stop the cameras completely in a separate automate process besides the ros driver ? How does the driver make sure the SDK process shuts down the camera cleanly ?

berndpfrommer commented 4 months ago

Hmm. Maybe it is a software issue after all. AFAICT the driver is calling DeInit() on shutdown, and in fact, it may be calling it too often. But if you could please do some simple experiments for me, that would be very helpful: 1) Can you reproduce this problem with a single camera, or does it only happen when running multiple cameras? 2) The Spinnaker SDK comes with example executables (typically on install, it asks you if you want the examples as well). Can you run this one: http://softwareservices.flir.com/Spinnaker/latest/_acquisition_multiple_cameras_write_to_file_8cpp-example.html a couple of times quickly back-to-back to see if that causes the same problem? If not, then I know it's a ROS2 driver issue, and not an SDK problem. 3) Are you running your multi-camera setup as composable nodes, or as individual nodes? I think accessing the SDK from multiple programs (individual nodes) simultaneously is not supported by FLIR, and weird things happen.

TillBeemelmanns commented 4 months ago
  1. Killing and relaunching just single camera seems to be stable. The problem seems to occur then launching multiple cameras at the same time. However, some weeks ago we had a similar problem with launching and restarting a single camera: "node XXXXX not writable.". Back then we did not kill the process with SIGINT but SIGTERM. Now as we kill the process with SIGINT this problem does not occur again.
  2. I was not able to execute this test today. Maybe I will try next week again.
  3. We run driver nodes each in separate docker containers that are launched with a docker compose setup. The container contains only ros-humble, ros-humble-spinnaker-camera-driver and ros-humble-image-pipeline. Then we mount our calibrations, configs and launchfile in the container and start the driver. Hence the SDK is also accessed individually in each container separately. I guess that means there will be just a 1:1 association between a driver node and the SDK.

However, I think I found a workaround for us: It seems like the problem arises from the fact that the driver nodes are launched at almost the same time. When I start the nodes 5s apart from each other, the problem seems to be resolved. Today, I could just test it for ~10 startups. I would do further test the next days and report it here.

berndpfrommer commented 4 months ago

Accessing the Spinnaker SDK through multiple applications at the same time is not supported. Users have reported problems with USB3 cameras in this scenario, see https://github.com/berndpfrommer/flir_spinnaker_ros2/issues/24 and below FLIR's response. FLIR does not test that scenario, so there is no guarantee that it will work. I suggest building the driver from source and running everything as composable nodes. I have released what I believe to be a fix for the apt binaries with PR #177 but it may take 1-2 months until that fix has worked its way through the ROS2 build system.

Hi Bernd,

My name is William, I am a Support Engineer for Teledyne FLIR Machine Vision, and I will be assisting you with your questions.

First off, apologies for the delay; I'm not sure how the response didn't go through, but I will follow up in a couple of days in case you don't respond to this one by then. I have brought up your issue with our software engineers here. Officially, we do not support using multiple instances of spinnaker running. Running multiple cameras in a single instance of a spinnaker-based application is supported and tested for, but having multiple instances of spinnaker running (each with their own camera) does not. Although it is possible to have multiple instances of Spinnaker running with multiple cameras, and have no issues occurring, its not something we test for, or support ourselves, meaning issues could occur, regardless of using ROS2 or not in the spinnaker application.

Best regards,

William Gallego Support Engineer Teledyne FLIR

berndpfrommer commented 3 months ago

The startup issues reported here are likely related to running several instances accessing the SDK. The workaround is to use composable nodes. With PR #177 now synced to the apt packages, composable nodes can be used for running multiple cameras simultaneously.