Closed efernandez closed 7 years ago
FYI, if I unplug the camera while the driver is running and I plug it again, it recovers gracefully with this PR. This is the output it'll show on the terminal:
[ERROR] [1472677524.876930403]: PointGreyCamera::getCameraTemperature Could not get property. | FlyCapture2::ErrorType 28Feature register read failed.
[ERROR] [1472677524.878818351]: Failed to disconnect with error: PointGreyCamera::disconnect Failed to disconnect camera | FlyCapture2::ErrorType 35Error stopping stream.
[ INFO] [1472677525.879065059]: Disconnected from camera.
[ERROR] [1472677528.121071034]: Failed to connect with error: PointGreyCamera::connect Failed to get first connected camera | FlyCapture2::ErrorType 19Failed to get the camera from the index.
[ERROR] [1472677529.121485640]: Failed to connect with error: PointGreyCamera::connect Failed to get first connected camera | FlyCapture2::ErrorType 19Failed to get the camera from the index.
[ERROR] [1472677530.121742309]: Failed to connect with error: PointGreyCamera::connect Failed to get first connected camera | FlyCapture2::ErrorType 19Failed to get the camera from the index.
[ERROR] [1472677531.122088810]: Failed to connect with error: PointGreyCamera::connect Failed to get first connected camera | FlyCapture2::ErrorType 19Failed to get the camera from the index.
[ERROR] [1472677532.122483199]: Failed to connect with error: PointGreyCamera::connect Failed to get first connected camera | FlyCapture2::ErrorType 19Failed to get the camera from the index.
[ERROR] [1472677533.122902717]: Failed to connect with error: PointGreyCamera::connect Failed to get first connected camera | FlyCapture2::ErrorType 19Failed to get the camera from the index.
[ERROR] [1472677534.123325582]: Failed to connect with error: PointGreyCamera::connect Failed to get first connected camera | FlyCapture2::ErrorType 19Failed to get the camera from the index.
[ERROR] [1472677535.123758305]: Failed to connect with error: PointGreyCamera::connect Failed to get first connected camera | FlyCapture2::ErrorType 19Failed to get the camera from the index.
[ERROR] [1472677536.124192551]: Failed to connect with error: PointGreyCamera::connect Failed to get first connected camera | FlyCapture2::ErrorType 19Failed to get the camera from the index.
[ERROR] [1472677537.124654832]: Failed to connect with error: PointGreyCamera::connect Failed to get first connected camera | FlyCapture2::ErrorType 19Failed to get the camera from the index.
[ERROR] [1472677538.125079421]: Failed to connect with error: PointGreyCamera::connect Failed to get first connected camera | FlyCapture2::ErrorType 19Failed to get the camera from the index.
[ERROR] [1472677539.125474241]: Failed to connect with error: PointGreyCamera::connect Failed to get first connected camera | FlyCapture2::ErrorType 19Failed to get the camera from the index.
[ERROR] [1472677540.125748367]: Failed to connect with error: PointGreyCamera::connect Failed to get first connected camera | FlyCapture2::ErrorType 19Failed to get the camera from the index.
[ERROR] [1472677541.126126094]: Failed to connect with error: PointGreyCamera::connect Failed to get first connected camera | FlyCapture2::ErrorType 19Failed to get the camera from the index.
[ERROR] [1472677542.126536094]: Failed to connect with error: PointGreyCamera::connect Failed to get first connected camera | FlyCapture2::ErrorType 19Failed to get the camera from the index.
[ERROR] [1472677543.126926990]: Failed to connect with error: PointGreyCamera::connect Failed to get first connected camera | FlyCapture2::ErrorType 19Failed to get the camera from the index.
[ERROR] [1472677544.127358681]: Failed to connect with error: PointGreyCamera::connect Failed to get first connected camera | FlyCapture2::ErrorType 19Failed to get the camera from the index.
[ INFO] [1472677548.174125225]: Connected to camera.
[ INFO] [1472677548.222552638]: Started camera.
Note this was run exactly with this PR, ie. w/o the hack to force an error when reading the TEMPERATURE
.
Looks good, thanks for the thorough testing.
@mikepurvis I've updated the PR:
connectCb()
simply starts the devicePoll()
loop in DISCONNECTED
state. This has the benefit that we have error handling and recovery if something wrong happens in the callback. Before this PR, any error there blocks the callback.devicePoll()
: ERROR
-> [ STOPPED
-> ] DISCONNECTED
-> CONNECTED
-> STARTED
. This implements the different transitions and still allows the thread running devicePoll()
to get interrupted. The diagnostics are also updated on every iteration.I've unplug and plug as fast as I could, even during a recovery sequence, and the driver always comes back; note that it's not very easy to do this fast because the USB connector plugs in quite tight.
I've also try to start the driver before plugging in the camera. This still can leave the camera and the nodelet in a bad state. It needs the whole nodelet manager to be restarted. However, this is out of the scope of this PR and it's a very uncommon case.
If you're OK with the new implementation, please give me a :+1: . Thanks for your comments.
LGTM 👍
Does the fix for this issue include cases like:
I ran 0.12.2 tag and then disconnected the camera while looking at the image_color topic in RVIZ. I was expecting the driver to reconnect and start publishing the image_color topic again but it didn't. I do see from the camera_nodelet_manager an Info message "Connected to camera." sent every second or so to the ROS console.
@Roboticom314 I tested the "temporary power loss to the camera but ROS driver still running" use case and it worked for me. Do you mind opening an issue with more details?
It'd be difficult to reproduce if we don't have the same camera, but at least we could try with the same software, or similar.
@efernandez Sorry for taking so long to reply.
If you need more info, let me know.
I haven't tried with a Bumblebee2, and I'm afraid I don't have any available to me ATM. Could you please create an issue a provide those details and also the steps to reproduce it?
This is a merged PR, so it's not really the place to discuss this. Simply reference it from the issue. :smiley:
Hopefully someone else can try with a Bumblebee2. Or I can figure out what's wrong from the terminal output, but from your comment it could be anything on the FlyCapture reconnection. Maybe try with newer versions of FlyCapture.
This implements the recovery strategy that was commented out, when an exception is thrown.
The strategy does:
disconnect
+connect
+start
. From my tests, there's no need to callstop
beforedisconnect
.If any of the three steps fail, it's tried again after one second. From my tests, none of them fails.
I need this because after many hours running the camera, if fails to grab images or query the temperature property in the
devicePoll
loop: https://github.com/ros-drivers/pointgrey_camera_driver/blob/master/pointgrey_camera_driver/src/nodelet.cpp#L315Since it's very difficult to reproduce, I've forced an error to happen every 10 iterations, when the
TEMPERATURE
property is read, with this code diff:After applying that change, we can reproduce/force an error to happen and see that the camera driver recovers gracefully, running this in different terminals:
The output would be like this:
:exclamation: If we don't handle the errors, as in the original code: https://github.com/ros-drivers/pointgrey_camera_driver/blob/master/pointgrey_camera_driver/src/nodelet.cpp#L371 the loop iterates very fast spamming the logs, with catastrophic consequences for the system (out of memory issues because of flooding
/rosout
mainly). It's important to note that the loop doesn't have any blocking functions other thangrabImage
, so when it fails, it returns immediately and we can have up to1000
log errors per second.@mikepurvis