groupgets / purethermal1-firmware

Reference firmware for PureThermal 1 FLIR Lepton Dev Kit
MIT License

PureThermal stops sending frames. #23

Open Gronis opened 5 years ago

Gronis commented 5 years ago

I've been using the purethermal-mini camera on a Raspberry Pi 3B+ to grab an IR video stream. At first it works perfectly, but after a while the camera stops streaming frames.

If I look at /var/log/syslog, the log is full of the following error:

Transfer to device 6 endpoint 0x1 frame 2025 failed - FIQ timed out. Data may have been lost.
Transfer to device 6 endpoint 0x1 frame 57 failed - FIQ timed out. Data may have been lost.
Transfer to device 6 endpoint 0x1 frame 61 failed - FIQ timed out. Data may have been lost.
Transfer to device 6 endpoint 0x1 frame 65 failed - FIQ timed out. Data may have been lost.
Transfer to device 6 endpoint 0x1 frame 69 failed - FIQ timed out. Data may have been lost.

I have found a Stack Overflow post here with the exact same problem. According to that post, the problem only occurs when the camera is used with a Raspberry Pi. Also, reloading the uvcvideo kernel module (or physically reconnecting the device) makes the camera usable again for a short while.
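To spot the failure automatically, the syslog text can be scanned for the message quoted above. This is only a minimal sketch: the regular expression is inferred from the five log lines in this comment (not from the driver source), and the function names and the threshold of 3 are hypothetical.

```python
import re

# Pattern inferred from the syslog lines quoted above, not from the driver source.
FIQ_TIMEOUT = re.compile(
    r"Transfer to device (\d+) endpoint (0x[0-9a-fA-F]+) frame (\d+) failed - FIQ timed out"
)


def count_fiq_timeouts(log_text):
    """Return {(device, endpoint): count} for FIQ timeout lines in log_text."""
    counts = {}
    for match in FIQ_TIMEOUT.finditer(log_text):
        key = (int(match.group(1)), match.group(2))
        counts[key] = counts.get(key, 0) + 1
    return counts


def stream_looks_stalled(log_text, threshold=3):
    """Hypothetical heuristic: True once any endpoint has logged `threshold` timeouts."""
    return any(n >= threshold for n in count_fiq_timeouts(log_text).values())
```

If the check fires, a supervising script could then reload the module as described above (for example `modprobe -r uvcvideo` followed by `modprobe uvcvideo`, run as root). That is a workaround rather than a fix.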

Any ideas? I really need to use this camera together with a raspberry pi.

kekiefer commented 5 years ago

I would love for this to be an issue with the firmware, then we could fix it here. So far I haven't seen any evidence it's anything but a Raspberry Pi USB driver issue. Anecdotally, I've heard many reports of this issue, but only ever from Raspberry Pi users.

The PT firmware sends frames in Isochronous mode, which means that for a given frame format, it asks for a certain amount of dedicated USB bandwidth to transfer video data, and streams all that data on a fixed timebase with no backpressure, nacking, or retry requesting possible from the host.

So if frames are getting dropped from an isochronous stream, it sounds like a problem with the host not supplying the necessary bandwidth. That should be fine since the stream is stateless, but only if the host drivers can handle the dropped data, which it looks like they can't.
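To give a feel for the amount of bandwidth involved, here is a rough back-of-the-envelope calculation. The frame format (160x120 pixels, 2 bytes per pixel, 9 frames per second) is an assumed example and is not taken from the firmware, and the figure ignores UVC payload headers. The 1023-byte limit is the maximum isochronous packet size on a full-speed USB link, which carries one such packet per 1 ms frame.

```python
def iso_bytes_per_interval(width, height, bytes_per_pixel, fps, intervals_per_second=1000):
    """Average payload bytes needed per isochronous service interval."""
    bytes_per_second = width * height * bytes_per_pixel * fps
    return bytes_per_second / intervals_per_second


# Assumed example format: 160x120 pixels, 2 bytes per pixel, 9 frames per second.
needed = iso_bytes_per_interval(160, 120, 2, 9)

# Full-speed USB: at most 1023 bytes per isochronous packet, one packet per 1 ms frame.
FULL_SPEED_ISO_MAX = 1023

print(f"{needed:.1f} bytes per 1 ms frame, limit {FULL_SPEED_ISO_MAX}")
# prints "345.6 bytes per 1 ms frame, limit 1023"
```

Under these assumptions the stream needs roughly a third of what a single full-speed isochronous endpoint can carry, so the device is not asking for much. The host still has to service every interval on time.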

mandymon commented 5 years ago

I've also experienced this same problem and have been testing other platforms and OSs to determine the cause. Any help would be greatly appreciated: after significant testing across three devices, three operating systems and two methods of grabbing the video stream, the issue remains. Below are my test conditions:

RPi 3b + Raspbian Stretch 19-04-08
Lepton 3 + Pure Thermal 2 (Didn't get the firmware version)
Capturing images and video through ffmpeg with a systemd service and timer

RPi 3b + Raspbian Buster 19-06-20
Lepton 3.5 + Pure Thermal Mini (v1.2.2)
Capturing images and video through ffmpeg with a systemd service and timer

These two issues occur when running either of the RPi combinations.

Primary Issue: The same as what OP stated, the lepton works perfectly for 1-2 hours then freezes with that same error in the logs.

Secondary Issue: If I ssh into the pi via ethernet the connection to the lepton freezes and I am greeted with "error: [Errno 32] Broken pipe". This issue appears to be due to the RPi 3b running ethernet through the USB controller and was not experienced on any other device. Interestingly, increasing the USB memory buffer increased the amount of time before the error occurred.

After reading this thread initially, I decided to test with a Nvidia Jetson Nano:

Nvidia Jetson Nano + Jetson Nano SD Image (Latest as of this post)
Lepton 3 and Lepton 3.5 + Pure Thermal Mini (v1.2.2) (Two different Minis as well)
Capturing images and video through ffmpeg with a systemd service and timer, and by capture through opencv (a modified version of the provided opencv-capture.py file)

As on the RPi, after 1-2 hours the lepton just hangs and the OS puts the USB port into power saving mode. I forced the USB port to stay on, but that did nothing: the port appears to turn off after the Lepton has frozen, not as the cause of it. Oddly, there was no error in any system log. When using ffmpeg and a systemd service, it eventually just gets stuck trying to open the stream, and a restart does not fix this, only physically unplugging the lepton. When using an opencv python file, it eventually can't open the camera and terminates, again with no error in the system log.

Finally, thinking that this may be an ARM-related issue, I tested the same methods as above on an i7 Ubuntu machine and got the exact same results.

With all the tests it appears that capture frequency plays a part in how long it takes to stop. For instance, in tests where a one minute video was captured every 15 minutes, it took about 2 hours to stop. But if the interval was decreased to every 5 minutes, the lepton stopped after an hour.

It looks to me like it's an issue with the lepton firmware, as restarting the OS does not fix it, only physically unplugging the lepton board or power cycling the device. Any suggestions?

kekiefer commented 5 years ago

Do you have the service to share that will lock it up most reliably? I'll hook up JTAG and see if I can trap this condition.

Just to be clear, does this only happen when you're stopping and starting the stream on the tegra or i7, or will running a continuous stream on these platforms also kill it after a while?

mandymon commented 5 years ago

lepton_issue.zip

Here's a zip file with both the opencv version, which runs a continuous stream, and the systemd/ffmpeg version, which stops and starts the stream. Running the start/stop method caused the issue on both the tegra and i7, but I didn't test the opencv version on the i7, only the tegra. I'd suggest starting with the opencv version, as the issue occurs quicker with it than with the systemd version (40ish mins vs 1 hour ish), to save you waiting around for it.

kekiefer commented 5 years ago

The opencv code you sent does not run a continuous stream. It stops and restarts the stream as it switches between capturing images and video.

At any rate, I've been running your opencv version for 4 hours now on a Jetson TX2, and no issues with it yet, but this is with the JTAG hooked up and some extra debugging enabled. I'll keep trying with the 1.2.2 release code and different boards in hopes that I can reproduce the issue.

mandymon commented 5 years ago

Ah true, my bad.

That's interesting. I'll also keep looking into it and see if I can figure it out. Thanks for your help.

Gronis commented 5 years ago

So, the conclusion is that it works on Nvidia Jetson Nano (for kekiefer), but not on the Raspberry Pi? @mandymon Did you try the openCV version on the RPi 3b?

When I performed my initial tests for this issue, I used ubuntu 16.04. Later, when I tested with raspbian, I had a stream rolling for at least 2h without the issue, so I figured that maybe the more up-to-date kernel version and usb drivers helped out, though I'm not certain at all that this is the case. I'm not saying that it worked, but it seemed to be more stable at the least. However, from @mandymon's experience, it seems that the problem still exists on the latest versions of Raspbian.

Another question, why is it that the opencv code "does not run a continuous stream"? All I can see is that a frame is fetched with ret, img = self.cv2_cap.read() both when saving video and images. Why would this stop and restart the stream?

mandymon commented 5 years ago

kekiefer tested it on an Nvidia Jetson TX2 and said it worked; I've tested it on an Nvidia Jetson Nano and had an issue. I'll run some tests with a 3b+ and see how it goes.

That's interesting that it worked. Was that on Ubuntu Mate on a Pi, or Ubuntu on an x86 machine?

That's what I thought when I modified the code, and hence why I initially said it was continuous, but after kekiefer mentioned it I looked at it again. I haven't done much with classes in python (hence why I was wrong) and it looks like it's creating and opening a new instance of the camera object each time it's called. I could be wrong again though, haha.

Gronis commented 5 years ago

> That's interesting that it worked. Was that on Ubuntu Mate on a Pi, or Ubuntu on an x86 machine?

Ubuntu Mate on RPi 3b

> That's what I thought when I modified the code, and hence why I initially said it was continuous, but after kekiefer mentioned it I looked at it again. I haven't done much with classes in python (hence why I was wrong) and it looks like it's creating and opening a new instance of the camera object each time it's called.

Ah, I missed that part. Didn't notice you recreated the object each loop cycle. Never look at code before going to bed 😴
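For anyone following along, the difference between the two patterns can be sketched like this. It is not the code from lepton_issue.zip (which is not shown in this thread). The class names are hypothetical, and `open_capture` stands for any zero-argument callable that returns an object with `read()` and `release()`, such as `lambda: cv2.VideoCapture(0)`.

```python
class ReopeningGrabber:
    """Opens a fresh capture for every frame, so the stream stops and restarts."""

    def __init__(self, open_capture):
        self.open_capture = open_capture

    def grab(self):
        cap = self.open_capture()      # stream (re)starts here
        try:
            ok, frame = cap.read()
        finally:
            cap.release()              # stream stops here
        return frame if ok else None


class PersistentGrabber:
    """Opens the capture once and reuses it, keeping one continuous stream."""

    def __init__(self, open_capture):
        self.cap = open_capture()      # opened exactly once

    def grab(self):
        ok, frame = self.cap.read()
        return frame if ok else None

    def close(self):
        self.cap.release()
```

Both classes call `read()` in the same way, which is why the two are easy to confuse when skimming: the only difference is where the capture object is created.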

kekiefer commented 5 years ago

I have found one thing. Is this consistent with the failure you are seeing? If the Lepton takes too long to sync up when the stream is starting, sometimes opencv will give up on it with a "select error" reported. However, restarting the program then works without fail, and nothing is lunched-out.

mandymon commented 5 years ago

It's almost the same. I have found that restarting it fixes it, but not always on the first attempt; normally I have to spam it before it starts working again. I did run a test for 3 days using a continuous version (actually continuous this time) and it's still going strong, which is good. So it looks like the issue is something to do with opening the connection. For what I need, it would be better to use a stop/start method to save power, but it would need to be reliable.
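The "spam it until it works" step could be automated with a small retry loop. This is a sketch under assumptions, not a tested fix: `open_with_retry`, the attempt count and the delays are all hypothetical, and `open_capture` is any zero-argument callable returning an object with `read()` and `release()` (for example `lambda: cv2.VideoCapture(0)`).

```python
import time


def open_with_retry(open_capture, attempts=5, delay_s=1.0, sleep=time.sleep):
    """Open a capture and read one frame, retrying with a growing delay on failure.

    Returns (capture, attempt_number) on success and raises RuntimeError
    once all attempts have failed.
    """
    for attempt in range(1, attempts + 1):
        cap = open_capture()
        ok, _ = cap.read()
        if ok:
            return cap, attempt
        cap.release()                  # drop the failed handle before retrying
        if attempt < attempts:
            sleep(delay_s * attempt)   # wait a little longer each time
    raise RuntimeError(f"camera did not deliver a frame after {attempts} attempts")
```

This only covers the case where a restart eventually succeeds. It would not help with the hang that needs a physical unplug.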

hqm commented 4 years ago

I am having the same issue, with a Jetson TX2 board: I have a Pure Thermal board from Sparkfun https://www.sparkfun.com/products/14670 , and I plug it into a USB port on the Jetson TX2.

I then run VLC to connect to the camera and view the video on an HDMI monitor. The camera board runs for a while, sometimes several hours, but eventually freezes. The /dev/video2 port is then unreadable until I quit VLC and power cycle the camera by unplugging it from USB and plugging it back in again. At that point I can restart VLC, reconnect to the Pure Thermal board and get video again.

Also, while I've got your attention, is there any way to fix the camera's exposure so it doesn't do auto-exposure? I'm trying to take some measurements from a baseline reading, and would love to be able to pin the exposure level at some point in time.

mrspirytus commented 3 years ago

I am also experiencing the same issue with two PureThermal (fw:v1.3.0) (1e4e:0100) cameras running on an NVIDIA Nano with the latest L4T 18.04. I am also using the latest libuvc, built from source. The reason I am building libuvc from source is that the apt-get version has an issue with shutdown (libuvc hangs on close thread/join).

Anyway, last night I set up extra logging in my application. Today I looked at my app logs and noticed that the libuvc callback had stopped firing. libusb utility shows the camera connected. The next thing I looked at was dmesg around the time when the libuvc callback stopped firing. The total time from start until the issue manifested itself was about 10hr, but a day before it was < 1hr.
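The kind of logging that catches this can be reduced to a small watchdog that records when the frame callback last fired. This is a minimal sketch, not my actual application code: the class name, the 5-second timeout and the injectable clock are assumptions made for illustration.

```python
import time


class FrameWatchdog:
    """Tracks when the last frame callback fired and reports stalls."""

    def __init__(self, timeout_s=5.0, clock=time.monotonic):
        self.timeout_s = timeout_s
        self.clock = clock
        self.last_frame = clock()

    def on_frame(self, *_args):
        """Call this from the frame callback; any arguments are ignored."""
        self.last_frame = self.clock()

    def stalled(self):
        """True if no frame has arrived within the timeout."""
        return self.clock() - self.last_frame > self.timeout_s
```

A supervising loop could poll `stalled()` once per second and log a timestamp or restart the stream when it returns True.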

Here is the dmesg output. The last update my application received from libuvc was at 05/08/2021 04:11:34

[Fri May  7 19:52:23 2021] uvcvideo 1-3.1.3.3:1.0: Entity type for entity Extension 6 was not initialized!
[Fri May  7 19:52:23 2021] uvcvideo 1-3.1.3.3:1.0: Entity type for entity Extension 7 was not initialized!
[Fri May  7 19:52:23 2021] uvcvideo 1-3.1.3.3:1.0: Entity type for entity Extension 21 was not initialized!
[Fri May  7 19:52:23 2021] uvcvideo 1-3.1.3.3:1.0: Entity type for entity Extension 254 was not initialized!
[Fri May  7 19:52:23 2021] uvcvideo 1-3.1.3.3:1.0: Entity type for entity Camera 1 was not initialized!
[Fri May  7 19:52:23 2021] input: PureThermal (fw:v1.3.0) as /devices/70090000.xusb/usb1/1-3/1-3.1/1-3.1.3/1-3.1.3.3/1-3.1.3.3:1.0/input/input14
[Fri May  7 19:52:23 2021] usb 1-3.1.3.3: usbfs: process 30605 (axonTest) did not claim interface 0 before use
[Fri May  7 19:52:25 2021] usb 1-3.1.3.3: usb_suspend_both: status 0
[Fri May  7 19:52:25 2021] usb 1-3.1.3: usb_suspend_both: status 0
[Fri May  7 19:52:25 2021] usb 1-3.1: usb_suspend_both: status 0
[Sat May  8 04:11:26 2021] usb 1-3-port1: disabled by hub (EMI?), re-enabling...
[Sat May  8 04:11:26 2021] usb 1-3.1: USB disconnect, device number 42
[Sat May  8 04:11:26 2021] usb 1-3.1.3: USB disconnect, device number 43
[Sat May  8 04:11:26 2021] usb 1-3.1.3.3: USB disconnect, device number 44
[Sat May  8 04:11:26 2021] usb 1-3.1: new high-speed USB device number 45 using tegra-xusb
[Sat May  8 04:11:26 2021] usb 1-3.1: New USB device found, idVendor=214b, idProduct=7250
[Sat May  8 04:11:26 2021] usb 1-3.1: New USB device strings: Mfr=0, Product=1, SerialNumber=0
[Sat May  8 04:11:26 2021] usb 1-3.1: Product: USB2.0 HUB
[Sat May  8 04:11:26 2021] hub 1-3.1:1.0: USB hub found
[Sat May  8 04:11:26 2021] hub 1-3.1:1.0: 4 ports detected
[Sat May  8 04:11:26 2021] usb 1-3.1.3: new high-speed USB device number 46 using tegra-xusb
[Sat May  8 04:11:26 2021] usb 1-3.1.3: New USB device found, idVendor=214b, idProduct=7250
[Sat May  8 04:11:26 2021] usb 1-3.1.3: New USB device strings: Mfr=0, Product=1, SerialNumber=0
[Sat May  8 04:11:26 2021] usb 1-3.1.3: Product: USB2.0 HUB
[Sat May  8 04:11:26 2021] hub 1-3.1.3:1.0: USB hub found
[Sat May  8 04:11:26 2021] hub 1-3.1.3:1.0: 4 ports detected
[Sat May  8 04:11:27 2021] usb 1-3.1.3.3: new full-speed USB device number 47 using tegra-xusb
[Sat May  8 04:11:27 2021] usb 1-3.1.3.3: New USB device found, idVendor=1e4e, idProduct=0100
[Sat May  8 04:11:27 2021] usb 1-3.1.3.3: New USB device strings: Mfr=1, Product=2, SerialNumber=3
[Sat May  8 04:11:27 2021] usb 1-3.1.3.3: Product: PureThermal (fw:v1.3.0)
[Sat May  8 04:11:27 2021] usb 1-3.1.3.3: Manufacturer: GroupGets
[Sat May  8 04:11:27 2021] usb 1-3.1.3.3: SerialNumber: 800e002b-5110-3039-3433-373300000000
[Sat May  8 04:11:27 2021] uvcvideo: Found UVC 1.00 device PureThermal (fw:v1.3.0) (1e4e:0100)
[Sat May  8 04:11:27 2021] uvcvideo 1-3.1.3.3:1.0: Entity type for entity Extension 3 was not initialized!
[Sat May  8 04:11:27 2021] uvcvideo 1-3.1.3.3:1.0: Entity type for entity Processing 2 was not initialized!
[Sat May  8 04:11:27 2021] uvcvideo 1-3.1.3.3:1.0: Entity type for entity Extension 4 was not initialized!
[Sat May  8 04:11:27 2021] uvcvideo 1-3.1.3.3:1.0: Entity type for entity Extension 5 was not initialized!
[Sat May  8 04:11:27 2021] uvcvideo 1-3.1.3.3:1.0: Entity type for entity Extension 6 was not initialized!
[Sat May  8 04:11:27 2021] uvcvideo 1-3.1.3.3:1.0: Entity type for entity Extension 7 was not initialized!
[Sat May  8 04:11:27 2021] uvcvideo 1-3.1.3.3:1.0: Entity type for entity Extension 21 was not initialized!
[Sat May  8 04:11:27 2021] uvcvideo 1-3.1.3.3:1.0: Entity type for entity Extension 254 was not initialized!
[Sat May  8 04:11:27 2021] uvcvideo 1-3.1.3.3:1.0: Entity type for entity Camera 1 was not initialized!
[Sat May  8 04:11:27 2021] input: PureThermal (fw:v1.3.0) as /devices/70090000.xusb/usb1/1-3/1-3.1/1-3.1.3/1-3.1.3.3/1-3.1.3.3:1.0/input/input15
[Sat May  8 04:11:29 2021] usb 1-3.1.3.3: usb_suspend_both: status 0
[Sat May  8 04:11:29 2021] usb 1-3.1.3: usb_suspend_both: status 0
[Sat May  8 04:11:29 2021] usb 1-3.1: usb_suspend_both: status 0

lmoesch commented 8 months ago

Same issue on Raspberry Pi 4 running Ubuntu 64bit using the v4l2 interface.