hyperion-project / hyperion.ng

The successor to Hyperion aka Hyperion Next Generation
https://hyperion-project.org/
MIT License
3.04k stars 377 forks source link

Random error - Rs232 SerialPortError: Resource temporarily unavailable #1183

Closed newkind closed 3 years ago

newkind commented 3 years ago

Bug report

First of all I'm aware I'm using CoreElec and you most likely will tell me to go to the CE forums and ask there, but I tried and unfortunatelly folks there weren't able to help me. Here's the link to my previous topic in CE forums: https://discourse.coreelec.org/t/amlogicgrabb-stops-working-after-couple-hours-hyperion-ng/13091/11?u=newkind

So I'm using a WS2812B leds connected to the Arduino Uno clone and the issue is that randomly after random amount of time the Hyperion.ng stops working because of the Rs232 error. This Hyperion.ng addon build is based on the latest Hyperion.ng 2.0.0-alpha.9

Here's the part of the log:

2021-02-03T09:56:52.173 hyperiond HYPERION     : <DEBUG> PriorityMuxer.cpp:270:clearInput() | Removed source priority 250
2021-02-03T09:56:52.173 hyperiond HYPERION     : <DEBUG> PriorityMuxer.cpp:351:setCurrentTime() | Set visible priority to 255
2021-02-03T09:56:52.174 hyperiond HYPERION     : <DEBUG> Hyperion.cpp:559:handlePriorityChangedLedDevice() | priority[255], previousPriority[250]
2021-02-03T09:56:52.174 hyperiond HYPERION     : <DEBUG> Hyperion.cpp:562:handlePriorityChangedLedDevice() | No source left -> switch LED-Device off
2021-02-03T09:56:52.175 hyperiond COMPONENTREG : <DEBUG> ComponentRegister.cpp:36:setNewComponentState() | Framegrabber: disabled
2021-02-03T09:56:52.177 hyperiond AmLogic      : <DEBUG> GrabberWrapper.cpp:58:stop() | Grabber stop()
2021-02-04T00:37:52.156 hyperiond LEDDEVICE    : <DEBUG> ProviderRs232.cpp:95:close() | Close UART: ttyUSB0
2021-02-04T00:37:52.156 hyperiond LEDDEVICE    : <ERROR> Device disabled, device 'adalight' signals error: 'Rs232 SerialPortError: Resource temporarily unavailable'
2021-02-04T00:37:52.157 hyperiond COMPONENTREG : <DEBUG> ComponentRegister.cpp:36:setNewComponentState() | LED device: disabled

Link to the full log: https://pastebin.com/raw/ZwyptMHK

When the error happens, doing killall hyperiond and waiting couple seconds for the daemon to restart fixes the issue for the next X hours until it happens again.

This happens randomly and can happen ie. after an hour or 20 hours. There's no general rule for that. I could set the CRON to restart hyperiond every couple hours but that is certainly not a fix for this issue.

I'll gladly test all the patches or code that would fix this.

Thank you!

Steps to reproduce

None, just wait couple hours.

What is expected?

Hyperion.ng works just fine and in case of such error - restarts the daemon automatically.

What is actually happening?

Error Device disabled, device 'adalight' signals error: 'Rs232 SerialPortError: Resource temporarily unavailable is thrown and the daemon never restarts keeping the device dead until the whole box is restarted or the killall hyperiond command is used.

System

Hyperion Server:

Hyperion Server OS:

Lord-Grey commented 3 years ago

@newkind Try setting the baud rate to 115200.

An Arduino Uno cannot do 500000.

Does that work?

newkind commented 3 years ago

I tried different baudrates under 500000 but the leds were always slow and choppy - especially at 115200. The 500000 looks really nice and is very fluid and smooth.

@Lord-Grey I'll try going with 460800 - this one is also smooth and fluid. This is the one I used in the past on my different setup with Arduino Uno and I didn't had such issues there.

Lord-Grey commented 3 years ago

The older code was slower and had more delays. That is why it might have been working before. The real Android Uno can technically only do 115200. In case your Uno like model can do higher baud-rate, I would suggest you increase the latch-time. The "Temporarily Available" error is an indication that the board cannot cope with the data volume sent in the configured timeframe.

If it is slow you might want also to check, if you grabbing too quickly or with a too high resolution...

newkind commented 3 years ago

Thank you very much for the suggestions! I'll test everything again and let you know the results.

Lord-Grey commented 3 years ago

@newkind Any update?

newkind commented 3 years ago

@Lord-Grey I'm sorry but last days has been crazy (we got a new puppy) so I wasn't able to do any testing, but today I'll be doing all changes to the configuration and the arduino sketch. It might take couple hours to see if the Hyperion.ng still breaks.

Please give me couple more hours and I'll surely get back to you with an update!

Lord-Grey commented 3 years ago

@newkind No worries. Take your time! I only wanted to check, if the issues still persist....

newkind commented 3 years ago

@Lord-Grey Sorry it took so long to get back to you.

So I did play with the configuration and Arduino sketches today and my findings are:

1) I wasn't able to achieve the smooth results on baud rate 115200 - I played with the capture width and height, latch time, refresh rate 2) On baud rate 500 000 - everythings super smooth on latch time 30ms (my current configuration), but anything above that value ie. 40ms makes the animation look choppy 3) I decreased my capture width from 80px to 45px and will see if this will help the issue, but I doubt that especially that the error happens even when Hyperion is not working currently

@Lord-Grey "The "Temporarily Available" error is an indication that the board cannot cope with the data volume sent in the configured timeframe."

I agree that this could be the case, but if it would - shouldn't the error happen only when Hyperion is actually working and controlling leds actively? The issue I'm facing happens overtime and 99% during night when no one is watching anything and leds are turned off, so the device just sits idle and waits until I'll start playing anything. It just behaves like it would lose the usb power or something like that and cannot reconnect properly.

I'm also doing another test with a different Uno R3 device. I noticed that the one I used currently was not based on Atmel chip and reported on ttyUSB0. The new one has Atmel chip and connects as ttyACM0 - we'll see if this will help in any way.

newkind commented 3 years ago

Update: So changing to the Atmel powered Uno R3 didn't help. However I noticed one more thing - the Atmel version of Uno R3 allows me to set baudrate to 460800 - something that non-Atmel version didn't allow, so I'm going to try it. The non Atmel version allowed me only 115200 or 500000, 460800 didn't worked for some reason.

I'll come with more updates later.

newkind commented 3 years ago

So it looks like this didn't change anything - I even think that the device is crashing much faster when using ttyACM0.

Latest log: https://pastebin.com/raw/L4pPQPNN

This time it took only about 5 minutes in idle to prevent device from working.

Right now I'll switch the baudrate to 115200 (just for testing as it's unusable) to see if it'll still crash. I'll come back with new log.

newkind commented 3 years ago

One more update - setting baudrate to 115200 also doesn't solve it.

Here is the new log: https://paste.kodi.tv/eduxarocev.kodi

Relevant part:

2021-03-01T14:11:29.336 hyperiond HYPERION     : <DEBUG> PriorityMuxer.cpp:161:registerInput() | Register new input 'System/GRABBER' with priority 250 as inactive
2021-03-01T14:11:29.336 hyperiond COMPONENTREG : <DEBUG> ComponentRegister.cpp:36:setNewComponentState() | Framegrabber: enabled
2021-03-01T14:11:29.336 hyperiond AmLogic      : <DEBUG> GrabberWrapper.cpp:48:start() | Grabber start()
2021-03-01T14:11:29.390 hyperiond HYPERION     : <DEBUG> PriorityMuxer.cpp:251:setInputImage() | Priority 250 is now active
2021-03-01T14:11:29.390 hyperiond HYPERION     : <DEBUG> PriorityMuxer.cpp:351:setCurrentTime() | Set visible priority to 250
2021-03-01T14:11:29.390 hyperiond HYPERION     : <DEBUG> Hyperion.cpp:559:handlePriorityChangedLedDevice() | priority[250], previousPriority[255]
2021-03-01T14:11:29.390 hyperiond HYPERION     : <DEBUG> Hyperion.cpp:569:handlePriorityChangedLedDevice() | new source available -> switch LED-Device on
2021-03-01T14:11:29.399 hyperiond LEDDEVICE    : <DEBUG> ProviderRs232.cpp:95:close() | Close UART: ttyACM0
2021-03-01T14:11:29.399 hyperiond LEDDEVICE    : <ERROR> Device disabled, device 'adalight' signals error: 'Rs232 SerialPortError: Resource temporarily unavailable'
2021-03-01T14:11:29.400 hyperiond COMPONENTREG : <DEBUG> ComponentRegister.cpp:36:setNewComponentState() | LED device: disabled

So it's either my device - Mecool M8S L or there's something with Hyperion.ng as this happens on two completely different Arduino devices.

I have ordered today two Arduino Micro Leonardo to hopefully solve this once for all. I'm using it on another device on my main TV and that one works perfectly fine, so if it'll be having issues on my secondary TV, then I'll know for sure that the Mecool M8S L device is to blame.

Lord-Grey commented 3 years ago

@newkind Just to see, if something is in the system log, could you do me an extract for the 28.02.2021 and 01.03.2021, please?

journalctl --since "2021-02-28" --until "2021-02-28 21:00" journalctl --since "2021-03-01" --until "2021-03-01 17:00"

Lord-Grey commented 3 years ago

As you seem to use CoreElec, do you use the Suspend/Resume feature?

newkind commented 3 years ago

@Lord-Grey Both of CoreElec devices are constantly up/awake - I do not put any of these devices under suspend/sleep.

I'm not sure if I do something wrong but both of these commands output only ~ :

-- Logs begin at Mon 2021-03-01 19:09:13 CET, end at Mon 2021-03-01 21:53:08 CET. --

~
~
~
~
~
~
~
~
~
~
~
~
~
~
~
~
Lord-Grey commented 3 years ago

@newkind Could you do a

journalctl --since "2021-02-28" --until now

or alternatively, if the error occurs

journalctl --since "2 hours ago"

newkind commented 3 years ago

@Lord-Grey Sure! Here's the log: http://ix.io/2RnY

Lord-Grey commented 3 years ago

@newkind Thank you. At minimum there seems to be some indication in the Log:

Mar 01 22:12:48 CoreELEC-M8S kernel: hub 1-0:1.0: port 1 disabled by hub (EMI?), re-enabling...
Mar 01 22:12:48 CoreELEC-M8S kernel: usb 1-1: USB disconnect, device number 2
Mar 01 22:12:48 CoreELEC-M8S kernel: usb 1-1: new full-speed USB device number 4 using xhci-hcd
Mar 01 22:12:48 CoreELEC-M8S kernel: usb 1-1: New USB device found, idVendor=2341, idProduct=0043
Mar 01 22:12:48 CoreELEC-M8S kernel: usb 1-1: New USB device strings: Mfr=1, Product=2, SerialNumber=220
Mar 01 22:12:48 CoreELEC-M8S kernel: usb 1-1: Manufacturer: Arduino (www.arduino.cc)
Mar 01 22:12:48 CoreELEC-M8S kernel: usb 1-1: SerialNumber: 55735323935351B002A0
Mar 01 22:12:48 CoreELEC-M8S kernel: usb 1-1: ep 0x82 - rounding interval to 1024 microframes, ep desc says 2040 microframes
Mar 01 22:12:48 CoreELEC-M8S kernel: cdc_acm 1-1:1.0: ttyACM1: USB ACM device
Mar 01 22:12:48 CoreELEC-M8S sh[2891]: 2021-03-01T22:12:48.829 hyperiond LEDDEVICE    : <ERROR> Device disabled, device 'adalight' signals error: 'Rs232 SerialPortError: Resource temporarily unavailable'
Mar 01 22:12:48 CoreELEC-M8S hyperiond[2901]: Device disabled, device 'adalight' signals error: 'Rs232 SerialPortError: Resource temporarily unavailable'
newkind commented 3 years ago

@Lord-Grey So as suspected the device turns off the usb hub internally right? If that's the case then I'll take this topic to the CE forums and ask for advice there.

Lord-Grey commented 3 years ago

@newkind In addition, I am going to check, if I just could ignore the error and do the next write or if I need to reopen the USB device.... to make the hyperion more robust for such a scenario....

newkind commented 3 years ago

That would be awesome! Thank you!

newkind commented 3 years ago

@Lord-Grey Just coming back to you with another update. I took the kernel issue to the CoreElec forums and vpeter checked that this kernel code has such description:

EM interference sometimes causes badly shielded USB devices to be shutdown by
the hub, this hack enables them again. Works at least with mouse driver.

My arduino is in a separate plastic case about 30cm from the CE device so I ruled that out as a source of interference.

After that I replaced the usb cable that I have been using and I haven't noticed any crashes since then (almost 24h)! Even though the cable I was using works, it's most likely so badly shielded that it caused the EM interference and thus linux kernel was disabling the usb port on the internal hub.

That was... well... unexpected!

I still need a day or two to make sure that the crash won't happen again, but so far it looks really good!

One more time thank you for your support and finding out the kernel error report! :)

newkind commented 3 years ago

Everything still works fine, so I can confirm that the issue was the poor cable shielding causing interference and linux kernel disabling the usb internal hub port because of the EM interference.

One more time thank you @Lord-Grey for your support.