Open SmarTripDood opened 3 years ago
I just recently made 2 boards (thanks so much!) and noticed I also have this problem. Simply unplugging and plugging back in seems to fix the problem, but is not ideal.
I don't have a board yet, but have been following as I want to get one eventually but $. @erikrrodriguez maybe has some ideas. They have a fork of this repo with a couple changes to allow multiple stations and a walking distance modifier to ignore trains you won't make in time. Their latest commit says they were attempting to fix the board reset issues.
Unfortunately I have not successfully solved this, and haven't been able to pin down why it happens. I have to reset my board every 2 or 3 days. But if I ever figure it out I'll be sure to push the code to my repo!
Is it possible there is something with WMATA's data feed that causes this? I don't think that's it but wanted to check.
Perhaps, but I don't think so. When I've tested on my PC using Python's requests library, the program has run for days with no issues. So I believe it's to do with the board's internal requests library, perhaps getting overloaded.
I've also tried to query MetroHero's API instead of WMATA's. But I can't even get a response using the board's requests library (PC testing again works fine).
My solution to this problem: I plug it into a smart plug and have it turn OFF at 59 minutes into the hour and back ON on the hour. Not necessarily on an hourly basis, but periodically throughout the day. That solves the issue for me.
I am running into this issue also. I don't know how to troubleshoot it. The display gets stuck and stops updating at some point. I've tried adding print statements to help troubleshoot. With serial console open, it stops sending out messages too. If anyone knows better ways to troubleshoot, please let me know.
See the solution above with the smart plug. It's imperfect but it works. Have the thing switch off and on every hour (which is far more often than it freezes) and it'll keep auto restarting and the problem goes away.
Thanks, that is a nice workaround. I am hoping to find a programming fix for the root cause though, assuming it's possible and the cause isn't the firmware or a hardware problem.
My hunch is a memory leak. The portal has very little memory and the adafruit libraries have become notoriously heavy. I tried to load their version of the datetime library and it immediately crashed the board in a similar fashion.
@dylanjtastet I thought so too. But I am monitoring memory with the gc module and it doesn't show a loss in memory. Would the leak be detectable any other way?
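For anyone who wants to repeat the memory check, here is a minimal sketch of logging free heap around each refresh. It assumes CircuitPython's `gc.mem_free()`; the helper names `free_memory` and `log_memory` are hypothetical and not part of this repo. On desktop Python, where `gc.mem_free()` does not exist, it reports `None`.

```python
import gc
import time


def free_memory():
    """Return free heap bytes on CircuitPython, or None where gc.mem_free is missing."""
    gc.collect()  # collect first so the reading reflects live objects only
    mem_free = getattr(gc, "mem_free", None)
    return mem_free() if mem_free else None


def log_memory(label, log=print):
    """Log one timestamped free-memory reading and return it."""
    free = free_memory()
    log("[{:.0f}s] {}: free={}".format(time.monotonic(), label, free))
    return free


# Example: call once before and once after each fetch, then compare over hours.
log_memory("before fetch")
log_memory("after fetch")
```

A steady reading only rules out a shrinking heap on the main chip. It says nothing about heap fragmentation or about the ESP32 coprocessor, which has its own memory.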
I'm using a smart plug, but I really think the only permanent solution is to set this up on different hardware. There are a number of similar boards out there that do the same thing, using Raspberry Pi.
I tried various cords and plugs just for kicks, and mine does the same thing--just craps out usually within an hour, but sometimes it lasts longer. I had it connected to my computer to see the console, and this is what I get:
Retrieving data...Received response from WMATA api...
Reply received.
Successfully updated.
Refreshing train information...
Retrieving data...Traceback (most recent call last):
File "code.py", line 25, in <module>
File "train_board.py", line 41, in refresh
File "code.py", line 22, in <lambda>
File "code.py", line 15, in refresh_trains
File "metro_api.py", line 17, in fetch_train_predictions
File "metro_api.py", line 23, in _fetch_train_predictions
File "adafruit_portalbase/network.py", line 518, in fetch
File "adafruit_requests.py", line 823, in get
File "adafruit_requests.py", line 679, in request
File "adafruit_esp32spi/adafruit_esp32spi_socket.py", line 138, in recv
File "adafruit_esp32spi/adafruit_esp32spi_socket.py", line 210, in available
File "adafruit_esp32spi/adafruit_esp32spi.py", line 776, in socket_available
File "adafruit_esp32spi/adafruit_esp32spi.py", line 332, in _send_command_get_response
File "adafruit_esp32spi/adafruit_esp32spi.py", line 299, in _wait_response_cmd
File "adafruit_esp32spi/adafruit_esp32spi.py", line 278, in _wait_spi_char
TimeoutError: Timed out waiting for SPI char
Code done running.
So I think it is an issue in the library. Unfortunately, that's beyond my knowledge, but perhaps someone who has a deeper understanding can figure out a solution (esp. one that doesn't involve actually updating the library). I am thinking it should be possible to catch the error and then have the thing restart itself, if nothing else? But I wasn't able to do that. It doesn't seem to be able to recover from the error gracefully. If anyone can figure it out, let me know!
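A rough sketch of the catch-and-restart idea, in case it helps someone experiment. Everything here is an assumption rather than code from this repo: `run_with_recovery` is a hypothetical wrapper, `refresh` stands for whatever function fetches and draws the trains, and `hard_reset` is any callable that restarts things. On the board, `hard_reset` could be CircuitPython's `microcontroller.reset`. This only helps when the failure surfaces as an exception, as in the traceback above; it cannot rescue a board that hangs without raising.

```python
import time


def run_with_recovery(refresh, hard_reset, max_failures=3, delay=5,
                      pause=time.sleep, max_cycles=None):
    """Call refresh() in a loop; after max_failures errors in a row, call hard_reset().

    max_cycles exists only so the loop can be exercised off-device;
    leave it as None on the board to loop forever.
    """
    failures = 0
    cycles = 0
    while max_cycles is None or cycles < max_cycles:
        cycles += 1
        try:
            refresh()
            failures = 0  # any success clears the failure streak
        except Exception as err:  # e.g. "Timed out waiting for SPI char"
            failures += 1
            print("refresh failed ({}/{}): {!r}".format(failures, max_failures, err))
            if failures >= max_failures:
                hard_reset()  # on the board this would not return
                failures = 0
        pause(delay)
    return cycles


# On the board (assumed wiring, not taken from this repo):
#   import microcontroller
#   run_with_recovery(refresh_trains, microcontroller.reset)
```

Counting consecutive failures, rather than resetting on the first error, avoids a reboot loop when the network is only briefly unavailable.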
I'm going to try using @erikrrodriguez's fork. Digging through the network library, it looks like there's a bit of redundancy in using the adafruit_requests library for HTTP, as adafruit_esp32spi_wifimanager.ESPSPI_WiFiManager already provides this API. It also uses the requests library, which is a global singleton, so the network library is creating redundant sockets and re-initializing the requests library.
This is where I stopped digging, but my hunch now is that all of this is crashing the wifi coprocessor. Erik's code will also reset the coprocessor if the board is failing requests which should keep things turning if all else fails.
Thanks @dylanjtastet I was going to ping and also recommend @ScottKekoaShay try my fork.
I will admit that my board sometimes also freezes, and I haven't been able to discover why. It seems like it hangs after the request is made and ignores the timeout in waiting for a response. So I think it is ultimately still an issue in the adafruit_requests library.
But my board has been running the past 4 days without me needing to manually reset it 🎉
Thanks, I will give it a try as well!
I forgot to update this when I resolved my issue. For me, on the Matrix Portal M4, the root cause was the CircuitPython 7.3 firmware, which had a buggy DMA feature that caused SPI failures of all kinds. Dan Halbert, one of the core developers of CircuitPython, helped track it down and fixed it. The fix should be in later versions, so if your board has old firmware, try flashing a new version.
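To see which firmware a board is running, a small check like the following can be pasted into code.py or the REPL. It is only a sketch: `is_outdated` is a hypothetical helper, and the `(8, 0)` threshold is an assumption based on people in this thread reporting success on version 8, not an official cutoff for the DMA fix.

```python
import sys

# Assumed threshold: commenters here report the freezes stopping on 8.x.
MINIMUM_VERSION = (8, 0)


def is_outdated(name, version, minimum=MINIMUM_VERSION):
    """Return True if this is CircuitPython older than `minimum` (major, minor)."""
    if name != "circuitpython":
        return False  # not a CircuitPython build, so the check does not apply
    return tuple(version[:2]) < tuple(minimum)


print(sys.implementation.name, tuple(sys.implementation.version[:3]))
if is_outdated(sys.implementation.name, sys.implementation.version):
    print("Firmware looks old; consider flashing a newer CircuitPython release.")
```

The same information also appears in the boot_out.txt file on the CIRCUITPY drive.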
Makes sense, I'm currently using CircuitPython 8 as of the other week. Thanks for getting in touch with Dan, I'm glad he could push a fix!
Edit: Something else I did was update the ESP firmware separately from CircuitPython using this guide: https://learn.adafruit.com/upgrading-esp32-firmware/upgrade-all-in-one-esp32-airlift-firmware
Yep makes sense, I was getting the same issue after switching to @erikrrodriguez's fork. Will try updating firmware now.
Upgrading firmware to 8 and the latest files from @erikrrodriguez's fork solved it for me -- many thanks!
Updating the ESP firmware did the trick for me--mine's been running several days without crapping out. Thanks for the tips @erikrrodriguez !
First, thank you for this cool project. It works great except I find it freezes every few hours (both the sign and the LED on the Matrix Portal on the back) and I have to reset it. Has this problem occurred for others and is there a fix?