Open mikerenfro opened 5 years ago
Had one hang inside adafruit_esp32spi/adafruit_esp32spi.py", line 184, in _send_command
that was finally interrupted by Ctrl-C.
File "code.py", line 90, in <module>
File "code.py", line 86, in <module>
File "adafruit_pyportal.py", line 740, in fetch
File "adafruit_pyportal.py", line 737, in fetch
File "adafruit_pyportal.py", line 733, in fetch
File "adafruit_pyportal.py", line 730, in fetch
File "adafruit_pyportal.py", line 582, in wget
File "adafruit_esp32spi/adafruit_esp32spi_requests.py", line 235, in get
File "adafruit_esp32spi/adafruit_esp32spi_requests.py", line 221, in request
File "adafruit_esp32spi/adafruit_esp32spi_requests.py", line 194, in request
File "adafruit_esp32spi/adafruit_esp32spi_socket.py", line 90, in readline
File "adafruit_esp32spi/adafruit_esp32spi.py", line 545, in socket_available
File "adafruit_esp32spi/adafruit_esp32spi.py", line 298, in _send_command_get_response
File "adafruit_esp32spi/adafruit_esp32spi.py", line 184, in _send_command
KeyboardInterrupt:
PyPortal had been up for over 18612 seconds (a bit over 5 hours) at the time of the hang, with 10 second refreshes on the status text, and 120 second refreshes on the image. Had already added garbage collection calls after each status refresh, which had no effect on this hang.
Hang occurred while communicating with Adafruit.io image conversion service. Did not see evidence of any response in the debugging logs. Normally, I get {'access-control-max-age': '1728000', ... 'x-permitted-cross-domain-policies': 'none'}
quickly after making the request. None of that showed when the hang occurred.
I assume that there would be a socket timeout due to line 160 of adafruit_esp32spi_requests.py
, but that was never triggered.
Had a second issue where the adafruit.io service correctly calculates the BMP size, but the fetch()
call doesn't receive all the data:
Fetching stream from https://io.adafruit.com/api/v2/...
{'access-control-max-age': '1728000', ... 'content-length': '72850', ...}
Saving data to /sd/cache.bmp
Read 512 bytes, 72338 remaining
Read 1024 bytes, 71826 remaining
...
Read 40960 bytes, 31890 remaining
Read 41472 bytes, 31378 remaining
Read 41559 bytes, 31291 remaining
Created file of 41559 bytes in 5.9 seconds
Set background to /sd/cache.bmp
This one occurred with after roughly 7000 seconds of uptime. The previous fetch()
call was 128 seconds earlier, and worked fine.
ok so the second thing is fixed (well, at least it gives you an error)
for the first thing, i don't see anything in _send_command
that could hang :/ do you mind tryign with debug=True
or debug=2
passed into pyportal
obj creation?
Further requests that failed:
Further requests that succeeded:
Haven't seen a complete hangup of ESP32 today.
Around 2019-04-13 19:19:07.745 -0500, did see no response from Adafruit.io, but got an 'ESP32 not responding' with a timeout instead of an infinite hang.
The exception raised by PyPortal was handled correctly, and after a total of 7 'ESP32 not responding' messages (both for Adafruit.io and my own server), things went back to normal.
Around 2019-04-13 21:33:48.848 -0500, got no response from Adafruit.io, no timeouts, and an infinite hang (up until 2019-04-14 08:11 -0500):
Traceback (most recent call last):
File "code.py", line 103, in <module>
File "code.py", line 97, in <module>
File "/lib/adafruit_pyportal.py", line 745, in fetch
File "/lib/adafruit_pyportal.py", line 742, in fetch
File "/lib/adafruit_pyportal.py", line 738, in fetch
File "/lib/adafruit_pyportal.py", line 732, in fetch
File "/lib/adafruit_pyportal.py", line 582, in wget
File "adafruit_esp32spi/adafruit_esp32spi_requests.py", line 235, in get
File "adafruit_esp32spi/adafruit_esp32spi_requests.py", line 221, in request
File "adafruit_esp32spi/adafruit_esp32spi_requests.py", line 194, in request
File "adafruit_esp32spi/adafruit_esp32spi_socket.py", line 90, in readline
File "adafruit_esp32spi/adafruit_esp32spi.py", line 545, in socket_available
File "adafruit_esp32spi/adafruit_esp32spi.py", line 299, in _send_command_get_response
File "adafruit_esp32spi/adafruit_esp32spi.py", line 288, in _wait_response_cmd
File "adafruit_esp32spi/adafruit_esp32spi.py", line 265, in _wait_response_cmd
ReloadException:
soft reboot
Looks like the Adafruit.io service did make a request of my web server, and got a 200 response and the entire image, so whatever failed had to have been after the image download, and before the converted image would have been returned.
52.91.214.175 - - [13/Apr/2019:21:33:56 -0500] "GET /path/to/load.png HTTP/1.1" 200 10661 "-" "Ruby"
As of 2019-04-14 09:49:22.174 -0500, moved my original image from a work web server to an offsite VPS. Will leave the stress test running the rest of the weekend to see if hangs are limited to the work web server or any security hardware at the campus perimeter.
Still having intermittent hangs that appear to be independent of the backend server, and regardless of if adafruit.io is involved.
With PyPortal debug=True
, working:
Fetching stream from https://host/path/file.bmp
{'server': 'nginx/1.10.3 (Ubuntu)', 'content-type': 'image/x-ms-bmp', 'date': 'Tue, 30 Apr 2019 19:05:34 GMT', 'last-modified': 'Tue, 30 Apr 2019 19:00:02 GMT', 'connection': 'close', 'content-length': '108962', 'etag': '"5cc89b32-1a9a2"','accept-ranges': 'bytes'}
Saving data to /sd/cache.bmp
With PyPortal debug=True
, failing:
Fetching stream from https://host/path/file.bmp
(nothing)
So response headers never get printed. Implies failure in adafruit_pyportal.py, line 587:
r = requests.get(url, stream=True)
requests.get()
calls request("GET", url, **kw)
.
In adafruit_esp32_requests.py
, request()
function, added four print statements:
sock.settimeout(1)
line = sock.readline()
while True:
while
loopWaiting to see which of these print on a hang.
dya know what is the lowest level line it hangs at?
Not yet. Haven't seen it hang after adding the debugging lines. I end up leaving it plugged in on one or another Mac at home or at work and leaving screen -L /dev/tty.usbmodem*
running for extended periods instead of another serial monitor.
11.3 hours of uninterrupted use at home, going to try again at work.
4.5 hours uninterrupted at work (missed carrying it in yesterday morning), >14 hours uninterrupted at home. Either the bug was external to the PyPortal, unexpectedly fixed in a dependency, or doesn't show up on the original source .py
files instead of the .mpy
versions.
Finally got a hang about an hour ago. Got the Data written to socket
message, but not the Reading lines from socket
one. Relevant lines:
print("Data written to socket")
line = sock.readline()
#print(line)
line = line.split(None, 2)
status = int(line[1])
reason = ""
if len(line) > 2:
reason = line[2].rstrip()
print("Reading lines from socket")
After adding additional debugging print statements, finally got another hang last night:
Failing logs:
Fetching stream from https://host/path/file.bmp
Timeout set
Data written to socket
Socket readline
From adafruit_esp32spi_requests.py
:
print("Data written to socket")
line = sock.readline()
#print(line)
line = line.split(None, 2)
status = int(line[1])
reason = ""
if len(line) > 2:
reason = line[2].rstrip()
print("Reading lines from socket")
From adafruit_esp32spi_socket.py
:
def readline(self):
"""Attempt to return as many bytes as we can up to but not including '\r\n'"""
print("Socket readline")
while b'\r\n' not in self._buffer:
# there's no line already in there, read some more
avail = min(_the_interface.socket_available(self._socknum), MAX_PACKET)
if avail:
self._buffer += _the_interface.socket_read(self._socknum, avail)
firstline, self._buffer = self._buffer.split(b'\r\n', 1)
gc.collect()
return firstline
Only lines possibly causing an infinite loop/hang are the while loop in readline
.
b'\r\n'
in the buffer?socket_read
, wherever that's defined?adafruit_esp32_socket.py:read()
.No further hangs yet, but I've replaced my ad hoc debugging statements with passing debug=2
to the ESP_SPIcontrol
creation in adafruit_pyportal.py
. Hopefully that'll make it easier for others.
Got my hang a bit ago. With all the ESP debugging enabled, a working request:
Fetching stream from https://REDACTED/images/load.bmp
*** Get host by name
Sending param #0 is 16 bytes long
Parameter #0 length is 1
Read 1: [bytearray(b'\x01')]
Parameter #0 length is 4
Read 4: [bytearray(b'DB\xcdx')]
*** Get socket
Parameter #0 length is 1
Read 1: [bytearray(b'\x00')]
Allocated socket #0
Timeout set
*** Socket connect mode 2
*** Open socket
Sending param #0 is 16 bytes long
Sending param #1 is 4 bytes long
Sending param #2 is 2 bytes long
Sending param #3 is 1 bytes long
Sending param #4 is 1 bytes long
Parameter #0 length is 1
Read 1: [bytearray(b'\x01')]
Sending param #0 is 1 bytes long
Parameter #0 length is 1
Read 1: [bytearray(b'\x04')]
Writing: b'GET /images/load.bmp HTTP/1.0\r\n'
Sending param #0 is 1 bytes long
Sending param #1 is 31 bytes long
Parameter #0 length is 2
Read 2: [bytearray(b'\x1f\x00')]
Sending param #0 is 1 bytes long
Parameter #0 length is 1
Read 1: [bytearray(b'\x01')]
Writing: b'Host: REDACTED\r\n'
Sending param #0 is 1 bytes long
Sending param #1 is 24 bytes long
Parameter #0 length is 2
Read 2: [bytearray(b'\x18\x00')]
Sending param #0 is 1 bytes long
Parameter #0 length is 1
Read 1: [bytearray(b'\x01')]
Writing: b'User-Agent: Adafruit CircuitPython\r\n'
Sending param #0 is 1 bytes long
Sending param #1 is 36 bytes long
Parameter #0 length is 2
Read 2: [bytearray(b'$\x00')]
Sending param #0 is 1 bytes long
Parameter #0 length is 1
Read 1: [bytearray(b'\x01')]
Writing: b'\r\n'
Sending param #0 is 1 bytes long
Sending param #1 is 2 bytes long
Parameter #0 length is 2
Read 2: [bytearray(b'\x02\x00')]
Sending param #0 is 1 bytes long
Parameter #0 length is 1
Read 1: [bytearray(b'\x01')]
Sending param #0 is 1 bytes long
Parameter #0 length is 2
Read 2: [bytearray(b'\x00\x00')]
ESPSocket: 0 bytes available
Sending param #0 is 1 bytes long
Parameter #0 length is 2
...
ESPSocket: 0 bytes available
Sending param #0 is 1 bytes long
Parameter #0 length is 2
Read 2: [bytearray(b'\x00@')]
ESPSocket: 16384 bytes available
Sending param #0 is 1 bytes long
Parameter #0 length is 1
Read 1: [bytearray(b'\x04')]
Reading 4000 bytes from ESP socket with status 4
Sending param #0 is 1 bytes long
Sending param #1 is 2 bytes long
Parameter #0 length is 4000
Read 4000: [bytearray(b'HTTP/1.1 200 OK\r\nServer: nginx/1.10.3 (Ubuntu)\r\nDate: Mon, 06 May 2019 13:32:28 GMT\r\nContent-Type: image/x-ms-bmp\r\nContent-Length: 108962\r\nLast-Modified: Mon, 06 May 2019 13:00:01 GMT\r\nConnection: close\r\nETag: "5cd02fd1-1a9a2"\r\nAccept-Ranges: bytes...\xf3\xf3')]
...
And a failing request:
Fetching stream from https://REDACTED/images/load.bmp
*** Get host by name
Sending param #0 is 16 bytes long
Parameter #0 length is 1
Read 1: [bytearray(b'\x01')]
Parameter #0 length is 4
Read 4: [bytearray(b'DB\xcdx')]
*** Get socket
Parameter #0 length is 1
Read 1: [bytearray(b'\x00')]
Allocated socket #0
Timeout set
*** Socket connect mode 2
*** Open socket
Sending param #0 is 16 bytes long
Sending param #1 is 4 bytes long
Sending param #2 is 2 bytes long
Sending param #3 is 1 bytes long
Sending param #4 is 1 bytes long
Parameter #0 length is 1
Read 1: [bytearray(b'\x01')]
Sending param #0 is 1 bytes long
Parameter #0 length is 1
Read 1: [bytearray(b'\x04')]
Writing: b'GET /images/load.bmp HTTP/1.0\r\n'
Sending param #0 is 1 bytes long
Sending param #1 is 31 bytes long
Parameter #0 length is 2
Read 2: [bytearray(b'\x1f\x00')]
Sending param #0 is 1 bytes long
Parameter #0 length is 1
Read 1: [bytearray(b'\x01')]
Writing: b'Host: REDACTED\r\n'
Sending param #0 is 1 bytes long
Sending param #1 is 24 bytes long
Parameter #0 length is 2
Read 2: [bytearray(b'\x18\x00')]
Sending param #0 is 1 bytes long
Parameter #0 length is 1
Read 1: [bytearray(b'\x01')]
Writing: b'User-Agent: Adafruit CircuitPython\r\n'
Sending param #0 is 1 bytes long
Sending param #1 is 36 bytes long
Parameter #0 length is 2
Read 2: [bytearray(b'$\x00')]
Sending param #0 is 1 bytes long
Parameter #0 length is 1
Read 1: [bytearray(b'\x01')]
Writing: b'\r\n'
Sending param #0 is 1 bytes long
Sending param #1 is 2 bytes long
Parameter #0 length is 2
Read 2: [bytearray(b'\x02\x00')]
Sending param #0 is 1 bytes long
Parameter #0 length is 1
Read 1: [bytearray(b'\x01')]
Sending param #0 is 1 bytes long
Parameter #0 length is 2
Read 2: [bytearray(b'\x00\x00')]
ESPSocket: 0 bytes available
Sending param #0 is 1 bytes long
Parameter #0 length is 2
where the Read 2: [bytearray(b'\x00\x00')]
through Parameter #0 length is 2
keep repeating.
Updated to ESP32 firmware 1.3.0 to see if that makes any difference.
yea looks like it just 'hangs' - we should detect this and do a hard reset
I'm assuming this is the same root cause as in Fixed infinite loop when socket readline fails.
At times, the PyPortal will crash with part of a background displayed. Currently stress-testing the existing code with more frequent
wget
andfetch
calls to see what exceptions are thrown and how best to handle them (including soft reboots if things go completely sideways).