raburton / esp8266

Various bits of code for ESP8266
http://richard.burtons.org/
183 stars 47 forks source link

OTA failure: Broken pipe #32

Closed rith87 closed 8 years ago

rith87 commented 8 years ago

First, thanks for working on the bootloader! I've definitely thought about having >2 ROMs but I didn't feel comfortable embarking on such a project on my own.

Unfortunately, I've run into some troubles. Here's what I'm seeing:

  1. ESP connects to the wifi network
  2. ESP sends the GET request for the new firmware
  3. Webserver (Flask) receives the GET request
  4. Webserver responds with 200
  5. Webserver throws a "Broken pipe" exception

Here's the call stack:

Traceback (most recent call last):
  File "/usr/lib/python2.7/SocketServer.py", line 593, in process_request_thread
    self.finish_request(request, client_address)
  File "/usr/lib/python2.7/SocketServer.py", line 334, in finish_request
    self.RequestHandlerClass(request, client_address, self)
  File "/usr/lib/python2.7/SocketServer.py", line 651, in __init__
    self.finish()
  File "/usr/lib/python2.7/SocketServer.py", line 710, in finish
    self.wfile.close()
  File "/usr/lib/python2.7/socket.py", line 279, in close
    self.flush()
  File "/usr/lib/python2.7/socket.py", line 303, in flush
    self._sock.sendall(view[write_offset:write_offset+buffer_size])
error: [Errno 32] Broken pipe
----------------------------------------

Is this the "python server" that sends headers in multiple chunks?

rith87 commented 8 years ago

Hmm.. Looking at Wireshark, it seems like the ESP gave the server a bad packet?

139 76.955337000    192.168.30.233  192.168.30.134  IPA 209 unknown 0x54 [Malformed Packet]

And then I think it tried to reconnect

145 76.996958000    192.168.30.233  192.168.30.134  TCP 54  38364 > commplex-main [FIN, ACK] Seq=156 Ack=18 Win=5823 Len=0
146 76.996961000    192.168.30.233  192.168.30.134  TCP 54  38364 > commplex-main [RST, ACK] Seq=157 Ack=1478 Win=5840 Len=0

Anyone seen this behavior before?

rith87 commented 8 years ago

Hmm.. I still see the malformed packets when I call the system_start_upgrade() function but system_start_upgrade() will still trigger the firmware download. Here are my observations (sent from ESP to server):

  1. HEAD request, malformed
  2. GET request, malformed
  3. FIN ACK
  4. SYN
  5. GET request, malformed
  6. Firmware download begins

Is it because the Content-Length is missing?

raburton commented 8 years ago

Hmm.. Looking at Wireshark, it seems like the ESP gave the server a bad packet? 139 76.955337000 192.168.30.233 192.168.30.134 IPA 209 unknown 0x54 [Malformed Packet]

Looks like wireshark is misinterpreting the protocol (as IPA) of this packet for some reason (possibly the port your server is running on?), so I don't think I'd worry too much about that.

Getting the debug messages out of the OTA code would help a lot, but I wouldn't be surprised if the problem was the python web server, I'd certainly try it with a proper server like apache or iis and see if the problem goes away.

nickandrew commented 8 years ago

@rith87 I think you should look further into that bad packet. Either your ESP is sending a bad packet or wireshark is broken, and if you can't trust your analysis tools what can you trust?

I'd trust wireshark before the ESP any day. But running tcpdump should resolve this in favor of one or the other.

Finally, the "broken pipe" error is because the server tried to write to a socket that has been closed. That might indicate bad programming on the ESP, but in any case a web server needs to ignore this error, because it happens in real life - clients or networks break while a response is being sent.

rith87 commented 8 years ago

"Getting the debug messages out of the OTA code would help a lot"

I don't see a flag that enables more logging but this is all I got:

ota
Updating...
Firmware update failed!

@raburton: I'm definitely in your camp about the python web server. I'll set up an apache server tomorrow but in the meantime, any chance I can hit up one of your servers?

nickandrew commented 8 years ago

@rith87 By the way, I don't think a GET request uses Content-Length because there is no 'content' or 'body' to such a request.

rith87 commented 8 years ago

@raburton: You're right, the malformed packet was because I was running my server on port 5000. @nickandrew: You're right about the GET request. Thanks for the tip!

rith87 commented 8 years ago

@raburton: I just tested with apache and the OTA upgrade failed as well with apache returning a 400. Here's the output from wireshark:

6   4.721856000 192.168.30.233  192.168.30.134  HTTP    209 GET firmware/latest/rom1.bin HTTP/1.1 
7   4.721899000 192.168.30.134  192.168.30.233  TCP 54  http > 6187 [ACK] Seq=1 Ack=156 Win=30016 Len=0
8   4.722165000 192.168.30.134  192.168.30.233  HTTP    540 HTTP/1.1 400 Bad Request  (text/html)
rith87 commented 8 years ago

The bug is so embarrassing :|.

GET /firmware/latest/rom1.bin HTTP/1.1

nickandrew commented 8 years ago

Egads.

rith87 commented 8 years ago

FWIW, with the exact same embedded code, Flask still fails with a broken pipe. Also, Flask returns a 200 with firmware/latest/rom1.bin (without the slash in the front). That's still a mystery but perhaps not for this repo.

@nickandrew, @raburton: Thanks for the help!