Open peterhinch opened 7 years ago
Looking into this ...
Why not check for EINPROGRESS when attempting to write? Code below works for me
import network
import socket
from time import sleep
import uerrno
SERVER = '192.168.0.9' # can use 'iot.eclipse.org'
SERVER = 'iot.eclipse.org'
s = network.WLAN(network.STA_IF)
s.active(True)
def connect():
s.disconnect()
sleep(1)
s.connect('SSID', 'PASSWORD') # Redacted EDIT this
print('Awaiting conection')
while not s.isconnected():
sleep(1)
print('Got conection, pausing') # Just to make sure
sleep(3)
print('conection done')
def foo(n):
print('Attempt:', n)
connect()
addr = socket.getaddrinfo(SERVER, 1883)[0][-1] # MQTT
s = socket.socket()
s.setblocking(False)
try:
s.connect(addr)
except OSError as e:
if e.args[0] not in [uerrno.EINPROGRESS, uerrno.ETIMEDOUT]:
print('Error connecting', e)
return
attempts = 3
while attempts:
attempts -= 1
try:
s.write(b"\xc0\0") # MQTT ping
print('Write succesful.')
break
except OSError as e:
if e.args[0] not in [uerrno.EINPROGRESS, uerrno.ETIMEDOUT]:
print('**** ERROR writing ****', e)
sleep(.5)
s.close()
foo(1)
Here it works only by ignoring the 119 error and trying again. I've never seen the ESP8266 raise 119 and I'm not sure of its significance - perhaps you could advise. Is it always legitimate to trap it and re-send the entire buffer (which in the general case might be long)?
Running your code with only the WiFi credentials changed:
>>> import rats1
I (179751) wifi: wifi firmware version: f092575
I (179751) wifi: config NVS flash: enabled
I (179751) wifi: config nano formating: disabled
I (179751) system_api: Base MAC address is not set, read default base MAC address from BLK0 of EFUSE
I (179761) system_api: Base MAC address is not set, read default base MAC address from BLK0 of EFUSE
I (179781) wifi: Init dynamic tx buffer num: 32
I (179781) wifi: Init dynamic rx buffer num: 64
I (179781) wifi: wifi driver task: 3ffd926c, prio:23, stack:4096
I (179781) wifi: Init static rx buffer num: 10
I (179791) wifi: Init dynamic rx buffer num: 0
I (179791) wifi: Init rx ampdu len mblock:7
I (179801) wifi: Init lldesc rx ampdu entry mblock:4
I (179801) wifi: wifi power manager task: 0x3ffde558 prio: 21 stack: 2560
I (179811) wifi: wifi timer task: 3ffdf5d4, prio:22, stack:3584
I (179831) phy: phy_version: 355.1, 59464c5, Jun 14 2017, 20:25:06, 0, 0
I (179831) wifi: mode : null
I (179841) modsocket: Initializing
I (179851) wifi: mode : sta (24:0a:c4:05:e5:58)
I (179851) wifi: STA_START
Attempt: 1
Awaiting conection
I (181211) wifi: n:3 0, o:1 0, ap:255 255, sta:3 0, prof:1
I (181771) wifi: state: init -> auth (b0)
I (181781) wifi: state: auth -> assoc (0)
I (181781) wifi: state: assoc -> run (10)
I (182681) wifi: connected with misspiggy, channel 3
I (182681) wifi: event 4
I (183261) event: ip: 192.168.0.33, mask: 255.255.255.0, gw: 192.168.0.1
I (183261) wifi: GOT_IP
Got conection, pausing
conection done
**** ERROR writing **** 119
Write succesful.
>>> I (191781) wifi: pm start, type:0
>>>
The ESP8266 and ESP32 use two different versions of LWIP (ESP8266 version is from 2012, ESP32 from 2016).
The ESP8266 version should throw an EINPROGRESS
if it doesn't connect immediately when timeout is set to 0. The fact that it doesn't suggests the underlying LWIP implementation is different. The ESP32 version doesn't seem to implement timeouts for connect
at all, unless one of the underlying LWIP timeout options also applies to connect
. But it fails on write
-- it's almost as if the write triggers something in LWIP to complete the connection process.
Which I'd understand if it threw EINPROGRESS
or ETIMEDOUT
on the first call to write()
. The remedy would be to delay and retry. But error 119 is ENAVAIL
"No XENIX semaphores available" which seems unexpected.
Throwing an exception on connect
wouldn't help the user unless there were a way subsequently to poll the connection status before using the socket. In the absence of such a method the only way seems to be to use the socket, trap the exception, and retry until at least one character is written; I guess this conforms to modern Python coding practices.
I performed a quick test where I trapped the 119, including tests of writing a longer string. It seems that it only fails on the first pass. The second pass correctly writes the string.
So perhaps this boils down to the LWIP raising a "wrong" exception number in the event of a pending connected
status. If you think we're on the right track I'll proceed on that basis.
I turned some debugging on ... lwip_send is getting error -5 back from netconn_write_partly and then translating that to EINPROGRESS. The trouble is that EINPROGRESS isn't consistently defined:
esp-idf/components$ egrep -r '#define\s+EINPROGRESS\s'
lwip/include/lwip/lwip/arch.h:#define EINPROGRESS 115 /* Operation now in progress */
newlib/include/sys/errno.h:#define EINPROGRESS 119 /* Connection already in progress */
LWIP seems to be picking up on 119 definition, whereas lwip would normally define it as 115 and that's what py/mperrno.h things too ...
LWIP seems to be picking up on 119 definition, whereas lwip would normally define it as 115 and that's what py/mperrno.h things too ...
Nice digging! There's already code in esp32/modsocket.c to convert EINPROGRESS to MP_EINPROGRESS, so the fix should be to make sure that the EINPROGRESS in that file is the same one that lwip uses. It's strange that the newlib errno.h is being used though.... how is that happening, is it on purpose?
I should add that I've also seen 118 in my testing. My code treats EINPROGRESS, ETIMEDOUT, 118 and 119 as indicating that the connection is not yet established. So far it seems reliable.
Yeah, I didn't quite get to the bottom of it but there's a LWIP_PROVIDE_ERRNO
flag there which isn't being set, instead it is using newlib's one. Not sure why we define our own MP_E*
as well ...
Not sure why we define our own MP_E* as well ...
Because some toolchains use non-standard errno codes and in the past was confusing that some ports would report certain error numbers, while others would report other numbers for the same thing. It was decided to standardise around one set of errno codes so that it was easier to work out what the code corresponded to (for those codes that don't have a human-readable string associated with them in the uerrno module).
In the course of updating this asynchronous MQTT driver I want to make it compatible with ESP32. The code uses a nonblocking socket. I get OSError 119 when writing to the socket. The following test runs on ESP8266 but fails on ESP32.
Note that I issue disconnect() before connecting. This is because my driver reconnects after a network outage. It also ensures consistent behaviour if an application is started with a network connection already active.