pycom / pycom-micropython-sigfox

A fork of MicroPython with the ESP32 port customized to run on Pycom's IoT multi-network modules.
MIT License

Socket is stuck until reboot #612

Open AndreaPici opened 1 year ago

AndreaPici commented 1 year ago

Hi there,

I have a problem with sockets. I'm using the following code to connect to and disconnect from a server over a TCP socket:

import usocket

class EXAMPLE:

    def __init__(self, slave_ip, slave_port, timeout):
        self.slave_ip = slave_ip
        self.slave_port = slave_port
        self.timeout = timeout
        self._sock = None

    def connect(self):
        created = False
        try:
            self._sock = usocket.socket(usocket.AF_INET, usocket.SOCK_STREAM, usocket.IPPROTO_TCP)
            self._sock.setsockopt(usocket.SOL_SOCKET, usocket.SO_REUSEADDR, 1)
            self._sock.setblocking(True)
            self._sock.settimeout(self.timeout)
            # Resolve the server address and connect (this is where it hangs)
            self._sock.connect(usocket.getaddrinfo(self.slave_ip, self.slave_port)[0][-1])
            created = True
            print('Socket created')
            return created

        except:
            print('Cannot connect socket')
            self.close()
            return created

    def close(self):
        try:
            if self._sock:
                self._sock.close()
                self._sock = None
            return True
        except:
            raise Exception('Cannot close socket')

The problem is that it sometimes gets stuck and cannot connect to the server. I can see that settimeout() completes correctly, but the code then hangs in connect(). The timeout has no effect: the socket stays in that stuck state for a long time. As a workaround, I'm currently using an external timer that checks for a stuck socket. It starts when socket.connect() is called and is cancelled once the socket is created. If the socket is not created in time, machine.reset() is called.

After reset everything works correctly.

Could you please help me find out what's wrong? Thank you very much.

ddtdanilo commented 1 year ago

Hi @AndreaPici,

I have looked at your code and have a few suggestions that might help.

First, you're calling usocket.getaddrinfo(slave_ip, slave_port)[0][-1] directly in your connect method. This means the address is resolved every time you connect, which involves a DNS lookup whenever slave_ip is a hostname. That lookup can take time and can fail. A possible solution is to store the result in a variable during the initialization of the EXAMPLE class, so you don't repeat this potentially expensive operation.
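As a rough illustration of the caching idea, here is a minimal sketch. The AddressCache class and its lookups counter are hypothetical names invented for this example, not part of usocket. It resolves lazily on first use rather than in __init__, so constructing the object does not fail if the network is not up yet, and it can be invalidated after a failed connect in case the server's IP has changed. The import fallback only exists so the sketch can also be tried on desktop Python.

```python
try:
    import usocket as socket  # MicroPython / Pycom firmware
except ImportError:
    import socket  # desktop Python, for trying the sketch off-device


class AddressCache:
    """Resolve (host, port) once and reuse the result (hypothetical helper)."""

    def __init__(self, host, port):
        self.host = host
        self.port = port
        self._addr = None
        self.lookups = 0  # number of real getaddrinfo() calls, for illustration

    def get(self):
        # Only hit getaddrinfo() when nothing is cached yet
        if self._addr is None:
            self._addr = socket.getaddrinfo(self.host, self.port)[0][-1]
            self.lookups += 1
        return self._addr

    def invalidate(self):
        # Drop the cached address, e.g. after a failed connect
        self._addr = None
```

In connect(), the call would then become something like self._sock.connect(self._cache.get()), with self._cache.invalidate() in the error path.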

Second, I noticed you're setting the socket to blocking mode using self._sock.setblocking(True). In blocking mode the program waits inside connect() until the call returns, and from your description the settimeout() value does not seem to interrupt it. Consider using non-blocking mode together with select, which lets you specify a timeout for the connect() operation and move on if it is unsuccessful.

A code modification could look like this:

import select
import usocket
import uerrno  # assumed to be available in this firmware

class EXAMPLE:
    def __init__(self, slave_ip, slave_port, timeout):
        self.slave_ip = slave_ip
        self.slave_port = slave_port
        self.timeout = timeout
        self._sock = None
        self.addr_info = usocket.getaddrinfo(self.slave_ip, self.slave_port)[0][-1]  # Do the lookup once

    def connect(self):
        created = False
        try:
            self._sock = usocket.socket(usocket.AF_INET, usocket.SOCK_STREAM, usocket.IPPROTO_TCP)
            self._sock.setsockopt(usocket.SOL_SOCKET, usocket.SO_REUSEADDR, 1)
            self._sock.setblocking(False)  # Set the socket to non-blocking
            try:
                self._sock.connect(self.addr_info)
            except OSError as e:
                # A non-blocking connect() normally reports EINPROGRESS; that is not a failure
                if e.args[0] != uerrno.EINPROGRESS:
                    raise
            # Wait until the socket becomes writable or the timeout (in seconds) expires
            read, write, error = select.select([], [self._sock], [], self.timeout)
            if write:
                created = True
                print('Socket created')
                return created
            else:
                raise Exception('Connect timeout')

        except Exception:
            print('Cannot connect socket')
            self.close()
            return created

    def close(self):
        try:
            if self._sock:
                self._sock.close()
                self._sock = None
            return True
        except Exception:
            raise Exception('Cannot close socket')
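The same non-blocking idea can also be written as a standalone helper, which is easier to try in isolation. This is only a sketch under a few assumptions: connect_with_timeout is a hypothetical name, the uerrno module is assumed to exist on the firmware (the fallback imports are for desktop Python), and the SO_ERROR check is skipped when the port does not provide getsockopt. Two details matter here: a non-blocking connect() usually raises OSError with EINPROGRESS, which must not be treated as a failure, and a socket can become writable after a failed connect as well, so the result is double-checked where possible.

```python
try:
    import usocket as socket  # MicroPython / Pycom firmware
    import uerrno as errno    # assumption: available on the firmware
except ImportError:
    import socket             # desktop Python fallback
    import errno
import select


def connect_with_timeout(addr, timeout):
    """Return a connected socket, or None on failure or timeout (hypothetical helper)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        sock.setblocking(False)
        try:
            sock.connect(addr)
        except OSError as e:
            # EINPROGRESS just means the connection attempt has started
            if e.args[0] != errno.EINPROGRESS:
                raise
        # Wait until the socket is writable or the timeout (seconds) expires
        _, writable, _ = select.select([], [sock], [], timeout)
        if not writable:
            raise OSError('connect timed out')
        # Writable does not guarantee success; check SO_ERROR if the port supports it
        if hasattr(sock, 'getsockopt') and hasattr(socket, 'SO_ERROR'):
            err = sock.getsockopt(socket.SOL_SOCKET, socket.SO_ERROR)
            if err:
                raise OSError(err)
        return sock
    except OSError:
        sock.close()
        return None
```

The returned socket is still in non-blocking mode, so the caller may want to call settimeout() or setblocking(True) on it before reading and writing.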

I hope this helps!

AndreaPici commented 1 year ago

Thank you @ddtdanilo for your help. I'll take your suggestions and let you know.

Maybe I'm wrong, but I thought the settimeout() function sets the timeout for socket operations, so that if connect() takes too long, that timeout is triggered.
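For reference, that understanding matches the standard socket API that MicroPython mirrors: settimeout() with a positive value bounds each blocking call, and a call that exceeds it raises OSError. Whether the firmware honors this for connect() is the open question in this issue. The small sketch below shows the expected behavior using recv() on a loopback connection, because a recv() timeout is easy to reproduce; recv_or_none is a hypothetical helper name, and the import fallback is only there so it can be run on desktop Python.

```python
try:
    import usocket as socket  # MicroPython / Pycom firmware
except ImportError:
    import socket  # desktop Python fallback


def recv_or_none(sock, nbytes, timeout):
    """Return received bytes, or None if nothing arrives within `timeout` seconds."""
    sock.settimeout(timeout)  # bounds the blocking recv() below
    try:
        return sock.recv(nbytes)
    except OSError:
        # Raised when the timeout expires (socket.timeout is an OSError on CPython)
        return None
```

Note that setblocking(True) followed by settimeout(t) is redundant: setblocking(True) is equivalent to settimeout(None), and the later settimeout(t) call replaces it.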

Have a nice time!