pulkin / micropython

MicroPython implementation on Ai-Thinker GPRS module A9 (RDA8955)
https://micropython.org
MIT License

Bugs and limitations of socket module #50

Open sebi5361 opened 4 years ago

sebi5361 commented 4 years ago

On the A9G board, with s being a Berkeley socket (import socket; s = socket.socket()), entering the following commands yields errors:

By comparison, behaviors of those commands on an ESP8266 with the latest firmware are:

pulkin commented 4 years ago

I fixed this:

  1. When s.connect is called twice, OSError(EISCONN) is raised
  2. When s.recv is called on an empty socket, OSError(ENOTCONN) is raised (on ESP8266 it's OSError(0))

Socket errors can be traced in cooltools. I was not able to reproduce the timeout problem: please provide a minimal working example.

sebi5361 commented 4 years ago

import cellular
import socket
cellular.gprs("tm", "", "")
s = socket.socket()
s.connect(('demo.traccar.org', 5055))
s.recv(50)

This hangs the board because the socket is blocking. Why can't Ctrl+C interrupt this command?

import cellular
import socket
cellular.gprs("tm", "", "")
s = socket.socket()
s.connect(('demo.traccar.org', 5055))
s.settimeout(2)
s.recv(50)

Same code as above, with s.settimeout(2) added. This freezes the board, but it should raise OSError: [Errno 110] ETIMEDOUT instead.

import cellular
import socket
cellular.gprs("tm", "", "")
s = socket.socket()
s.connect(('demo.traccar.org', 5055))
s.setblocking(False)
s.recv(50)

I have just noticed that replacing s.settimeout(2) by s.setblocking(False) makes the code work correctly as it throws the OSError: [Errno 11] EAGAIN error.
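For reference, the two outcomes discussed above (EAGAIN on a non-blocking socket, ETIMEDOUT after settimeout) can be treated as "no data yet" rather than as failures. The following is a minimal sketch, not the port's own API: the helper name recv_or_none is hypothetical, and it assumes the errno module is importable and that the error code is available as exc.args[0], which is typical for MicroPython but has not been checked on the A9 port.

```python
import errno
import socket


def recv_or_none(sock, nbytes):
    """Return received bytes, or None if no data arrived in time.

    Hypothetical helper: any other OSError is re-raised unchanged.
    """
    try:
        return sock.recv(nbytes)
    except OSError as exc:
        # Assumption: the error code is the first exception argument.
        code = exc.args[0] if exc.args else None
        if code in (errno.EAGAIN, errno.ETIMEDOUT):
            return None
        # CPython signals a timeout with socket.timeout, which may carry
        # no numeric code; the attribute may be absent on other ports.
        timeout_cls = getattr(socket, "timeout", None)
        if timeout_cls is not None and isinstance(exc, timeout_cls):
            return None
        raise
```

With a helper like this, a polling loop can call recv_or_none(s, 50) repeatedly and do other work while it returns None.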

pulkin commented 4 years ago

All of your examples (Ctrl+C, timeout and non-blocking) seem to work here. Maybe you can trace some events to figure out what's happening in your case.

sebi5361 commented 4 years ago

What do you mean by tracing some events? Adding some printf calls in the C code? I will update my firmware first to see whether those bugs persist, as I don't currently have the latest version installed.

sebi5361 commented 4 years ago

Indeed with the latest firmware I don't get any of those errors. Neat!

However, if I enter s.setblocking(False) before s.connect(...), I get an OSError: [Errno 115] EINPROGRESS. Maybe this is the normal behavior? But if I call s.connect(...) a second time, the board crashes.

import socket
import cellular
cellular.gprs("tm", "", "")
s = socket.socket()
s.setblocking(False)
s.connect(('demo.traccar.org', 5055)) # OSError: [Errno 115] EINPROGRESS
s.connect(('demo.traccar.org', 5055)) # Crash!

pulkin commented 4 years ago

I fixed one limiting case rather than the core problem, which requires patching. Some of the underlying lwip errors are not reported through errno: for some reason they are redirected to the platform error handler, which displays an error in coolwatcher and halts the module. I will try to fix it.

pulkin commented 4 years ago

The latest build produces this:

>>> s.setblocking(False)
>>> s.connect(('demo.traccar.org', 5055))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
OSError: [Errno 115] EINPROGRESS
>>> s.connect(('demo.traccar.org', 5055))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
OSError: 106

I do not know whether it is a fair result but, at least, it does not halt the module and is easy to implement.
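On desktop Python, EINPROGRESS from a non-blocking connect conventionally means the handshake has started, and the usual follow-up is to wait for the socket to become writable instead of calling connect again. The sketch below shows that pattern under stated assumptions: connect_nonblocking is a hypothetical name, and whether select and getsockopt(SOL_SOCKET, SO_ERROR) behave this way on the A9 port has not been verified here.

```python
import errno
import select
import socket


def connect_nonblocking(sock, addr, timeout_s=5):
    """Start a non-blocking connect and wait for it to finish.

    Returns True when connected, False if timeout_s elapsed first.
    """
    sock.setblocking(False)
    try:
        sock.connect(addr)
    except OSError as exc:
        code = exc.args[0] if exc.args else None
        # EINPROGRESS means "started, not finished"; anything else is real.
        if code != errno.EINPROGRESS:
            raise
    # The socket is reported writable once the connect attempt has ended.
    _, writable, _ = select.select([], [sock], [], timeout_s)
    if not writable:
        return False
    # A failed attempt is also reported writable, so read the pending error
    # (assumes getsockopt with SO_ERROR is available on the target).
    err = sock.getsockopt(socket.SOL_SOCKET, socket.SO_ERROR)
    if err:
        raise OSError(err)
    return True
```

After it returns True, the socket can be switched back to blocking mode with sock.setblocking(True) if the rest of the code expects that.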

marcosasilvalepe commented 3 years ago

Hi! First of all, thanks for this awesome project. I'm not familiar with C, so having the option of programming the module with MicroPython makes everything a lot easier.

I'm having this same problem with the module. Inside an infinite loop I do s.connect((url, port)) and after a few iterations the board stops working right after trying to connect to the host. Sometimes after the 10th iteration, sometimes after the 50th.

Watchdog reboots the module after 60 seconds and after that it starts transmitting again correctly until it encounters the same problem.

Doing s.settimeout(5) doesn't seem to work and using s.setblocking(False) before s.connect() always raises [Errno 115] EINPROGRESS and so data can't be sent using this option.

Watchdog gets me out of the pickle but I wanted to know if there is a solution to this without having to reboot the module all the time.

arlucio commented 3 years ago

Are you using ussl.wrap_socket? Maybe you are experiencing something like what I saw in #61. At that time I concluded that it was probably the fault of axTLS, and that to solve the problem we would have to port another SSL library, such as mbedtls.

If you are not using SSL, then it should work normally, as I have already tested it extensively. Are you closing the open sockets? You do have to explicitly call s.close() to close the sockets or it WILL eventually stop working.
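One way to make sure s.close() always runs, even when connect, send or recv raises, is a try/finally block. This is a minimal sketch; the function name exchange and its parameters are hypothetical and only illustrate the pattern.

```python
def exchange(sock, addr, payload, nbytes=256):
    """Connect, send payload, read one reply, and always close the socket."""
    try:
        sock.connect(addr)
        sock.send(payload)
        return sock.recv(nbytes)
    finally:
        # Runs on success and on any exception, so the socket is not leaked.
        sock.close()
```

A caller would create a fresh socket for each request, for example exchange(socket.socket(), (url, port), request_bytes), and handle OSError around the call.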

The last possibility is that you are losing the connection, and setting a handler to reconnect when that happens could solve it. GPRS can be considerably less reliable than other kinds of connection, so it is important to have a handler that reconnects when necessary.

marcosasilvalepe commented 3 years ago

I'm not using ssl since, from what I've read, it takes a bit longer to do the request.

The code I'm using inside the loop looks like this:

s = socket.socket()
s.connect((url, port))
s.send(bytes('POST /myscript.php HTTP/1.1\r\nHost: {}\r\nContent-Type: application/json\r\nConnection: Keep-Alive\r\nContent-Length: {}\r\n\r\n{}'.format(url, len(body), body), 'utf8'))
rsp = s.recv(256)
s.close()

In the code, body is a JSON string with all the variables, such as latitude, longitude, speed, etc.
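A side note on the request above: len(body) counts characters, while Content-Length must be the number of bytes sent, and the two differ as soon as the JSON contains non-ASCII text. The sketch below builds the request from the encoded payload; build_post is a hypothetical helper, and it uses Connection: close on the assumption that each request uses its own socket.

```python
def build_post(host, path, body):
    """Return the raw bytes of a minimal HTTP/1.1 POST request."""
    payload = body.encode("utf8")
    head = (
        "POST {} HTTP/1.1\r\n"
        "Host: {}\r\n"
        "Content-Type: application/json\r\n"
        "Connection: close\r\n"
        "Content-Length: {}\r\n"
        "\r\n"
    ).format(path, host, len(payload))  # byte length, not character count
    return head.encode("utf8") + payload
```

The result can be passed directly to s.send(), for example s.send(build_post(url, "/myscript.php", body)).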

arlucio commented 3 years ago

Did you try using a module like urequests?

marcosasilvalepe commented 3 years ago

I haven't since the POST request works fine. The module halts before getting to s.send(). I'll give it a try though. Thanks.

marcosasilvalepe commented 3 years ago

I tried with urequests but I encountered the same problem... the module freezes after 30 or so iterations.

I think it might not be a problem with the socket module. Maybe there is something wrong with my code. I thought that the module might be running out of RAM, so I imported gc and call gc.collect() at the end of each loop. The module has more than enough memory, but it still freezes after a while.

Another problem I just noticed is that when I run main.py with ampy the module automatically registers in the network but when uploading main.py to the board and rebooting, sometimes the module just doesn't register at all.

cellular.is_network_registered() checks whether it is registered, but I don't know how to register to the network if it doesn't do so automatically. Any ideas?

pulkin commented 3 years ago

If you have anything to reproduce the problem please dump it here.

marcosasilvalepe commented 3 years ago

OK, so I tried with a different network provider (I was using Movistar and switched to Entel) and I got a better result with the problem of the network registration so maybe the problem was that the signal wasn't very good.

Below is the code I use to check for the error. I tried to make it as short as possible but I still got 53 lines.

The url is a website that receives POST requests so you can inspect them and check the variables sent in the request. I usually get the loop to do between 20 to 30 iterations before the module hangs and watchdog reboots it.

I added print() everywhere to check where the module freezes, and it seems that s.close() is where it always hangs.

import gc
import time
import machine
import cellular
import socket

print("Starting module...\r\n")
machine.watchdog_on(60)
print("watchdog ON\r\n")
time.sleep(10)
print("sleep done\r\n")

try:
    print("Trying to connect\r\n")
    cellular.gprs("bam.entelpcs.cl", "", "")
    print("Connected to gprs...\r\n")
except Exception:
    print("Couldn't connect to gprs...\r\n")
finally:
    print("Finally...\r\n")
    counter=0
    url="www.ptsv2.com"
    while True:
        print("Entering loop...\r\n")
        if cellular.is_network_registered():
            counter+=1
            try:
                if cellular.gprs()==False:
                    cellular.gprs("bam.entelpcs.cl", "", "")
                    print("I got connected!!!\r\n")
                else:
                    print("Already connected...\r\n")
                body='{"counter": "' + str(counter) + '"}'
                s=socket.socket()
                s.connect((url, 80))
                print("Connected to host...\r\n")
                s.send(bytes("POST /t/msgps/post HTTP/1.1\r\nHost: {}\r\nContent-Type: application/json\r\nConnection: close\r\nContent-Length: {}\r\n\r\n{}".format(url, len(body), body), 'utf8'))
                print("Socket sent...\r\n")
                rsp=s.recv(256)
                print("Response received\r\n")
                print(rsp.decode('utf-8'))
                s.close()
                print("Socket Closed\r\n")
            except Exception as err:
                print(str(err) + '\r\n')
            finally:
                machine.watchdog_reset()
                print("ITERATION:", counter, "\r\n ")
                gc.collect()
                time.sleep(5)
        else:
            print("Network not registered...Sleeping 5 seconds\r\n")
            time.sleep(5)
        print("-------------- END OF LOOP ------------\r\n\r\n")

ens4dz commented 3 years ago

I think most of the hangs are related to this issue: https://github.com/Ai-Thinker-Open/GPRS_C_SDK/issues/421 Could you add this line at the beginning:

cellular.set_bands(cellular.NETWORK_FREQ_BAND_DCS_1800)

marcosasilvalepe commented 3 years ago

Very interesting!

I tried with cellular.set_bands(cellular.NETWORK_FREQ_BAND_PCS_1900) since that is the band used by the three big network providers in my country but the module still hangs.

Before trying with micropython I used the C SDK to transmit coordinates. Using the gps_tracker example I was sending POST requests without this problem. The module at times got a little confused and started transmitting the same coordinates and variables without updating them so when that happened I added some code to reboot it but I never encountered a problem where the module froze.

I prefer micropython anyway since it's so much easier to program and debug. C is a bit too verbose for my taste and using the same module with the same antennas and everything, with the C code I only got 5 satellites connected on average and as high as 7 satellites max, whereas using micropython (I don't know why) I get 7 to 8 on average and up to 14 max, and the difference is definitely perceivable when watching a marker moving live on a google map. The marker position is just perfect.

Anyway thanks for your comment, I'll keep this option inside the program to avoid any problems related to the voltage dropping.

marcosasilvalepe commented 3 years ago

So I kept testing and it turns out that the problem is not the voltage or the socket.close() method.

There's a problem with socket.recv(). After removing it from the code, I was able to loop indefinitely without the module hanging and rebooting, and the data gets transmitted correctly every time.

I do need to access the response from the server as it contains a few variables that the program inside the module uses.

I've read that other modules like the ESP32 and ESP8266 have had this problem but that it has been fixed there, so I don't know why it happens with the latest version of MicroPython.

UPDATE:

I solved it by replacing response=s.recv(256) with

import select
ready = select.select([s], [], [], 5)
if ready[0]:
    response=s.recv(256)

Now it loops forever without hanging :)
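The same select guard can be extended to collect a reply that arrives in several pieces, which is likely whenever the response is longer than one recv(256) call returns. This is a sketch under assumptions, not tested on the A9 module: recv_all is a hypothetical helper, and it relies on the server closing the connection (as with Connection: close) or on the timeout to end the loop.

```python
import select


def recv_all(sock, timeout_s=5, chunk=256):
    """Collect data until the peer closes or nothing arrives for timeout_s."""
    parts = []
    while True:
        # Only call recv() once select reports the socket as readable,
        # so a silent peer cannot block the loop forever.
        readable, _, _ = select.select([sock], [], [], timeout_s)
        if not readable:
            break  # timed out waiting for more data
        data = sock.recv(chunk)
        if not data:
            break  # empty read: the peer closed the connection
        parts.append(data)
    return b"".join(parts)
```

An empty result then means that nothing arrived within the timeout, which the caller can treat as a failed request and retry on the next iteration.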