pycom / pycom-micropython-sigfox

A fork of MicroPython with the ESP32 port customized to run on Pycom's IoT multi-network modules.
MIT License

GPY Modem - ESP32 Cannot Communicate, LTE() function error #445

Open ptcoregon opened 4 years ago

ptcoregon commented 4 years ago

GPY Modem-ESP32 Communication Issue

This report describes a hardware/firmware issue that dramatically impacts reliability and limits potential applications.

Issue Description

The fundamental issue revolves around this function in main.py:

lte = LTE()

The ESP32 is unable to connect to the modem, resulting in the error below from pycom-micropython-sigfox/esp32/mods/modlte.c

OSError: Couldn't connect to Modem (modem_state=disconnected)

After resetting, the problem repeats itself.

machine.reset()

After hundreds of soft resets, the LTE() function occasionally returns successfully and we are able to proceed with full functionality from then on. But after one (or more) soft resets or spontaneous disconnects, the problem returns. The same happens when resets occur via WDT.

Power cycling will usually cause the module to operate properly on the first try, but not always. And the issue always returns after a period of time.

Firmware Version Tests

This problem is present with many firmware versions, including the experimental Dev branch of pycom-micropython-sigfox.

I am using the latest stable version, CATM1-41065.dup firmware for the modem. It is almost impossible to downgrade because, obviously, I cannot communicate with the modem.

Hardware Tests

This problem is present in two different GPY modules using the Expansion Board 3.1 and the provided cellular antenna.

One module, used for firmware tests, reports: (sysname='GPy', nodename='GPy', release='1.20.1.r2', version='v1.20.1.r2-5-gdb7f895-dirty on 2020-03-25', machine='GPy with ESP32')

Fix Attempts

Running lte = LTE() from the CLI gives exactly the same result.

Adding pycom.lte_modem_en_on_boot(False) has no effect.

Other Instances

I am clearly not the only one with this issue. See these forum threads from 2018:

https://forum.pycom.io/topic/3675/despite-heavy-investment-in-fipy-gpy-not-possible-to-use-board-as-anything-more-than-lte-modem-and-even-that-s-problematic

https://forum.pycom.io/topic/3129/lte-lte-getting-stuck-after-reset-fw-1-17-3-b1-on-fipy

Relevant Code

import pycom
import time
import os
from machine import WDT
from machine import SD
import machine

pycom.wdt_on_boot(True)
pycom.wdt_on_boot_timeout(240000)

wdt = WDT(timeout=240000)  # enable it with a timeout of 240 seconds
wdt.init(240000)
wdt.feed()

import ujson

pycom.wifi_on_boot(False)
pycom.heartbeat(False)

from machine import Pin
from network import WLAN

wlan = WLAN()
wlan.antenna(WLAN.EXT_ANT)
wlan.deinit()

import socket
import ssl
from network import LTE
from network import Bluetooth
from simple import MQTTClient
import ubinascii
import array
from machine import RTC

if pycom.lte_modem_en_on_boot():
    print("LTE on boot was enabled. Disabling.")
    pycom.lte_modem_en_on_boot(False)

print("LTE()")

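try:
    lte = LTE()
except OSError as e:
    # e.g. OSError: Couldn't connect to Modem (modem_state=disconnected)
    print("LTE() failed:", e)
    time.sleep(6)
    machine.reset()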

#we rarely get here...
print("LTE() done")
#rest of program...
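Since LTE() sometimes succeeds only after repeated attempts, one mitigation is to retry it a bounded number of times with a pause in between, and fall back to machine.reset() only when every attempt fails. A minimal sketch, assuming the failure surfaces as OSError (as in the error quoted above). `create_with_retries` is a hypothetical helper. The factory and sleep function are passed in so the logic can be exercised off the device. Whether extra attempts actually help on an affected modem is not established here.

```python
import time


def create_with_retries(factory, attempts=5, delay_s=6, sleep=time.sleep):
    """Call factory() up to `attempts` times, pausing between failures.

    Returns the created object, or None if every attempt raised OSError.
    Hypothetical helper: on a GPy the factory would be network.LTE.
    """
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return factory()
        except OSError as e:
            # e.g. OSError: Couldn't connect to Modem (modem_state=disconnected)
            last_error = e
            print("LTE() attempt", attempt, "failed:", e)
            if attempt < attempts:
                sleep(delay_s)
    print("Giving up after", attempts, "attempts:", last_error)
    return None


# On the device (not executed here):
#   from network import LTE
#   import machine
#   lte = create_with_retries(LTE)
#   if lte is None:
#       machine.reset()
```

Keep the total retry time (attempts × delay_s) well inside the 240 s watchdog timeout configured above, or feed the WDT between attempts. Given the reports of hundreds of consecutive failures, this is a mitigation at best, not a fix.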

Proposed Resolution

There must be a way to reset the communication lines between the ESP32 and the Modem without first executing lte = LTE(). I am comfortable building my own updated firmware with a solution from Pycom. However, I need assistance due to the complexity of the LTE-related firmware and processes.

Thank you thank you thank you for any help in solving this issue!

abatardi commented 4 years ago

My company just spent thousands of dollars developing a sensor companion board for use with Pycom after some promising initial tests (we were trying to move away from Particle). However, it has become unusable lately, and the latest Pycom and Sequans firmware releases just seem to make it worse and worse. If we can even get an LTE connection at this point, it lasts maybe 2 minutes before disconnecting. We have tried Hologram and NimbeLink SIMs on Verizon, with the same problem on both. Meanwhile, Particle devices on the same SIMs/networks in the same physical location connect perfectly and hold their connections for days without dropping.

Why is this so bad? I understand there are a lot of differences and problems with cellular communication, but it seems the rest of the world has it figured out. Meanwhile I'm trying to decide if we are going to drop more thousands of dollars moving our new board design back to Particle or continue to throw more wasted time and money at Pycom. Ridiculous.

abatardi commented 4 years ago

LTE connect()
LTE is_connected()
LTE connection established
connect_lte with start_mqtt is now removed please call communication_protocol or start_mqtt directly
MQTT Protocol
Packet sent. (Length: 103)
This is PybytesProtocol.start_MQTT
Packet sent. (Length: 44)
Connected to MQTT mqtt.pybytes.pycom.io
Pybytes connected successfully (using the built-in pybytes library)
This is pack_info_message()
pack_message: b'310504a00507'
MQTT Protocol
Socket send error [Errno 104] ECONNRESET
This is pack_pybytes_message_variable(5, 0, bytearray(b'\x00\x00\x00\x00\x00'))
pack_message: b'3e05000000000000'
MQTT Protocol
Socket send error [Errno 104] ECONNRESET
This is pack_pybytes_message_variable(5, 0, bytearray(b'\x00\x00\x00\x01\x00'))
__pack_message: b'3e05000000000100'
MQTT Protocol
Socket send error [Errno 113] ECONNABORTED

catalinio commented 4 years ago

Hi,

Please drop us a short email here: https://pycom.io/community/contact-support/ and we can provide you with an experimental modem firmware.

Best wishes, Catalin

ptcoregon commented 4 years ago

After working with the Pycom engineers, it appears that the "experimental" new Modem firmware they have fixes this problem. However, I am keeping this ticket open until I see that someone has posted a link to where people can get this firmware.

abatardi commented 4 years ago

Meanwhile pycom engineers continue to completely ignore the support request I submitted to them 3 WEEKS ago. Please make these firmware files available.

tlanier9 commented 3 years ago

I've also seen this issue using V1.20.3.b0.

Has anything been done to attempt to fix this?

Does the GPy have a method to physically reset the modem (not AT command)?

A power down/up fixed the problem in my case.

amcewen commented 3 years ago

Is there any update on the "experimental" new modem firmware? We're seeing the same problem on some of our devices, and don't have an easy way to power-cycle them.

tlanier9 commented 3 years ago

Adrian,

We discontinued using the LTE() module because of connectivity issues and modem lockups that were only resolved by power cycling the GPy.

Instead we wrote modem routines that talk directly to the modem using the serial port.
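As a rough illustration of talking to the modem over a serial port instead of through the LTE class, the sketch below sends one AT command and collects lines until a final result code arrives. It assumes a UART-like object with MicroPython machine.UART's write(), any() and read(n). Which UART, pins and baud rate connect the ESP32 to the modem on a GPy is not stated here; the linked forum thread describes the authors' actual approach. `send_at` is a hypothetical name, and the clock and sleep functions are injected so the logic can be tested off the device.

```python
import time


def _is_final(line):
    # "OK"/"ERROR" are ITU-T V.250 result codes; "+CME ERROR: <n>" is the
    # extended error code defined in 3GPP TS 27.007.
    return line in (b"OK", b"ERROR") or line.startswith(b"+CME ERROR")


def _split_lines(buf):
    return [ln.strip() for ln in buf.split(b"\r\n") if ln.strip()]


def send_at(uart, cmd, timeout_s=5, clock=time.time, sleep=time.sleep):
    """Send one AT command and return (ok, response_lines).

    `uart` is assumed to behave like MicroPython's machine.UART:
    write(bytes), any() -> int and read(n) -> bytes or None.
    """
    uart.write(cmd.encode() + b"\r\n")
    buf = b""
    deadline = clock() + timeout_s
    while clock() < deadline:
        n = uart.any()
        if n:
            buf += uart.read(n) or b""
            lines = _split_lines(buf)
            if lines and _is_final(lines[-1]):
                return lines[-1] == b"OK", lines
        else:
            sleep(0.01)
    # No final result code before the deadline: the modem may be stuck,
    # e.g. still in data mode as described later in this thread.
    return False, _split_lines(buf)
```

Keeping the transport behind a small interface like this makes it possible to log every exchange and to add your own timeouts, which the LTE class does not expose.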

https://forum.pycom.io/topic/6202/method-to-take-control-of-cell-modem?_=1623780158932

We have since had stability issues with various newer versions of the OS and modem firmware. The most stable versions that we currently use are:

Pycom MicroPython V1.20.0.rc13 release candidate

Modem Firmware

UE5.1.0.0f

LR5.1.1.0-41065

Newer firmware releases that came with GPys purchased in the last few months have caused crashes.

Tommy


Thomas H. Lanier, CFO, Computer Engineered Solutions, Inc., 206 Lukken Industrial E., LaGrange, GA 30241. Voice: 706-882-4704. Fax: 706-882-4001. Web: http://www.ces-web.com


abatardi commented 3 years ago

We had to switch to a different product entirely as the pycom units we had in the field caused countless issues and we had site visits literally on a weekly basis. These are in no way ready for prime time.

peter-pycom commented 3 years ago

> Is there any update on the "experimental" new modem firmware? We're seeing the same problem on some of our devices, and don't have an easy way to power-cycle them.

CAT-M fw update has been released: https://forum.pycom.io/topic/6881/lte-modem-firmware-release-for-cat-m1-5-2-48829

Regarding the ESP32 firmware version: if you intend to use our LTE class, I'd strongly suggest 1.20.2.r4, since it contains some LTE fixes. (If you use another solution, like the one mentioned by @tlanier9, this is less relevant.)

pkharvey commented 3 years ago

We've been trying for some time to find a fix/workaround to these same LTE() issues you're experiencing while trying to use the Fipy on NB-IoT in both the UK and the US. It would be ideal for us if the issues with the LTE class could be resolved for our existing code and hardware as we are also unable to power-cycle the modem in our intended application, and we're able and willing to help do some testing here with the equipment we have and contribute our findings.

Our focus is on getting the Fipy working on NB-IoT in the US ideally, but the LTE functionality is equally important to us in general. Our UK Fipy can send/receive on NB-IoT whenever the LTE class doesn't error out; however, we haven't had our US Fipy work once on NB-IoT despite using a known-working SIM.

We have a few Fipys in the UK and US on firmware 1.20.0.rc13 or 1.20.2.r4, running basic NB-IoT Python code from file or REPL. Most are on custom boards with sensors, but we also have Pysense/Expansion boards for testing. Modem firmware is LR6.0.0.0-41019 (NB-IoT). We're yet to try 46262, but I'll report back on our findings when we do. For the avoidance of doubt, some tests were performed with a (1000 µF + 100 nF) capacitor pair added to each power input pin and to the 3.3 V output, powered by battery, USB, sometimes both, or an external power supply, and we found no improvement or difference with or without any combination. (Notably, with the modem active and the Fipy idle in the REPL, we sometimes observed power spikes of around 600 mA for 7 ms; the peaks were reduced slightly with the capacitors added, but behaviour was functionally the same.)
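A rough worked check of those numbers, under the deliberately pessimistic assumption that the whole 600 mA spike for the full 7 ms had to come from a single 1000 µF capacitor (in practice the regulator supplies most of it, so this is an upper bound on the droop):

```latex
\begin{aligned}
Q &= I\,t = 0.6\ \text{A} \times 7\ \text{ms} = 4.2\ \text{mC} \\
\Delta V &= \frac{Q}{C} = \frac{4.2\ \text{mC}}{1000\ \mu\text{F}} = 4.2\ \text{V}
\end{aligned}
```

A 4.2 V droop exceeds the 3.3 V output being decoupled, so a capacitor of that size cannot carry a spike of that length on its own; it can only soften the edges while the supply delivers the bulk of the current. That fits the observation that the capacitors trimmed the peaks slightly without changing behaviour, though it does not by itself show whether power is or is not the root cause.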

Some of our investigations / findings:

We also tried some program-stress tests, repeatedly running the same LTE init/send/receive/deinit/reset code over 150 times to spot any patterns (varying our experimental setup after around 20 tries each). There was no clear difference varying power source or with/without capacitors. We did observe a pattern when calling deinit() quickly after receiving an NB-IoT downlink where it succeeded almost every time with only a few failures, however adding a few seconds extra processing delay between the downlink and the deinit() prompted it to fail almost every time with a few successful deinits.
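Given the observation that deinit() succeeded far more often when called promptly after the downlink, one way to structure the loop is to receive the payload, release the modem straight away, and only then do the slower processing. A minimal sketch under that assumption (the ordering is supported only by the observation above, not by a documented requirement). `handle_downlink` is a hypothetical name; `sock`, `lte` and `process` are injected objects standing in for a socket, a network.LTE instance and your own handler.

```python
def handle_downlink(sock, lte, process, bufsize=512):
    """Receive one downlink, release the modem promptly, then process it.

    Hypothetical ordering based on the observation that a delay between
    the downlink and lte.deinit() made deinit() fail far more often.
    """
    try:
        data = sock.recv(bufsize)
    finally:
        # Close the socket and release the modem before any slow work,
        # even if recv() raised (the exception still propagates).
        sock.close()
        lte.deinit()
    return process(data)
```

Parsing, sensor reads and storage writes all happen after deinit() here, so their duration no longer sits between the downlink and the modem shutdown.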

It seems the internal functions can't take back control over the modem at times where the modem is expected to be in one given state. Maybe there's another way of interrupting the modem or getting its attention (is DTR wired and does the modem use it?).

Let's hope some clues lead to a better understanding and a solution. We have a lot of data. I can provide more detail on any of these points if needed.

pkharvey commented 2 years ago

Had to focus on some procurement here. The LTE module is still important to us - I haven't forgotten. I have some current profile plots that I can put up... I'll be back to post those when I get a moment.

pkharvey commented 2 years ago

Thanks to you all for your patience. Unfortunately despite best efforts, we haven't been able to get much closer to solving it once and for all, however we made some small steps and I can give a bit more information on what we've learned. I don't have vast experience with LTE modems so might give some awkward descriptions or have missed some clues.

To recap (the OP described it best), we were experiencing a problem where we sometimes failed to gain control of the LTE modem, so the LTE() object could not be created and lte.deinit() etc. could not be called on it. We were running the NB1-41019 firmware. We need to deepsleep in our application, but we can't power cycle.

This would often happen to us when attempting an lte.attach(), whereby if it failed to attach in allotted time, the program would time out, reset, and be unable to reinitialize the modem with LTE(). The logic analyzer could see the Fipy trying to interrupt the modem with +++ while the modem appears to be stuck in data mode ignoring the interrupt (the PyJTAG is a bit out of reach for us).
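The +++ seen on the logic analyzer is the Hayes-style escape from data mode back to command mode, which classically is only recognised when the line is silent for a guard period before and after it. The sketch below sends it with explicit guard times and then probes with a plain AT. It assumes a UART-like object with write(); the 1 s guard is the classic Hayes default and is an assumption for the Sequans modem, as is whether the modem honours the escape at all (which is exactly what failed here). `escape_to_command_mode` is a hypothetical name.

```python
import time


def escape_to_command_mode(uart, guard_s=1.0, sleep=time.sleep):
    """Send the Hayes '+++' escape with silent guard periods around it.

    Assumes `uart` has write(bytes), like machine.UART. The guard time is
    an assumption (classic Hayes default), not a documented Sequans value.
    """
    sleep(guard_s)         # line must be idle before the escape
    uart.write(b"+++")     # no CR/LF: the escape is the three characters alone
    sleep(guard_s)         # and idle again afterwards
    uart.write(b"AT\r\n")  # probe: a modem back in command mode should answer OK
```

If the modem still ignores the escape with generous guard times, that points back to a hardware handshake line such as DTR, as suggested above.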

We were unable to predict when it would happen or to recreate the conditions (it would usually just happen from time to time), but we found it happening a lot with one of our test devices running NB1-41019, increasingly frequently, until the device became effectively bricked. With nothing to lose, we decided to flash the experimental NB1-46262 firmware available through a Pycom service request. Since flashing 46262, we can regain control over the modem almost every time (it failed twice, but we were able to retry and regain control). We're not sure whether it was the act of flashing the modem firmware that cleared its config or the version itself, but flashing it did mostly unblock the modem for us in this instance. Monitoring the current draw, we observed that the modem running 46262 entered deepsleep every time with lte.deinit(reset=True).
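The shutdown step described above can be sketched as: put the modem into its low-power state with lte.deinit(reset=True), then deep-sleep the ESP32. A minimal sketch, assuming `lte` is a network.LTE instance and `deepsleep` is machine.deepsleep; both are passed in so the sequence can be tested off the device, and `shutdown_and_sleep` is a hypothetical name.

```python
def shutdown_and_sleep(lte, deepsleep, sleep_ms):
    """Put the modem into its low-power state, then deep-sleep the ESP32.

    `lte` is assumed to be a network.LTE instance and `deepsleep` to be
    machine.deepsleep; both are injected so the sequence can be tested.
    """
    try:
        # reset=True is the call observed above to bring the modem
        # (running NB1-46262) into deep sleep reliably.
        lte.deinit(reset=True)
    except OSError as e:
        # If the modem cannot be reached, sleep anyway; recovery is left
        # to the next boot (or to an external watchdog).
        print("lte.deinit failed:", e)
    deepsleep(sleep_ms)
```

Sleeping even when deinit() fails is a design choice: it keeps the duty cycle predictable, at the cost of possibly leaving the modem awake until the next wake-up.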

Our capabilities will be limited with the equipment we have but if there are any suggestions we can try to get closer to resolving it, we can post our results.

SebastiaanMerckx commented 2 years ago

I have that same 46262 firmware and, together with an unofficial 1.20.2.rc11 MicroPython binary, this is the most stable setup I could get so far (using NB-IoT). So it means we have live setups with unofficial modem firmware and unofficial Pycom firmware, great 😄.

tlanier9 commented 2 years ago

We ended up having to add an external watchdog circuit which toggles the +5V power to the GPy chip.


jonnerd154 commented 2 years ago

> We ended up having to add an external watchdog circuit which toggles the +5V power to the GPy chip. (@tlanier9)

We did the same. An external micro (ATTINY) with a couple GPIO pins for interface to the GPy, and one to control the regulator's Enable pin. It replaced the expensive pushbutton controller we used for on/off control, so we got more functionality for less money in the end. And the GPy can continue running LTE! 🎉
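The decision logic of such an external watchdog can be modelled as a small state machine: the GPy signals a heartbeat on one of the interface GPIOs, and if none arrives within a timeout the supervisor drops the regulator's Enable pin for a while and then re-enables it. The sketch below models only that logic, in Python for illustration. Using a heartbeat as the interface is an assumption, the timeout and off-time values and the class name `HeartbeatSupervisor` are hypothetical, and real supervisor firmware on an ATtiny (typically written in C) would drive actual pins.

```python
class HeartbeatSupervisor:
    """Decision logic of an external power-cycling watchdog (model only)."""

    def __init__(self, timeout_s=300, off_s=5):
        self.timeout_s = timeout_s  # max silence before power-cycling
        self.off_s = off_s          # how long the regulator stays disabled
        self.last_beat = 0
        self.off_since = None       # set while the GPy is powered down

    def heartbeat(self, now):
        """Called when the GPy toggles its heartbeat pin."""
        if self.off_since is None:
            self.last_beat = now

    def tick(self, now):
        """Return the desired state of the regulator Enable pin."""
        if self.off_since is not None:
            if now - self.off_since >= self.off_s:
                # Power back on and give the GPy a fresh timeout to boot
                self.off_since = None
                self.last_beat = now
                return True
            return False
        if now - self.last_beat > self.timeout_s:
            self.off_since = now
            return False
        return True
```

Restarting the timeout on power-up gives the GPy a full window to boot and attach before it can be cut again, so the supervisor does not power-cycle a board that is merely slow to connect.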

Feels a bit drastic, but it works. I would recommend planning this into any GPy LTE project from the beginning.

curtmiller commented 2 years ago

Some great insight in here. We have spent the last week or so trying to talk some sense into the GPy LTE modem with little to no success or consistent performance as a result. Going to try implementing an external keepalive circuit as it appears to have worked for a few people on here.

Have there been any further discoveries that will help us achieve a consistent GPy LTE connection? Trying to keep board revs to a minimum. Thanks guys!

jonnerd154 commented 2 years ago

> Have there been any further discoveries that will help us achieve a consistent GPy LTE connection? Trying to keep board revs to a minimum.

~1 year after we added the external watchdog, we are still very happy we did. It has worked very well, and I still recommend implementing one in every GPy design. There are still some quirks, but having the ability to power cycle the whole Pycom rescues us from most of them.

From a hardware perspective, here's the kitchen sink we threw at it with good results:

Also, I wanted to make sure you have seen this: https://github.com/pycom/pycom-micropython-sigfox/issues/600#issuecomment-1148441412 . Depending on where you are in the product life cycle, it might be good to avoid the GPy in your design. :(

curtmiller commented 2 years ago

Fantastic, thank you very much! Great to see there is light at the end of the tunnel for getting these guys up and running. Unfortunately for us, we have a fairly significant stash of these GPys, so we'd like to at least get them reliable enough for some product testing and customer feedback.

As far as firmware is concerned, has the most reliable combination been 46262/1.20.2.rc11 as @tlanier9 mentioned in their post? We are aiming for CAT-M1, so their implementation may not be ideal as it sounds like it's geared for NB-IoT.

tlanier9 commented 2 years ago

As indicated previously, we are using Pycom MicroPython V1.20.0.rc13 release candidate in our application. We do not use the LTE module for connectivity but instead talk directly to the cell modem through the serial port. It is possible that improvements have been made to the LTE module in newer versions that we have not tried. The external watchdog is definitely necessary for long-term reliability.

ELundby45 commented 2 years ago

@curtmiller, @jonnerd154's information has all been based on CATM1-5.2-48829/1.20.2.r4.

One additional tidbit that has been helpful was from #585: Occasionally I have seen that the GPy has been running fine for weeks (with a few automated power cycles from the external watchdog) but will get into a cycle where it will no longer attach. Sending the AT&F command and then a reset has been able to get the GPy out of this state.
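The AT&F-then-reset step from #585 can be sketched as below, assuming the modem still answers AT commands at that point (for example through Pycom's LTE.send_at_cmd(), assumed here to take a command string and return the response text, or through a raw-UART helper). Per ITU-T V.250, AT&F restores the factory-defined configuration; exactly which settings that covers on the Sequans modem is not stated in this thread, so any settings your application wrote earlier may need to be reapplied. `factory_reset_then_reboot` is a hypothetical name, and `reset` would be machine.reset on the device.

```python
def factory_reset_then_reboot(send_at_cmd, reset):
    """Send AT&F and, if the modem acknowledges it, reboot.

    `send_at_cmd` is assumed to take a command string and return the
    modem's response text; `reset` is assumed to be machine.reset.
    Returns False (without resetting) if the modem did not answer OK.
    """
    response = send_at_cmd("AT&F")
    if "OK" not in response:
        # Modem did not acknowledge; the caller decides what to try next
        return False
    reset()
    return True
```

On the device, machine.reset() does not return, so the True result is only observable when a stand-in reset function is passed, as in testing.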

curtmiller commented 2 years ago

Thank you guys for the input! I really appreciate it.

curtmiller commented 2 years ago

We have achieved LTE connection and have since been working out some of the hardware kinks in hopes of better reliability. However, with the recent EOL announcement for the GPy my manager has decided to change course and pursue a more future-proof product.

If those of you who have had success with your GPy designs are interested in more stock, shoot me an email at curtis@biktrix.com. Again, thank you for the input guys!