Closed DrBomb closed 3 years ago
ok, i tried to reproduce this today and couldn't. it seems that my operators just don't care what AP i give, they completely ignore it.
If your module is 4G I believe the whole tower session includes the data session so it does not matter if you have an incorrect APN.
Perhaps trying setting your module to 3G only, or using a 3G sim card
i'm using SIM800L modem, which i believe is GPRS only
Reviving this issue as the same is still happening. I've been avoiding rebasing the library due to this. But I decided doing one this week to see if I missed any new features and to consolidate my library fork only to be met with it again.
[Apr 22 01:45:09.508] mgos_pppos.c:408 Connected to mobile network (home)
[Apr 22 01:45:09.913] mgos_pppos.c:459 Operator: 73404,movistar Movistar
[Apr 22 01:45:10.005] mgos_pppos.c:468 RSSI: -67
[Apr 22 01:45:10.214] mgos_pppos.c:816 Starting PPP, user ''
[Apr 22 01:45:10.221] mgos_mongoose.c:66 New heap free LWM: 225964
[Apr 22 01:45:10.226] mgos_net.c:90 PPP: connected
[Apr 22 01:45:10.238] mgos_mongoose.c:66 New heap free LWM: 223016
[Apr 22 01:45:10.251] mgos_mongoose.c:66 New heap free LWM: 222740
[Apr 22 01:45:40.242] mgos_pppos.c:854 Failed to acquire IP
[Apr 22 01:45:40.246] mgos_pppos.c:857 Phase before close 9
[Apr 22 01:45:40.251] mgos_pppos.c:858 Close return value 0
[Apr 22 01:45:40.253] mgos_pppos.c:859 Phase after close 6
[Apr 22 01:45:40.259] mgos_pppos.c:860 Free return value -11
[Apr 22 01:45:40.264] mgos_net.c:86 PPP: connecting
[Apr 22 01:45:42.712] mgos_pppos.c:668 Connecting (UART2, APN 'test')...
[Apr 22 01:45:42.718] mgos_pppos.c:711 Automatic operator selection
[Apr 22 01:45:43.193] mgos_pppos.c:279 Manufacturer: SIMCOM INCORPORATED
[Apr 22 01:45:43.193] Model: SIMCOM_SIM7500A
[Apr 22 01:45:43.193] Revision: SIM7500A_V1.5
[Apr 22 01:45:43.193] SVN: 04
[Apr 22 01:45:43.193] IMEI: SNIP
[Apr 22 01:45:43.198] +GCAP: +CGSM,+DS,+ES, IMEI: SNIP
[Apr 22 01:45:43.202] mgos_stmi2c.c:476 STM Version 0x27
[Apr 22 01:45:43.509] mgos_pppos.c:349 SIM is ready, IMSI: SNIP, ICCID: SNIP
[Apr 22 01:45:43.806] mgos_pppos.c:408 Connected to mobile network (home)
[Apr 22 01:45:44.211] mgos_pppos.c:459 Operator: 73404,movistar Movistar
[Apr 22 01:45:44.304] mgos_pppos.c:468 RSSI: -67
[Apr 22 01:45:44.513] mgos_pppos.c:816 Starting PPP, user ''
[Apr 22 01:45:44.526] assertion "netif already added" failed: file "/opt/Espressif/esp-idf/components/lwip/lwip/src/core/netif.c", line 405, function: netif_add
[Apr 22 01:45:44.571]
[Apr 22 01:45:44.571] abort() was called at PC 0x40113683 on core 0
[Apr 22 01:45:44.571]
[Apr 22 01:45:44.571] Backtrace:0x400878ba 0x40087e85 0x4008e402 0x40113683 0x4016ee97 0x40177fac 0x40178645 0x40177cee 0x4016a165
So, the crash is right now happening because there's an assertion that tells us "netif already added". Digging deeper and adding a few more log statements I track the phase state of the PPP process and seems like when calling pppapi_close
the phase doesn't change to DEAD which would result on the pppapi_free
to return 0 and then remove the netif, instead it returns -11 which is ERR_CONN and the netif is never removed, leading to the assert failure on the next time the interface is added.
How do we enable lwip logging? I'm looking for the reason why the phase doesn't go to DEAD
For reproduction steps, I remember you telling me choosing a wrong APN doesn't work. Perhaps you could lower the deadline for acquiring the IP address, that would trip the section of the code and hopefully, you should get the same crash as I do.
Alright, I've got the final results of testing.
When calling pppapi_close
the liberation of the protocol is not instant, rather it takes some time while I assume lwip finishes their own processes. Afterwards the mgos_pppos_status_cb
callback is executed with the PPPERR_USER
code which is what calls pppapi_free
and wraps up everything for the next connection try.
The problem is that PPPOS_INIT is entered immediately after calling pppapi_close
resulting on a new interface being added to lwip before the old one is freed.
I propose adding a PPPOS_CLEANUP state to the main pppos lib that will not allow PPPOS_INIT to start before PPPERR_USER is emitted and the interface is freed.
I already suspect it might introduce more issues. As the library code at least for me is somewhat complex, I might be screwing up other things such as user disconnections or running AT commands while on cleanup. I will create a preliminary PR but I think this needs a more expert hand to properly address.
thanks for investigating it. what you say makes sense, adding cleanup state before init is the right way to deal with this, i think. let's call it PPPOS_CLOSING
Added PR, please check it out when you can :+1:
looks good, just clean up the style and i'll merge
Platform: ESP32
Issue:
If the PPPoS session fails to get an IP address. This line forces the PPPoS stack to reinit. After reaching the same state it finally core dumps twice (not leaving a coredump file behind) and reboots.
How to reproduce:
Hook up a modem to the ESP32. I used no flow control lines, TX and RX connected to the ESP32. I'm using UART 2 for this.
Set up the APN settings incorrectly. Make sure the modem is on before restarting the firmware.
The stack will reach the part where the PPP link is established, and due to the wrong APN settings it will fail to get an IP address. Afterwards the firmware will restart the PPP stack and fail to get the IP address and the firmware will crash.
More Info
The following logs have been acquired running
latest
on both multicore and unicore. Because there are no core dump files, I symbolized the backtraces and placed them at the end of the logs.Unicore
Multicore
Running on
2.17.0
with the2.17.0
version of the PPPoS library does not make the firmware crash. After awhile of the PPP link staying inconnected
aError 9 (phase 0), reconnect
error is shown and the process is restarted.However, I did build the latest PPPoS library on 2.17.0 (release) mOS, compiling the sources directly and can confirm the crash happens as well, leaving the device for some reason on a unrecoverable state where the bootloader is unable to boot mOS again until a hard reset
Some remote network conditions mean that the PPPoS link might not be able to get an IP address the first time around, so this could affect many devices of ours deployed on the field.