zephyrproject-rtos / zephyr

Primary Git Repository for the Zephyr Project. Zephyr is a new generation, scalable, optimized, secure RTOS for multiple hardware architectures.
https://docs.zephyrproject.org
Apache License 2.0
10.48k stars 6.41k forks source link

PPP/LCP: LCP statemachine stuck after link termination #75603

Closed ljdevdev closed 1 week ago

ljdevdev commented 2 months ago

I have a curious observation, and I am not quite sure, if I am using the driver wrong, or I encountered a bug. The LCP-statemachine gets stuck after calling net_if_down on the PPP interface. Do I need to reset the driver somehow?

Obeserved behaviour I am running a custom AT-Command modem driver that hands over to the Zephyr-PPP driver after successfully establishing a mobile link.

When the modem terminates the PPP link (see log below) the PPP-driver switches to phase PPP_DEAD and the LCP statemachine switches to PPP_STOPPED

The AT-Command driver then takes over, calls net_if_down , reestablishes the mobile connection and calls net_if_up on the ppp-interface.

The LCP-Statemachine then gets stuck in state PPP_STOPPED and never establishes a new link.

Solution The problem goes away if I reset lcp.fsm.state to PPP_INITIAL in ppp_link_terminated (https://github.com/zephyrproject-rtos/zephyr/blob/main/subsys/net/l2/ppp/link.c#L115):

void ppp_link_terminated(struct ppp_context *ctx)
{
        k_sem_give(&ctx->wait_ppp_link_terminated);

        if (ctx->phase == PPP_DEAD) {
                return;
        }

        /* TODO: cleanup things etc here if needed */

        ppp_change_phase(ctx, PPP_DEAD);
/*+*/   ppp_change_state(&ctx->lcp.fsm, PPP_INITIAL);

        NET_DBG("[%p] Link terminated", ctx);
}

I am unsure though, if this is the proper place or way to reset the LCP statemachine.

To Reproduce Unfortuantely I have no easy way to reproduce this without a modem that terminates the connection. If someone points me to a testsuite where I could add this as a test, I would be happy to do so.

Expected behavior The statemachine should start from the same initial state each time ppp_start is called.

Impact This bug makes the PPP driver unusable in a commercial product

Logs and console output


[00:05:13.538,000] <dbg> net_sock: zsock_received_cb: (rx_q[0]): ctx=0x20205e88, pkt=0x20274270, st=0, user_data=(nil)
[00:05:14.261,000] <dbg> net_ppp: ppp_input_byte: Address byte (0xff) start
[00:05:14.265,000] <dbg> net_l2_ppp: ppp_fsm_input: (rx_q[0]): [LCP/0x20207350] LCP Terminate-Req (5) id 3 payload len 28
[00:05:14.265,000] <dbg> net_l2_ppp: fsm_recv_terminate_req: (rx_q[0]): [LCP/0x20207350] Current state OPENED (9)
[00:05:14.265,000] <dbg> net_l2_ppp: fsm_recv_terminate_req: (rx_q[0]): [LCP/0x20207350] Terminated by peer (No network protocols running)
[00:05:14.265,000] <dbg> net_l2_ppp: ppp_change_state_debug: (rx_q[0]): [LCP/0x20207350] state OPENED (9) => STOPPING (5) (fsm_recv_terminate_req():877)
[00:05:14.265,000] <dbg> net_l2_ppp: ppp_fsm_lower_down: (rx_q[0]): [IPCP/0x20207420] Current state OPENED (9)
[00:05:14.265,000] <dbg> net_l2_ppp: ppp_change_state_debug: (rx_q[0]): [IPCP/0x20207420] state OPENED (9) => STARTING (1) (ppp_fsm_lower_down():282)
[00:05:14.265,000] <dbg> net_if: net_if_ipv4_addr_rm: (rx_q[0]): [0] interface 0x202030e8 address 100.84.56.8 removed
[00:05:14.265,000] <dbg> net_l2_ppp: ppp_change_phase_debug: (rx_q[0]): [0x20207318] phase RUNNING (4) => TERMINATE (5) (ppp_network_down():37)
[00:05:14.265,000] <dbg> net_l2_ppp: ppp_network_down: (rx_q[0]): [0x20207318] Proto IPv4 (0x0021) down (0)
[00:05:14.265,000] <dbg> net_l2_ppp: ppp_fsm_close: (rx_q[0]): [IPCP/0x20207420] Current state STARTING (1)
[00:05:14.265,000] <dbg> net_l2_ppp: ppp_change_state_debug: (rx_q[0]): [IPCP/0x20207420] state STARTING (1) => INITIAL (0) (ppp_fsm_close():240)
[00:05:14.265,000] <dbg> net_l2_ppp: ppp_fsm_lower_down: (rx_q[0]): [IPV6CP/0x202074f8] Current state ACK_RECEIVED (7)
[00:05:14.265,000] <dbg> net_l2_ppp: ppp_change_state_debug: (rx_q[0]): [IPV6CP/0x202074f8] state ACK_RECEIVED (7) => STARTING (1) (ppp_fsm_lower_down():268)
[00:05:14.265,000] <dbg> net_l2_ppp: ppp_fsm_close: (rx_q[0]): [IPV6CP/0x202074f8] Current state STARTING (1)
[00:05:14.265,000] <dbg> net_l2_ppp: ppp_change_state_debug: (rx_q[0]): [IPV6CP/0x202074f8] state STARTING (1) => INITIAL (0) (ppp_fsm_close():240)
[00:05:14.265,000] <dbg> net_l2_ppp: ppp_fsm_lower_down: (rx_q[0]): [PAP/0x202075c0] Current state CLOSED (2)
[00:05:14.265,000] <dbg> net_l2_ppp: ppp_change_state_debug: (rx_q[0]): [PAP/0x202075c0] state CLOSED (2) => INITIAL (0) (ppp_fsm_lower_down():273)
[00:05:14.265,000] <dbg> net_l2_ppp: ppp_change_phase_debug: (rx_q[0]): [0x20207318] phase TERMINATE (5) => DEAD (0) (ppp_link_down():133)
[00:05:14.265,000] <dbg> net_l2_ppp: ppp_change_phase_debug: (rx_q[0]): [0x20207318] phase DEAD (0) => ESTABLISH (1) (lcp_down():163)
[00:05:14.266,000] <dbg> net_l2_ppp: ppp_send_pkt: (rx_q[0]): [LCP/0x20207350] Sending 6 bytes pkt 0x20273390 (options len 0)
[00:05:14.266,000] <dbg> net_if: net_if_tx: (ppp_tx): Processing (pkt 0x20273390, prio 1) network packet iface 0x202030e8/2
[00:05:14.266,000] <dbg> net_l2_ppp: net_pkt_hexdump: send L2
[00:05:14.266,000] <dbg> net_l2_ppp: net_pkt_hexdump: 0x20273390
                                     c0 21 06 03 00 04                                |.!....           
[00:05:14.266,000] <dbg> net_ppp: net_pkt_hexdump: send ppp
[00:05:14.266,000] <dbg> net_ppp: net_pkt_hexdump: 0x20273390
                                  c0 21 06 03 00 04                                |.!....           
[00:05:14.267,000] <dbg> conn_mgr: conn_mgr_ipv4_events_handler: (net_mgmt): IPv4 event 0xe0040002 received on iface 2 (0x202030e8)
[00:05:14.267,000] <dbg> conn_mgr: conn_mgr_ipv4_events_handler: (net_mgmt): Iface index 1
[00:05:14.268,000] <dbg> net_ppp: ppp_input_byte: Address byte (0xff) start
[00:05:14.268,000] <dbg> net_ppp: ppp_change_state: (ppp_workq): [0x202076f8] state ADDRESS (1) => DATA (2)
[00:05:14.273,000] <dbg> net_ppp: ppp_input_byte: End of pkt (0x7e)
[00:05:14.273,000] <dbg> net_ppp: ppp_change_state: (ppp_workq): [0x202076f8] state DATA (2) => ADDRESS (1)
[00:05:14.273,000] <dbg> net_ppp: net_pkt_hexdump: recv ppp
[00:05:14.273,000] <dbg> net_ppp: net_pkt_hexdump: 0x20274270
                                  ff 03 c0 21 01 04 00 18  01 04 05 94 02 06 00 00 |...!.... ........
                                  00 00 05 06 50 cb e4 1e  07 02 08 02 7a 5c       |....P... ....z\  
[00:05:14.273,000] <dbg> net_ppp: ppp_consume_ringbuf: Ringbuf 0x20207f1c is empty!
[00:05:14.273,000] <dbg> net_l2_ppp: net_pkt_hexdump: recv L2
[00:05:14.273,000] <dbg> net_l2_ppp: net_pkt_hexdump: 0x20274270
                                     c0 21 01 04 00 18 01 04  05 94 02 06 00 00 00 00 |.!...... ........
                                     05 06 50 cb e4 1e 07 02  08 02                   |..P..... ..      
[00:05:14.273,000] <dbg> net_l2_ppp: ppp_fsm_input: (rx_q[0]): [LCP/0x20207350] LCP Configure-Req (1) id 4 payload len 20
[00:05:14.273,000] <dbg> net_l2_ppp: fsm_recv_configure_req: (rx_q[0]): [LCP/0x20207350] Current state STOPPING (5)
[00:05:15.267,000] <dbg> net_ppp: ppp_consume_ringbuf: Ringbuf 0x20207f1c is empty!
[00:05:15.267,000] <dbg> net_ppp: ppp_input_byte: Address byte (0xff) start
[00:05:15.267,000] <dbg> net_ppp: ppp_change_state: (ppp_workq): [0x202076f8] state ADDRESS (1) => DATA (2)
[00:05:15.269,000] <dbg> net_ppp: ppp_input_byte: End of pkt (0x7e)
[00:05:15.270,000] <dbg> net_ppp: ppp_change_state: (ppp_workq): [0x202076f8] state DATA (2) => ADDRESS (1)
[00:05:15.270,000] <dbg> net_ppp: net_pkt_hexdump: recv ppp
[00:05:15.270,000] <dbg> net_ppp: net_pkt_hexdump: 0x2027455c
                                  ff 03 c0 21 05 05 00 10  55 73 65 72 20 72 65 71 |...!.... User req
                                  75 65 73 74 94 c3                                |uest..           
[00:05:15.270,000] <dbg> net_ppp: ppp_consume_ringbuf: Ringbuf 0x20207f1c is empty!
[00:05:15.270,000] <dbg> net_l2_ppp: net_pkt_hexdump: recv L2
[00:05:15.270,000] <dbg> net_l2_ppp: net_pkt_hexdump: 0x2027455c
                                     c0 21 05 05 00 10 55 73  65 72 20 72 65 71 75 65 |.!....Us er reque
                                     73 74                                            |st               
[00:05:15.270,000] <dbg> net_l2_ppp: ppp_fsm_input: (rx_q[0]): [LCP/0x20207350] LCP Terminate-Req (5) id 5 payload len 12
[00:05:15.270,000] <dbg> net_l2_ppp: fsm_recv_terminate_req: (rx_q[0]): [LCP/0x20207350] Current state STOPPING (5)
[00:05:15.270,000] <dbg> net_l2_ppp: fsm_recv_terminate_req: (rx_q[0]): [LCP/0x20207350] Invalid state STOPPING (5)
[00:05:15.277,000] <dbg> net_ppp: ppp_input_byte: Invalid (0x0d) byte, expecting Address
[00:05:15.277,000] <dbg> net_ppp: ppp_change_state: (ppp_workq): [0x202076f8] state ADDRESS (1) => START (0)
[00:05:16.538,000] <dbg> net_if: net_if_tx: (rx_q[0]): Processing (pkt 0x20273390, prio 0) network packet iface 0x20203068/1
[00:05:16.538,000] <dbg> net_sock: zsock_received_cb: (rx_q[0]): ctx=0x20205e88, pkt=0x2027455c, st=0, user_data=(nil)
[00:05:16.750,000] <dbg> net_if: net_if_tx: (tcp_work): Processing (pkt 0x202733d4, prio 0) network packet iface 0x20203068/1
[00:05:19.266,000] <dbg> net_l2_ppp: ppp_fsm_timeout: (sysworkq): [LCP/0x20207350] Current state STOPPING (5)
[00:05:19.266,000] <dbg> net_l2_ppp: ppp_change_state_debug: (sysworkq): [LCP/0x20207350] state STOPPING (5) => STOPPED (3) (ppp_fsm_timeout():136)
[00:05:19.266,000] <dbg> net_l2_ppp: ppp_change_phase_debug: (sysworkq): [0x20207318] phase ESTABLISH (1) => DEAD (0) (ppp_link_terminated():119)
[00:05:19.266,000] <dbg> net_l2_ppp: ppp_link_terminated: (sysworkq): [0x20207318] Link terminated
[00:05:19.391,000] <dbg> net_if: net_if_down: (seModemDriverAppRun): iface 0x202030e8
[00:05:19.391,000] <dbg> net_if: net_if_tx: (seModemDriverAppRun): Processing (pkt 0x20273390, prio 1) network packet iface 0x202030e8/2
[00:05:19.391,000] <dbg> net_l2_ppp: net_pkt_hexdump: send L2
[00:05:19.391,000] <dbg> net_l2_ppp: net_pkt_hexdump: 0x20273390
                                     46 00 00 20 00 00 00 00  01 02 44 d6 00 00 00 00 |F.. .... ..D.....
                                     e0 00 00 02 94 04 00 00  17 00 08 fe e0 00 00 01 |........ ........
[00:05:19.391,000] <dbg> net_if: net_if_ipv4_maddr_rm: (seModemDriverAppRun): interface 0x202030e8 address 224.0.0.1 removed
[00:05:19.392,000] <dbg> conn_mgr: conn_mgr_ipv4_events_handler: (net_mgmt): IPv4 event 0xe004000b received on iface 2 (0x202030e8)
[00:05:19.392,000] <dbg> net_if: update_operational_state: (seModemDriverAppRun): iface 0x202030e8, oper state DOWN admin UP carrier OFF dormant OFF
[00:05:19.392,000] <dbg> conn_mgr: conn_mgr_iface_events_handler: (net_mgmt): Iface event 0xd0010001 received on iface 2 (0x202030e8)
[00:05:19.392,000] <dbg> conn_mgr: conn_mgr_iface_events_handler: (net_mgmt): Iface index 1
[00:05:19.392,000] <dbg> conn_mgr: conn_mgr_iface_events_handler: (net_mgmt): Iface event 0xd0010003 received on iface 2 (0x202030e8)
[00:05:19.392,000] <dbg> conn_mgr: conn_mgr_iface_events_handler: (net_mgmt): Iface index 1
[00:05:19.392,000] <dbg> net_if: update_operational_state: (seModemDriverAppRun): iface 0x202030e8, oper state DOWN admin DOWN carrier OFF dormant OFF
[00:05:19.538,000] <dbg> net_if: net_if_tx: (rx_q[0]): Processing (pkt 0x20273390, prio 0) network packet iface 0x20203068/1
[00:05:19.539,000] <dbg> net_sock: zsock_received_cb: (rx_q[0]): ctx=0x20205e88, pkt=0x202742b4, st=0, user_data=(nil)
[00:05:20.662,000] <dbg> net_sock: zsock_socket_internal: (seModemTestAppRun): socket: ctx=0x20205f40, fd=23
[00:05:20.663,000] <dbg> net_ctx: net_context_unref: (seModemTestAppRun): Context 0x20205f40 released
[00:05:22.538,000] <dbg> net_if: net_if_tx: (rx_q[0]): Processing (pkt 0x20273390, prio 0) network packet iface 0x20203068/1
[00:05:22.538,000] <dbg> net_sock: zsock_received_cb: (rx_q[0]): ctx=0x20205e88, pkt=0x202740d8, st=0, user_data=(nil)
[00:05:25.100,000] <dbg> net_sock: zsock_socket_internal: (seModemTestAppRun): socket: ctx=0x20205f40, fd=23
[00:05:25.101,000] <dbg> net_ctx: net_context_unref: (seModemTestAppRun): Context 0x20205f40 released
[00:05:25.537,000] <dbg> net_if: net_if_tx: (rx_q[0]): Processing (pkt 0x20273390, prio 0) network packet iface 0x20203068/1
[00:05:25.537,000] <dbg> net_sock: zsock_received_cb: (rx_q[0]): ctx=0x20205e88, pkt=0x2027411c, st=0, user_data=(nil)
[00:05:25.782,000] <dbg> net_if: net_if_tx: (tcp_work): Processing (pkt 0x202733d4, prio 0) network packet iface 0x20203068/1
[00:05:27.109,000] modem                   DEBUG: --> AT+CEREG?
[00:05:28.538,000] <dbg> net_if: net_if_tx: (rx_q[0]): Processing (pkt 0x20273390, prio 0) network packet iface 0x20203068/1
[00:05:28.538,000] <dbg> net_sock: zsock_received_cb: (rx_q[0]): ctx=0x20205e88, pkt=0x202740d8, st=0, user_data=(nil)
[00:05:29.110,000] modem                   DEBUG: <--  
[00:05:29.161,000] modem                   DEBUG: CEREG Command failed.
[00:05:30.102,000] <dbg> net_sock: zsock_socket_internal: (seModemTestAppRun): socket: ctx=0x20205f40, fd=23
[00:05:30.102,000] <dbg> net_ctx: net_context_unref: (seModemTestAppRun): Context 0x20205f40 released
[00:05:30.172,000] modem                   DEBUG: --> AT+CEREG?
[00:05:30.191,000] modem                   DEBUG: <-- AT+CEREG?  /+CEREG: 0,0 / /OK /
[00:05:31.252,000] modem                   DEBUG: --> AT+CEREG?
[00:05:31.257,000] modem                   DEBUG: <-- AT+CEREG?  /+CEREG: 0,0 / /OK /
[00:05:31.537,000] <dbg> net_if: net_if_tx: (rx_q[0]): Processing (pkt 0x20273390, prio 0) network packet iface 0x20203068/1
[00:05:31.538,000] <dbg> net_sock: zsock_received_cb: (rx_q[0]): ctx=0x20205e88, pkt=0x202740d8, st=0, user_data=(nil)
[00:05:32.308,000] modem                   DEBUG: Buffer was not empty before command.
[00:05:32.308,000] modem                   DEBUG: <--  /RDY / /+CFUN: 1 / /+CPIN: READY / /+QUSIM: 1 /
[00:05:32.310,000] modem                   DEBUG: --> AT+CEREG?
[00:05:32.316,000] modem                   DEBUG: <-- AT+CEREG?  /+CEREG: 0,2 / /OK /
[00:05:33.367,000] modem                   DEBUG: Buffer was not empty before command.
[00:05:33.368,000] modem                   DEBUG: <--  /+QIND: SMS DONE /
[00:05:33.369,000] modem                   DEBUG: --> AT+CEREG?
[00:05:33.378,000] modem                   DEBUG: <-- AT+CEREG?  /+CEREG: 0,5 / /OK /
[00:05:33.428,000] modem                    INFO: Registered, roaming.
[00:05:33.439,000] modem                   DEBUG: --> AT+QENG="servingcell"
[00:05:33.459,000] modem                   DEBUG: <-- AT+QENG="servingcell"  /+QENG: "servingcell","CONNECT","eMTC","FDD",262,01,19C3804,214,1444,3,3,3,5D3,-99,-13,-84,23, / /OK /
[00:05:34.535,000] <dbg> net_if: net_if_tx: (rx_q[0]): Processing (pkt 0x20273390, prio 0) network packet iface 0x20203068/1
[00:05:34.535,000] <dbg> net_sock: zsock_received_cb: (rx_q[0]): ctx=0x20205e88, pkt=0x202740d8, st=0, user_data=(nil)
[00:05:35.103,000] <dbg> net_sock: zsock_socket_internal: (seModemTestAppRun): socket: ctx=0x20205f40, fd=23
[00:05:35.104,000] <dbg> net_ctx: net_context_unref: (seModemTestAppRun): Context 0x20205f40 released
[00:05:36.510,000] modem                   DEBUG: Buffer was not empty before command.
[00:05:36.511,000] modem                   DEBUG: <--  /APP RDY /
[00:05:36.511,000] modem                   DEBUG: --> AT+QCSQ
[00:05:36.519,000] modem                   DEBUG: <-- AT+QCSQ  /+QCSQ: "eMTC",-87,-100,210,-12 / /OK /
[00:05:36.580,000] modem                   DEBUG: --> AT+CGACT?
[00:05:36.589,000] modem                   DEBUG: <-- AT+CGACT?  /+CGACT: 1,1 /+CGACT: 2,0 /+CGACT: 3,0 /+CGACT: 4,0 /+CGACT: 5,0 / /OK /
[00:05:36.650,000] modem                   DEBUG: --> AT+CGDCONT=1,"IP","iot.1nce.net"
[00:05:36.657,000] modem                   DEBUG: <-- AT+CGDCONT=1,"IP","iot.1nce.net"  /OK /
[00:05:36.718,000] modem                   DEBUG: --> AT+CGPADDR
[00:05:36.729,000] modem                   DEBUG: <-- AT+CGPADDR  /+CGPADDR: 1,"100.84.56.8" / /+CGPADDR: 2 / /+CGPADDR: 3 / /+CGPADDR: 4 / /+CGPADDR: 5 / /OK /
[00:05:36.790,000] modem                   DEBUG: --> ATD*99#;
[00:05:36.794,000] modem                   DEBUG: <-- ATD*99#;  /CONNECT / /OK /
[00:05:36.844,000] <dbg> net_if: net_if_up: (seModemDriverAppRun): iface 0x202030e8
[00:05:36.844,000] <dbg> net_ppp: ppp_start: Initializing PPP to use uart@4018c000
[00:05:36.844,000] <dbg> net_if: update_operational_state: (seModemDriverAppRun): iface 0x202030e8, oper state DOWN admin DOWN carrier ON dormant OFF
[00:05:36.845,000] <dbg> net_if: net_if_ipv4_maddr_add: (seModemDriverAppRun): interface 0x202030e8 address 224.0.0.1 added
[00:05:36.845,000] <dbg> conn_mgr: conn_mgr_ipv4_events_handler: (net_mgmt): IPv4 event 0xe0040003 received on iface 2 (0x202030e8)
[00:05:36.845,000] <dbg> conn_mgr: conn_mgr_ipv4_events_handler: (net_mgmt): Iface index 1
[00:05:36.845,000] <dbg> net_if: update_operational_state: (seModemDriverAppRun): iface 0x202030e8, oper state UP admin UP carrier ON dormant OFF
[00:05:36.845,000] <dbg> conn_mgr: conn_mgr_iface_events_handler: (net_mgmt): Iface event 0xd0010002 received on iface 2 (0x202030e8)
[00:05:36.845,000] <dbg> conn_mgr: conn_mgr_iface_events_handler: (net_mgmt): Iface index 1
[00:05:36.845,000] <dbg> net_l2_ppp: ppp_change_phase_debug: (net_mgmt): [0x20207318] phase DEAD (0) => ESTABLISH (1) (ppp_open_async():202)
[00:05:36.845,000] <dbg> net_l2_ppp: ppp_open_async: (net_mgmt): Starting LCP
[00:05:36.845,000] <dbg> net_l2_ppp: ppp_fsm_lower_up: (net_mgmt): [LCP/0x20207350] Current state STOPPED (3)
[00:05:36.845,000] <dbg> net_l2_ppp: ppp_fsm_lower_up: (net_mgmt): [LCP/0x20207350] Invalid state STOPPED (3)
[00:05:36.845,000] <dbg> net_l2_ppp: ppp_fsm_open: (net_mgmt): [LCP/0x20207350] Current state STOPPED (3)
uart:~$ net ppp
PPP phase           : ESTABLISH (1)
LCP state           : STOPPED (3)
LCP retransmits     : 0
LCP NACK loops      : 0
LCP NACKs recv      : 0
LCP current id      : 1
LCP ACK received    : yes
IPCP state          : INITIAL (0)
IPCP retransmits    : 10
IPCP NACK loops     : 0
IPCP NACKs recv     : 0
IPCP current id     : 2
IPCP ACK received   : yes
IPv6CP state        : INITIAL (0)
IPv6CP retransmits  : 10
IPv6CP NACK loops   : 0
IPv6CP NACKs recv   : 0
IPv6CP current id   : 1
IPv6CP ACK received : yes
PAP state           : INITIAL (0)
PAP retransmits     : 0
PAP NACK loops      : 0
PAP NACKs recv      : 0
PAP current id      : 0
PAP ACK received    : no

Environment (please complete the following information):

rlubos commented 2 months ago

One thing that strucks me is that the driver shouldn't really take control over the network interface admin state (I. .e it should not use net_if_up()/net_if_down()), but rather control CARRIER/DORMANT flags (net_if_carrier/dormant_on/off()), see https://docs.zephyrproject.org/latest/connectivity/networking/api/net_if.html#network-interface-state-management.

Nonetheless, I'm not sure if it's correct that on net_if_down() we skip LCP state transitions if PPP is in DEAD phase: https://github.com/zephyrproject-rtos/zephyr/blob/main/subsys/net/l2/ppp/ppp_l2.c#L258 That's why LCP ends up stuck in STOPPED state. @bjarki-andreasen @tomi-font You're more experienced with PPP interfaces, what's your view on this?

bjarki-andreasen commented 2 months ago

It seems you are cutting the carrier right after calling net_if_down() which can cause the network interface to get stuck, as it tries to terminate the link properly. Your driver should be calling net_if_carrier_off() when the data connection is closed, which immediately resets the internal state machine as attempting to close down properly is not possible :) net_if_up() should be used by the application to signal that the interface should be up once it is possible, which is after your modem has registered to a network and the PPP link is up, at which point your driver should call net_if_carrier_on().

ljdevdev commented 2 months ago

Thank you very much for your swift reply :) I'll check that that's working and then close the issue.

ljdevdev commented 1 month ago

It seems I was a bit hasty with closing the issue. @bjarki-andreasen , I tried what you suggested (actually multiple combinations, since I wasn't exactly sure about the necessary order and timing between the calls).

My current setup is as follows:

  1. Set up mobile link
  2. Call net_if_up() (otherwise the uart callback is not set and no PPP responses are received by the driver)
  3. Call net_if_carrier_on()
  4. IP Communication + Link failure
  5. Call net_if_carrier_off()
  6. Reconnect
  7. go to (2.)

This produces the exact same behavior as before (see log below)

Now I dug a bit into the code and found that net_if_carrier_on(), as well as the functions for setting up the PPP uart are called from inside net_if_up:

From ppp_start (https://github.com/zephyrproject-rtos/zephyr/blob/main/drivers/net/ppp.c#L1098)

        ppp_async_uart_rx_enable(context);
#else
        uart_irq_rx_disable(context->dev);
        uart_irq_tx_disable(context->dev);
        ppp_uart_flush(context->dev);
        uart_irq_callback_user_data_set(context->dev, ppp_uart_isr,
                        context);
        uart_irq_rx_enable(context->dev);
#endif
    }
#endif /* !CONFIG_NET_TEST */

    net_if_carrier_on(context->iface);
    return 0;

This is called in ppp_up (https://github.com/zephyrproject-rtos/zephyr/blob/main/subsys/net/l2/ppp/ppp_l2.c#L211) which is called in ppp_enable(https://github.com/zephyrproject-rtos/zephyr/blob/main/subsys/net/l2/ppp/ppp_l2.c#L293) which is called in net_if_up.

So it is clear, that net_if_up must be called after the mobile link was set up and an additional net_if_carrier_on call seems redundant (in fact it works fine without an additional call to that function). This may not be the canonic way, but it seems to be the way the driver is built.

As for the call to net_if_carrier_off(), it does not change the state of the LCP fsm and therefore I run into the same problem again.

Log:

[00:05:20.994,000] net_l2_ppp: fsm_recv_terminate_req: (rx_q[0]): [LCP/0x20207350] Current state STOPPING (5) [00:05:20.994,000] net_l2_ppp: fsm_recv_terminate_req: (rx_q[0]): [LCP/0x20207350] Invalid state STOPPING (5) [00:05:21.001,000] net_ppp: ppp_input_byte: Invalid (0x0d) byte, expecting Address [00:05:21.001,000] net_ppp: ppp_change_state: (ppp_workq): [0x202076f8] state ADDRESS (1) => START (0) [00:05:24.990,000] net_l2_ppp: ppp_fsm_timeout: (sysworkq): [LCP/0x20207350] Current state STOPPING (5) [00:05:24.990,000] net_l2_ppp: ppp_change_state_debug: (sysworkq): [LCP/0x20207350] state STOPPING (5) => STOPPED (3) (ppp_fsm_timeout():136) [00:05:24.990,000] net_l2_ppp: ppp_change_phase_debug: (sysworkq): [0x20207318] phase ESTABLISH (1) => DEAD (0) (ppp_link_terminated():117) [00:05:24.990,000] net_l2_ppp: ppp_link_terminated: (sysworkq): [0x20207318] Link terminated [00:05:25.036,000] net_if: update_operational_state: (seModemDriverAppRun): iface 0x202030e8, oper state DOWN admin UP carrier OFF dormant OFF [00:05:29.286,000] net_if: net_if_ipv6_prefix_set_timer: (rx_q[0]): Prefix lifetime 9250560 sec [00:05:29.287,000] net_if: net_if_ipv6_addr_update_lifetime: (rx_q[0]): Updating expire time of fd26:52ff:a04d:1:4:9fff:fe91:15b1 by 9250560 secs [00:05:29.287,000] net_if: net_if_ipv6_prefix_set_timer: (rx_q[0]): Prefix lifetime 9250560 sec [00:05:29.287,000] net_if: net_if_ipv6_router_update_lifetime: (rx_q[0]): Updating expire time of fe80::21d:aaff:fef7:640 by 1800 secs [00:05:29.287,000] net_if: net_if_stop_rs: (rx_q[0]): Stopping ND/RS for iface 0x20203068 [00:05:29.287,000] net_if: address_lifetime_timeout: (sysworkq): Waiting for 2147483647 ms [00:05:32.756,000] modem DEBUG: --> AT+CEREG? [00:05:34.757,000] modem DEBUG: <--
[00:05:34.808,000] modem DEBUG: CEREG Command failed. [00:05:35.819,000] modem DEBUG: --> AT+CEREG? [00:05:35.838,000] modem DEBUG: <-- AT+CEREG? /+CEREG: 0,0 / /OK / [00:05:36.878,000] net_if: net_if_tx: (rx_q[0]): Processing (pkt 0x202733d4, prio 1) network packet iface 0x20203068/1 [00:05:36.879,000] net_if: net_if_tx: (rx_q[0]): Processing (pkt 0x202733d4, prio 1) network packet iface 0x20203068/1 [00:05:36.879,000] net_if: net_if_tx: (rx_q[0]): Processing (pkt 0x202733d4, prio 1) network packet iface 0x20203068/1 [00:05:36.899,000] modem DEBUG: --> AT+CEREG? [00:05:36.905,000] modem DEBUG: <-- AT+CEREG? /+CEREG: 0,0 / /OK / [00:05:37.406,000] net_if: net_if_ipv6_prefix_set_timer: (rx_q[0]): Prefix lifetime 9250560 sec [00:05:37.406,000] net_if: net_if_ipv6_addr_update_lifetime: (rx_q[0]): Updating expire time of fd26:52ff:a04d:1:4:9fff:fe91:15b1 by 9250560 secs [00:05:37.406,000] net_if: net_if_ipv6_prefix_set_timer: (rx_q[0]): Prefix lifetime 9250560 sec [00:05:37.406,000] net_if: net_if_ipv6_router_update_lifetime: (rx_q[0]): Updating expire time of fe80::21d:aaff:fef7:640 by 1800 secs [00:05:37.406,000] net_if: net_if_stop_rs: (rx_q[0]): Stopping ND/RS for iface 0x20203068 [00:05:37.407,000] net_if: address_lifetime_timeout: (sysworkq): Waiting for 2147483647 ms [00:05:37.955,000] modem DEBUG: Buffer was not empty before command. [00:05:37.956,000] modem DEBUG: <-- /RDY / /+CFUN: 1 / /+CPIN: READY / /+QUSIM: 1 / [00:05:37.957,000] modem DEBUG: --> AT+CEREG? [00:05:37.963,000] modem DEBUG: <-- AT+CEREG? /+CEREG: 0,2 / /OK / [00:05:39.013,000] modem DEBUG: Buffer was not empty before command. [00:05:39.014,000] modem DEBUG: <-- /+QIND: SMS DONE / [00:05:39.015,000] modem DEBUG: --> AT+CEREG? [00:05:39.410,000] modem DEBUG: <-- AT+CEREG? /+CEREG: 0,2 / /OK / /APP RDY / [00:05:39.879,000] net_if: net_if_tx: (rx_q[0]): Processing (pkt 0x202733d4, prio 1) network packet iface 0x20203068/1 [00:05:39.879,000] net_if: net_if_tx: (rx_q[0]): Processing (pkt 0x202733d4, prio 1) network packet iface 0x20203068/1 [00:05:39.880,000] net_if: net_if_tx: (rx_q[0]): Processing (pkt 0x202733d4, prio 1) network packet iface 0x20203068/1 [00:05:40.471,000] modem DEBUG: --> AT+CEREG? [00:05:40.478,000] modem DEBUG: <-- AT+CEREG? /+CEREG: 0,5 / /OK / [00:05:40.528,000] modem INFO: Registered, roaming. [00:05:40.540,000] modem DEBUG: --> AT+QENG="servingcell" [00:05:40.566,000] modem DEBUG: <-- AT+QENG="servingcell" /+QENG: "servingcell","CONNECT","eMTC","FDD",262,01,19C3801,268,1300,3,5,5,5D3,-93,-14,-79,21, / /OK / [00:05:40.924,000] net_if: net_if_ipv6_prefix_set_timer: (rx_q[0]): Prefix lifetime 9250560 sec [00:05:40.924,000] net_if: net_if_ipv6_addr_update_lifetime: (rx_q[0]): Updating expire time of fd26:52ff:a04d:1:4:9fff:fe91:15b1 by 9250560 secs [00:05:40.924,000] net_if: net_if_ipv6_prefix_set_timer: (rx_q[0]): Prefix lifetime 9250560 sec [00:05:40.924,000] net_if: net_if_ipv6_router_update_lifetime: (rx_q[0]): Updating expire time of fe80::21d:aaff:fef7:640 by 1800 secs [00:05:40.924,000] net_if: net_if_stop_rs: (rx_q[0]): Stopping ND/RS for iface 0x20203068 [00:05:40.924,000] net_if: address_lifetime_timeout: (sysworkq): Waiting for 2147483647 ms [00:05:42.906,000] net_if: net_if_tx: (rx_q[0]): Processing (pkt 0x202733d4, prio 1) network packet iface 0x20203068/1 [00:05:42.906,000] net_if: net_if_tx: (rx_q[0]): Processing (pkt 0x202733d4, prio 1) network packet iface 0x20203068/1 [00:05:42.907,000] net_if: net_if_tx: (rx_q[0]): Processing (pkt 0x202733d4, prio 1) network packet iface 0x20203068/1 [00:05:43.627,000] modem DEBUG: --> AT+QCSQ [00:05:43.634,000] modem DEBUG: <-- AT+QCSQ /+QCSQ: "eMTC",-86,-94,195,-8 / /OK / [00:05:43.695,000] modem DEBUG: --> AT+CGACT? [00:05:43.704,000] modem DEBUG: <-- AT+CGACT? /+CGACT: 1,1 /+CGACT: 2,0 /+CGACT: 3,0 /+CGACT: 4,0 /+CGACT: 5,0 / /OK / [00:05:43.766,000] modem DEBUG: --> AT+CGDCONT=1,"IP","iot.1nce.net" [00:05:43.773,000] modem DEBUG: <-- AT+CGDCONT=1,"IP","iot.1nce.net" /OK / [00:05:43.835,000] modem DEBUG: --> AT+CGPADDR [00:05:43.845,000] modem DEBUG: <-- AT+CGPADDR /+CGPADDR: 1,"100.84.56.8" / /+CGPADDR: 2 / /+CGPADDR: 3 / /+CGPADDR: 4 / /+CGPADDR: 5 / /OK / [00:05:43.906,000] modem DEBUG: --> ATD99#; [00:05:43.910,000] modem DEBUG: <-- ATD99#; /CONNECT / /OK / [00:05:43.961,000] net_if: net_if_up: (seModemDriverAppRun): iface 0x202030e8 [00:05:43.961,000] net_if: update_operational_state: (seModemDriverAppRun): iface 0x202030e8, oper state UP admin UP carrier ON dormant OFF [00:05:43.961,000] net_l2_ppp: ppp_change_phase_debug: (net_mgmt): [0x20207318] phase DEAD (0) => ESTABLISH (1) (ppp_open_async():202) [00:05:43.961,000] net_l2_ppp: ppp_open_async: (net_mgmt): Starting LCP [00:05:43.961,000] net_l2_ppp: ppp_fsm_lower_up: (net_mgmt): [LCP/0x20207350] Current state STOPPED (3) [00:05:43.961,000] net_l2_ppp: ppp_fsm_lower_up: (net_mgmt): [LCP/0x20207350] Invalid state STOPPED (3) [00:05:43.961,000] net_l2_ppp: ppp_fsm_open: (net_mgmt): [LCP/0x20207350] Current state STOPPED (3) [00:05:43.961,000] modem DEBUG: PPP interface up. Waiting for IP-address. [00:05:43.962,000] modem INFO: PPP device IP address: 100.84.56.8 [00:05:43.963,000] modem-performance-test WARNING: Socket bind failed

bjarki-andreasen commented 1 month ago

@ljdevdev This is a larger issue, where I am very much in opposition to the way ppp.c works, IMO misusing the carrier... See

The modem subsystem, with modem_cellular.c uses the interface and carrier appropriately, you may want to look into using it, or at least adapting your driver to behave similarly :)

ljdevdev commented 1 month ago

@bjarki-andreasen I am fairly certain, that all of this has nothing to do with the way I set up the PPP conection (carrier vs net_if vs modem_cellular.c etc).

If I look at the actual LCP communication happening here I get the following:

The modem sends a terminate request (05)

[00:05:18.371,000] <dbg> net_l2_ppp: net_pkt_hexdump: 0x2027400c
                                     c0 21 05 03 00 20 4e 6f  20 6e 65 74 77 6f 72 6b |.!... No  network
                                     20 70 72 6f 74 6f 63 6f  6c 73 20 72 75 6e 6e 69 | protoco ls runni
                                     6e 67                                            |ng       
[00:05:18.371,000] <dbg> net_l2_ppp: ppp_fsm_input: (rx_q[0]): [LCP/0x20207350] LCP Terminate-Req (5) id 3 payload len 28
[00:05:18.372,000] <dbg> net_l2_ppp: fsm_recv_terminate_req: (rx_q[0]): [LCP/0x20207350] Current state OPENED (9)
[00:05:18.372,000] <dbg> net_l2_ppp: fsm_recv_terminate_req: (rx_q[0]): [LCP/0x20207350] Terminated by peer (No network protocols running)
[00:05:18.372,000] <dbg> net_l2_ppp: ppp_change_state_debug: (rx_q[0]): [LCP/0x20207350] state OPENED (9) => STOPPING (5) (fsm_recv_terminate_req():877)

The ppp driver sends a terminate ack (06):

[00:05:18.373,000] <dbg> net_l2_ppp: net_pkt_hexdump: send L2
[00:05:18.373,000] <dbg> net_l2_ppp: net_pkt_hexdump: 0x2027345c
                                     c0 21 06 03 00 04                                |.!....    

The lcp fsm then runs into a timeout, which puts it into mode STOPPED

[00:05:23.372,000] <dbg> net_l2_ppp: ppp_fsm_timeout: (sysworkq): [LCP/0x20207350] Current state STOPPING (5)
[00:05:23.373,000] <dbg> net_l2_ppp: ppp_change_state_debug: (sysworkq): [LCP/0x20207350] state STOPPING (5) => STOPPED (3) (ppp_fsm_timeout():136)

Now if I call net_if_carrier_on() in this state, that may or may not be fine. This is what modem_cellular.c does, and if that works, it is because it uses cmux (I think). I don't use cmux (yet) and so I need to pass control over the UART back and forth.

In order to do that I can either call net_if_up() and net_if_down(), or ppp_start() and ppp_stop(). So I resorted to calling net_if_down() after link termination and then net_if_up() again. And while that is something a driver should not do, that seems to be something, that an application might want to do.

So if an application asynchronously calls net_if_down() between a terminate request and reestablishment of the link, I think the driver would still freeze.

ljdevdev commented 1 month ago

For the time being I found an ugly hack, to get this working for my driver without dragging a manual patch of zephyr along.

In the long run I think it would be great to have a PPP driver in zephyr that is usable with and without cmux and with and without any given cellular driver.

If there is any interest in pursuing this further I am happy to help, otherwise I think we can close this.

carlescufi commented 1 week ago

Thanks @ljdevdev closing this as you suggest.