openwrt / packages

Community maintained packages for OpenWrt. Documentation for submitting pull requests is in CONTRIBUTING.md
GNU General Public License v2.0

modemmanager: modem in "registered" state, no internet access #14096

Closed bobafetthotmail closed 2 years ago

bobafetthotmail commented 3 years ago

@nickberry17 @aleksander0m

So, after 3-4 hours the modem ends up in a state called "registered" in the Status block below

  --------------------------------
  General  |            dbus path: /org/freedesktop/ModemManager1/Modem/0
           |            device id: a9ad31eabcd05240ad06519dfb896a5f8d7238ed
  --------------------------------
  Hardware |         manufacturer: Sierra Wireless, Incorporated
           |                model: EM7455
           |    firmware revision: SWI9X30C_02.33.03.00
           |         h/w revision: EM7455B
           |            supported: gsm-umts, lte
           |              current: gsm-umts, lte
           |         equipment id: REDACTED
  --------------------------------
  System   |               device: /sys/devices/pci0000:00/0000:00:0f.5/usb1/1-2
           |              drivers: cdc_mbim, qcserial
           |               plugin: sierra
           |         primary port: cdc-wdm0
           |                ports: cdc-wdm0 (mbim), ttyUSB0 (qcdm), ttyUSB1 (gps), 
           |                       ttyUSB2 (at), wwan0 (net)
  --------------------------------
  Status   |       unlock retries: sim-pin2 (3)
           |                state: registered
           |          power state: on
           |          access tech: lte
           |       signal quality: 6% (recent)
  --------------------------------
  Modes    |            supported: allowed: 3g, 4g; preferred: none
           |              current: allowed: 3g, 4g; preferred: none
  --------------------------------
  IP       |            supported: ipv4, ipv6, ipv4v6
  --------------------------------
  3GPP     |             REDACTED
  --------------------------------
  3GPP EPS | ue mode of operation: csps-2
  --------------------------------
  SIM      |            dbus path: /org/freedesktop/ModemManager1/SIM/0
  --------------------------------
  Bearer   |            dbus path: /org/freedesktop/ModemManager1/Bearer/2

and I can't access the Internet anymore.

Restarting the OpenWrt interface, that I called LTE_MM in my config, fixes the issue

ifup LTE_MM

From a similar issue https://github.com/openwrt/packages/issues/11200

@nickberry17 said that I need a hotplug script and provided an example. I took and improved it a bit.

The hotplug script works fine if I call it manually, or if the LTE_MM interface is actually torn down by stopping and starting the ModemManager service again, but it isn't triggered when the modem state changes.

I'm not sure ModemManager can trigger that, I'm not seeing /sbin/hotplug-call anywhere in the scripts here, so I guess the only hotplug events come from the network interface status.

Would you know where I can place such a command to trigger a hotplug event when the modem state changes to "registered"? I can run some tests on my own if you can give me some pointers.

For now I'm working around the issue by running a script that polls the modem status every second and will trigger the hotplug script if needed. It's better than sending pings all the time.

aleksander0m commented 3 years ago

This is a limitation of the netifd integration of ModemManager.

If your network interface in netifd was "up" but the modem is "registered", it's because at some point it was connected, but then it disconnected; e.g. a network-initiated disconnection. When this happens, the netifd protocol handler doesn't do anything; it won't try to automatically re-ifup the interface (as it was really up already for it). I asked about how to best handle this a while ago and didn't get any response at all I'm afraid.

The reality for a proper use of the ModemManager protocol handler in openwrt is that you require something like a "wwan monitor" program that runs "on top" of netifd, and when that is used, the netifd settings could be set to only manually connect instead of automatically, leaving the choice of when to ifup or ifdown the interface to the wwan monitor application. Under that setup, the wwan monitor could check the state of the modem, and if it goes back to registered, it would ifdown and ifup the network interface again. This kind of monitor is also helpful for additional checks (e.g. actual connectivity tests to the network by pinging some public IP, or e.g. the HTTP connectivity checks that Android does, or e.g. testing DNS queries to the network-provided DNS servers). This kind of monitor is also useful to detect when the modem has gone nuts (more usual than rare, believe me) and act accordingly; e.g. triggering some GPIO to shut it down if the integration allows it, or at least trying to cleanly reset it with mmcli --reset...
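As a rough illustration of the decision such a monitor makes, here is a minimal sketch. The function name `monitor_action` and the action words it prints are hypothetical, and the mapping only covers the cases described above (a modem that fell back to "registered" gets an ifdown/ifup, transient states are left alone); a real monitor would need more states and settings.

```sh
#!/bin/sh
# Hypothetical helper: decide what a wwan monitor should do for a given
# ModemManager state while netifd still believes the interface is up.
# Prints one of: check, reconnect, wait, none.
monitor_action() {
    case "$1" in
        connected)
            echo check ;;       # modem says connected: still test real connectivity
        registered)
            echo reconnect ;;   # dropped by the network: ifdown + ifup the interface
        searching|connecting|disconnecting)
            echo wait ;;        # transient states: do nothing and poll again later
        *)
            echo none ;;        # states this sketch does not handle
    esac
}
```

A monitor loop could then run `ifdown "$iface"; ifup "$iface"` whenever `monitor_action "$state"` prints `reconnect`, with `$iface` and `$state` coming from the configuration and from mmcli respectively.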

bobafetthotmail commented 3 years ago

Fair enough. I can only hack around it as I don't know nor want to learn how netifd works.

My "solution" is probably going to be something like that, turning the hotplug script into a daemon script that polls regularly mmcli to see what's up and read stuff like interface name and modem address on its own instead of relying on hard-coded values.

If there is interest in this, I can send it as a PR and someone else can start migrating functionality from the netifd protocol script to the "monitor" script.

bobafetthotmail commented 3 years ago

so, this is the script I've been using for a little less than a week now, and "fixes" the issue, the modem does not get stuck in registered state anymore, while before it would go into the "registered state" after 4-5 hours of uptime.

In my device it is launched by a procd init script (that isn't included here) as I turned this into a local package I could include at compile time for my systems.

It's a fire-and-forget system as it reads interface names and modemmanager modem path automatically from the uci config of the existing interface, and is designed to run with any number of modemmanager interfaces in parallel although I couldn't test more than one since I only have a single modem/SIM/contract.

If the interface is stuck in the "registered" state it will retry ifup about 5 times, and if that does not solve the problem it will restart the modemmanager service. It's sub-optimal as it would affect other modems as well, but it's good enough for now.

All timeouts have been increased significantly from the original script because my modem reacts too slowly for the original timeouts in the script, and it was going in a "restart loop" where I keep restarting modemmanager service.

My modem does not react to mmcli --reset so the script has no functionality to actually reset the modem. I can (and will) hack together something using a Crelay-supported USB relay system to actually cut the power lines in USB that goes to the modem so it can be power cycled, but this isn't implemented in the script (yet).

At the moment it's doing pings to 1.1.1.1 to test if connection to the Internet is available, which may or may not be cool, but I have no other way to figure out an IP in a dynamic way and I didn't want to spam the OpenWrt servers. Any ideas welcome.

Is either of you guys @nickberry17 @aleksander0m interested in this? I can send a PR to include this or a subset of it in the modemmanager package.

#!/bin/sh
# babysitter script for ModemManager modems. 
# /usr/lib/modem-monitor-mm/modem-monitor-mm.sh
#
# based off the hotplug script /etc/hotplug.d/iface/30-keepalive_modemmanager from Nicholas Smith <mips171@icloud.com>
#
# substantial additions and changes to make an actual polling daemon, from Alberto Bursi <bobafetthotmail@gmail.com>
#

############# FUNCTIONS

# The retry counter lives in /tmp/MM_Monitor_<iface>_retries.
# babysit_modem resets it, increment_retry bumps it and
# check_restart_iface reads it, so all three must use the same file name.
increment_retry() {
    current_retry="$(cat /tmp/MM_Monitor_"$bb_modem_iface"_retries)"
    echo $(( current_retry + 1 )) > /tmp/MM_Monitor_"$bb_modem_iface"_retries
}

# returns 0 if the ping test got a reply, 1 otherwise
check() {
    logger -t INFO "MM_Monitor: Beginning ping test for $bb_modem_iface."
    if ping -c3 -s56 -w30 1.1.1.1; then
        return 0
    else
        return 1
    fi
}

check_reconnect() {
    check
    check_result="$?"
    if [ "$check_result" = 1 ] ; then
        logger -t INFO "MM_Monitor: Modem on $bb_modem_iface failed ping test. Reconnecting."
        reconnect
    elif [ "$check_result" = 0 ] ; then
        logger -t INFO "MM_Monitor: Modem on $bb_modem_iface is connected. 3 successful pings."
    fi
}

# restarts the whole modemmanager service (affects every modem), then ifup
reconnect() {
    logger -t INFO "MM_Monitor: Reconnecting modem on $bb_modem_iface now."
    /etc/init.d/modemmanager stop && sleep 2 && /etc/init.d/modemmanager start
    sleep 20
    ifup "$bb_modem_iface"
    sleep 10
    check_reconnect
}

restart_iface() {
    logger -t INFO "MM_Monitor: Restarting $bb_modem_iface interface now."
    ifup "$bb_modem_iface"
    sleep 10
    check_restart_iface
}

check_restart_iface() {
    check
    check_result="$?"
    if  [ "$check_result" = 1 ] && [ "$(cat /tmp/MM_Monitor_"$bb_modem_iface"_retries)" -le 5 ] ; then
        increment_retry
        logger -t INFO "MM_Monitor: Modem on $bb_modem_iface failed ping test. Restart interface."
        restart_iface
    elif  [ "$check_result" = 1 ] && [ "$(cat /tmp/MM_Monitor_"$bb_modem_iface"_retries)" -gt 5 ] ; then
        logger -t INFO "MM_Monitor: Modem on $bb_modem_iface failed ping test. Max retries, restart modemmanager"
        reconnect
    fi
}

babysit_modem() {
    bb_modem_iface="$1"
    modem_index=$( uci get network."$bb_modem_iface".device )
    echo 0 > /tmp/MM_Monitor_"$bb_modem_iface"_retries # number of retries for connection
    /usr/bin/mmcli -m "$modem_index" > /tmp/MM_Monitor_"$bb_modem_iface"_Status
    DISABLED=$(grep -w "state: disabled" /tmp/MM_Monitor_"$bb_modem_iface"_Status)
    SEARCHING=$(grep -w "state: searching" /tmp/MM_Monitor_"$bb_modem_iface"_Status)
    REGISTERED=$(grep -w "state: registered" /tmp/MM_Monitor_"$bb_modem_iface"_Status)
    CONNECTED=$(grep -w "state: connected" /tmp/MM_Monitor_"$bb_modem_iface"_Status)
    IDLE=$(grep -w "state: idle" /tmp/MM_Monitor_"$bb_modem_iface"_Status)

    if [ -n "$DISABLED" ]; then
        logger -t INFO "MM_Monitor: Modem on $bb_modem_iface is disabled. Will attempt to initiate connection."
        logger -t DEBUG "$DISABLED"
        reconnect
    fi
#   if [ -n "$CONNECTED" ]; then
#       logger -t INFO "MM_Monitor: Modem on $bb_modem_iface reports as connected. Checking actual connectivity."
#       check_reconnect
#   fi
    if [ -n "$SEARCHING" ]; then
        logger -t INFO "MM_Monitor: Modem on $bb_modem_iface is searching."
        logger -t INFO "MM_Monitor: Giving 20 seconds to complete modem setup on $bb_modem_iface."
        sleep 20
        restart_iface
    fi
    if [ -n "$IDLE" ]; then
        logger -t INFO "MM_Monitor: Modem on $bb_modem_iface is idle."
        reconnect
    fi
    if [ -n "$REGISTERED" ]; then
        logger -t INFO "MM_Monitor: Modem on $bb_modem_iface is registered."
        restart_iface
    fi
    echo 0 > /tmp/MM_monitor_"$bb_modem_iface"_running 
}

############## MAIN

logger -t INFO "MM_Monitor: modem babysitter script is now running."

#cleaning up flags that might have been left by other instances
modem_interfaces=$( uci show network | grep modemmanager | awk -F'.' '{print $2}' | tr '\n' ' ' )

for modem_iface in $modem_interfaces ; do
    echo 0 > /tmp/MM_monitor_"$modem_iface"_running
done

while true ; do

    #reading from uci to get modem interface name and modem address for modemmanager
    modem_interfaces=$( uci show network | grep modemmanager | awk -F'.' '{print $2}' | tr '\n' ' ' )

    for modem_iface in $modem_interfaces ; do
        result=$(cat /tmp/MM_monitor_"$modem_iface"_running)
        if [ "$result" != "1" ] ; then
            echo 1 > /tmp/MM_monitor_"$modem_iface"_running 
            babysit_modem "$modem_iface" 
        fi
    done

    sleep 1
done

logger -t INFO "MM_Monitor: Modem hotplug script shut down."

mips171 commented 3 years ago

Hey @bobafetthotmail, nice work!

Is this script run by cron? I think this could be a viable path to go down for monitoring and reacting to the changing state of the modems in OpenWrt. The only thing I would suggest at this stage is making the ping destination configurable by setting it in UCI. Also, it would be nicer if this were a selectable (weak dependency) option in the modemmanager package as users might want the option to implement their own solution.
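A minimal sketch of what a UCI-configurable ping destination could look like. The config name `modem_monitor`, the section `settings` and the option `ping_target` are hypothetical and do not exist in any package; the fallback keeps the 1.1.1.1 used in the script above.

```sh
#!/bin/sh
# Print the ping destination for the monitor.
# Assumption: a hypothetical UCI option modem_monitor.settings.ping_target;
# if it is unset (or uci is unavailable) fall back to 1.1.1.1.
ping_target() {
    target="$(uci -q get modem_monitor.settings.ping_target 2>/dev/null)" || target=""
    echo "${target:-1.1.1.1}"
}
```

The `check` function could then call `ping -c3 -s56 -w30 "$(ping_target)"` instead of hard-coding the address.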

The reason I suggest that is because I have been hacking on luci-app-watchcat and watchcat and have modified them to watch/restart and take appropriate modemmanager-specific actions for my mobiledata (or any other) interface. I have not submitted the PR yet, but it should be ready for that soon. Your script could be a standalone package for users who do not want or require the overhead introduced by watchcat.

aleksander0m commented 3 years ago

I'm going to be extremely honest; please don't take my words in a wrong way!

The overall code could be improved in several places (e.g. using --output-keyvalue in the mmcli commands), including some state transitions that don't make full sense (e.g. running ifup again if in "searching" state), and also considering that you should not queue multiple ifup/ifdown operations (e.g. if there is already one "pending"). But ignoring all that, this code is making a ton of assumptions that are probably valid in your use case: e.g. you apply the logic to all available modems unconditionally, you perform pings to check remote connectivity, you restart connection on N failed ping checks... all that is applicable to your usecase, but it may really not be applicable to all usecases, and if we want to provide something like that in the ModemManager package itself, it would need to be extremely generic, and that means a huge ton of work... E.g. not every connection check can be done with ping checks, some require HTTP or DNS checks... The modem may be locked with a PIN and the PIN is provided in the interface settings, but there is no transition for that supported... There are too many missing things to support a generic implementation :/

Also, I don't see this kind of script inside the ModemManager package itself because the MM package just installs the protocol handler and MM support; if there is a limitation on this setup (e.g. because of how netifd auto support cannot monitor the real modem state), we should try to solve that in netifd or the protocol handler itself. This is not easy, at least if you don't know netifd internals (as I don't), but that is what the MM openwrt package should look for I think....

bobafetthotmail commented 3 years ago

Hey @bobafetthotmail, nice work!

Is this script run by cron?

no it's started as a procd service and runs constantly (note the infinite while loop at the end in the MAIN in the script above)

It's also already packaged in its own package so that I can have postinst and prerem commands that enable it when I install it and clean up after I remove it.

here is the procd script in the /etc/init.d folder

#!/bin/sh /etc/rc.common
USE_PROCD=1
START=95
STOP=01
start_service() {
    procd_open_instance
    procd_set_param command /bin/sh "/usr/lib/modem-monitor-mm/modem-monitor-mm.sh"
    #if process dies sooner than respawn_threshold, it is considered crashed and after 5 retries the service is stopped
    procd_set_param respawn ${respawn_threshold:-3600} ${respawn_timeout:-5} ${respawn_retry:-5}
    procd_close_instance
}

I have been hacking on luci-app-watchcat and watchcat and have modified them to watch/restart and take appropriate modemmanager-specific actions for my mobiledata (or any other) interface. I have not submitted the PR yet, but it should be ready for that soon. Your script could be a standalone package for users who do not want or require the overhead introduced by watchcat.

Why watchcat? Can't this be dealt with in the MM package itself? it's not like you can use most modems without a babysitter anyway.

bobafetthotmail commented 3 years ago

I'm going to be extremely honest; please don't take my words in a wrong way!

It's OK, I'll also be honest in the following.

The overall code could be improved in several places (e.g. using --output-keyvalue in the mmcli commands),

Eh, that's useful for some things, but shell is dumb (or I am a noob at shell script), I still need to parse the output of that with grep/awk/whatever in a shell script. Can change commands to use that, but I don't see it as a major improvement.

including some state transitions that don't make full sense (e.g. running ifup again if in "searching" state),

yeah that should have been just a 20 second delay, not run restart_iface too.

and also considering that you should not queue multiple ifup/ifdown operations (e.g. if there is already one "pending").

where does it do that? I thought that after the first ifup it waits 10s and runs the ping check and will ifup again only if failed

you perform pings to check remote connectivity

yes I can limit the ping through each interface so that the result is specific for that modem.

ping -I wwan0 1.1.1.1

would run the pings through that interface only (and it seems to work on my system, ping supports this command in OpenWrt)

and I can get what wwan is specific to each modem by mmcli -m "$modem_index" --output-keyvalue | grep '.ports.value' | grep '(net)' | awk '{ print $3 }'

you restart connection on N failed ping checks...

as mentioned I have no way to restart the specific modem without a hardware contraption that cuts the power to the modem, I can only ifup the interface (which is specific to the modem) or restart the whole modemmanager service, affecting everything.

Is there a way to restart the modemmanager connection of a single modem, that is more than just doing ifup but does not affect other modems?

if we want to provide something like that in the ModemManager package itself, it would need to be extremely generic, and that means a huge ton of work...

I guess it's better to ship a package that is unusable by everybody then

not every connection check can be done with ping checks, some require HTTP or DNS checks...

Why? Also what is a "http check" or a "dns check" in this context? Is there a way I can auto-detect this or it should be a setting?

The modem may be locked with a PIN and the PIN is provided in the interface settings, but there is no transition for that supported...

I don't understand.

I disabled the PIN lock in the SIM I use in my setup and this probably meant I didn't notice the issue you are mentioning.

If you explain I can fix that.

There are too many missing things to support a generic implementation :/

I can add them if there is interest in pursuing the general direction I'm going in.

The script as-is is more of a RFC and a "works for me" than a proper thing, I just won't invest the time into doing a more proper job if the only thing you guys will accept is a fix in netifd or protocol handler.

Also, I don't see this kind of script inside the ModemManager package itself because the MM package just installs the protocol handler and MM support; if there is a limitation on this setup (e.g. because of how netifd auto support cannot monitor the real modem state), we should try to solve that in netifd or the protocol handler itself.

Don't get me wrong, I'm not pushing to merge this specific script, the following is a more general argument/rant about the priority about dealing with the issues that I had to work around with the script. Which I quite frankly didn't expect to find in MM too.

You know better than me that MM without a way to babysit the modem in real time is useless bullshit just as OpenWrt's dumb qmi daemon (and half-baked mbim daemon) are.

I agree on the "we should try to solve that in netifd or the protocol handler itself", with emphasis on TRY. Solving the issue should have higher priority than using netifd, how much time do you want to wait for someone to come and do it "right"? MM has been merged a year ago, and before that you have maintained the openwrt package for a while in your own repo.

For me it's not a huge deal, I've long since been compiling OpenWrt from source so I could add my hacks and workarounds for these kinds of... let's say "purist" decisions by this or that dev, but what about most users?

They will have to buy a standalone modem (I mean a full device with ethernet ports and web interface), is that really better than dealing with the issue in a sub-optimal way?

aleksander0m commented 3 years ago

The overall code could be improved in several places (e.g. using --output-keyvalue in the mmcli commands),

Eh, that's useful for some things, but shell is dumb (or I am a noob at shell script), I still need to parse the output of that with grep/awk/whatever in a shell script. Can change commands to use that, but I don't see it as a major improvement.

You can reuse the mmcli helpers in the netifd protocol handler; which is a much easier way to use mmcli. E.g.:

$ INCLUDE_ONLY=1 . /lib/netifd/proto/modemmanager.proto
$ FULL_STATUS="$(mmcli -m 1 -K)"
$ modemmanager_get_field "${FULL_STATUS}" "modem.generic.state"
registered

That gives you one single mmcli call where you can "get_field" as many times as you want for the different fields that you may need.
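For a monitor that should not depend on the protocol handler's internals, the same "one call, many lookups" idea can be sketched in plain shell. This is only an illustration: `kv_get` is a hypothetical name, and it assumes the key-value output consists of lines shaped like `key : value`.

```sh
#!/bin/sh
# Hypothetical helper: extract one field from saved `mmcli -K` output.
# $1 = the full key-value text, $2 = the exact key to look up.
# Assumes every line looks like "key   : value"; the split is done on the
# first colon only, so values that contain colons are returned intact.
kv_get() {
    printf '%s\n' "$1" | awk -v key="$2" '
        {
            sep = index($0, ":")
            if (sep == 0) next
            k = substr($0, 1, sep - 1)
            v = substr($0, sep + 1)
            gsub(/^[ \t]+|[ \t]+$/, "", k)
            gsub(/^[ \t]+|[ \t]+$/, "", v)
            if (k == key) { print v; exit }
        }'
}
```

Usage would mirror the helper above: `FULL_STATUS="$(mmcli -m 1 -K)"` followed by `kv_get "$FULL_STATUS" modem.generic.state`.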

including some state transitions that don't make full sense (e.g. running ifup again if in "searching" state),

yeah that should have been just a 20 second delay, not run restart_iface too.

Not even that; the searching state is temporary, and 20s is a very arbitrary value I'm afraid. The network registration may take a very long time under low signal quality conditions.

and also considering that you should not queue multiple ifup/ifdown operations (e.g. if there is already one "pending").

where does it do that? I thought that after the first ifup it waits 10s and runs the ping check and will ifup again only if failed

With the script monitor that you wrote, netifd doesn't automatically do anything; the config should have been set to auto:false. Then, when you run ifup, you're telling netifd to bring up the connection, and netifd replies quickly to you, it could be way before the connection succeeds. E.g. the connection has a default timeout of 120s right now; that's the amount of time the mmcli command will wait for a reply. During those 120s, you should not run any other ifup/ifdown; and you can control that looking at the ifstatus of the interface, which will tell you whether "pending:true" or not.
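A minimal sketch of that guard, assuming the JSON printed by `ifstatus <iface>` contains a boolean `"pending"` field as described above. The function names are hypothetical, and the JSON is matched with grep only to keep the example dependency-free.

```sh
#!/bin/sh
# Succeeds if the ifstatus JSON read from stdin reports a pending operation.
iface_pending() {
    grep -Eq '"pending"[[:space:]]*:[[:space:]]*true'
}

# Hypothetical wrapper: run ifup only when netifd has nothing pending for
# the interface; returns 1 (and does nothing) otherwise.
safe_ifup() {
    if ifstatus "$1" | iface_pending; then
        return 1
    fi
    ifup "$1"
}
```

The monitor would call `safe_ifup "$bb_modem_iface"` in place of a bare `ifup`, and simply try again on its next polling round when the call returns 1.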

you perform pings to check remote connectivity

yes I can limit the ping through each interface so that the result is specific for that modem.

ping -I wwan0 1.1.1.1

would run the pings through that interface only (and it seems to work on my system, ping supports this command in OpenWrt)

I wasn't referring to that. Yes, you can definitely run the pings through a specific interface by default in OpenWRT, that's fine.

and I can get what wwan is specific to each modem by mmcli -m "$modem_index" --output-keyvalue | grep '.ports.value' | grep '(net)' | awk '{ print $3 }'

If you were to do that, you need to look for the "bearer" object in mmcli, and get the "interface" listed there. You shouldn't look at the ports list looking for net interfaces because: (1) there may be more than 1 net interface listed while only 1 is connected and (2) the modem may use PPP for connection and the associated ppp interface is not listed there.
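A sketch of the bearer-based lookup, under stated assumptions: it expects the key-value output of `mmcli -b <bearer> -K` on stdin and assumes the keys are named `bearer.status.connected` and `bearer.status.interface`; check the output on your own system before relying on those names. `bearer_iface` is a hypothetical function name.

```sh
#!/bin/sh
# Print the network interface of a bearer, but only if the bearer is connected.
# stdin: key-value output for one bearer (assumed keys, see the lead-in).
# Exit status is 1 when the bearer is not connected or has no interface.
bearer_iface() {
    awk -F ':' '
        {
            key = $1; val = $2
            gsub(/[ \t]/, "", key)
            gsub(/^[ \t]+|[ \t]+$/, "", val)
        }
        key == "bearer.status.connected" { connected = val }
        key == "bearer.status.interface" { iface = val }
        END {
            if (connected == "yes" && iface != "") print iface
            else exit 1
        }'
}
```

With the bearer path taken from the modem status (for example the `/org/freedesktop/ModemManager1/Bearer/2` shown in the first post), this would be used as `mmcli -b "$bearer_path" -K | bearer_iface`.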

you restart connection on N failed ping checks...

as mentioned I have no way to restart the specific modem without a hardware contraption that cuts the power to the modem, I can only ifup the interface (which is specific to the modem) or restart the whole modemmanager service, affecting everything.

I wasn't questioning that. I was questioning the fact that you're configuring N ping attempts to fail. Why not N ping attempts against one given IP, plus additionally M ping attempts to a different IP just in case the first one is not accessible? The fact that you can configure these kinds of checks in multiple ways you could think of makes it very hard to maintain such a solution.
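To make the "first IP, then a fallback IP" idea concrete, here is a small sketch. `probe` and `any_reachable` are hypothetical names, and the ping options are the ones already used in the script earlier in this thread; how many targets and attempts to use would be settings.

```sh
#!/bin/sh
# Probe a single target; kept as its own function so it can be swapped for
# another kind of check.
probe() {
    ping -c3 -w30 "$1" >/dev/null 2>&1
}

# Try each target in order; print the first reachable one and return 0,
# or return 1 if none of them answered.
any_reachable() {
    for target in "$@"; do
        if probe "$target"; then
            echo "$target"
            return 0
        fi
    done
    return 1
}
```

For example, `any_reachable 1.1.1.1 9.9.9.9 || restart_iface` would only restart the interface when both destinations fail.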

Is there a way to restart the modemmanager connection of a single modem, that is a more than just doing ifup but does not affect other modems?

You could also not only disconnect and reconnect, but also additionally disable the modem and/or put it in low-power mode, although those things are also configurable right now in openwrt IIRC? Anyway, as you said, maybe the setup supports powering off the modem completely and powering it up again, and that may be done with a GPIO, or i2c, or.... quite some options there. Are we going to make all those things configurable for the user as well?

if we want to provide something like that in the ModemManager package itself, it would need to be extremely generic, and that means a huge ton of work...

I guess it's better to ship a package that is unusable by everybody then

Well, sorry for that. I'm not against your script, I'm just saying that maintaining it inside the MM package shouldn't be our target; our target should have been to make it work with netifd properly to handle reconnections at least, and that would solve 95% of the user issues. I don't have anything against keeping the script out of the MM package though.

not every connection check can be done with ping checks, some require HTTP or DNS checks...

Why?

Because sometimes ICMP is blocked by the networks out of your control.

Also what is a "http check" or a "dns check" in this context?

A HTTP check is e.g. what Android phones (and others I assume) do to check connectivity against a server on the Internet. It could be a simple wget/curl call to a remote HTTP server asking for a specific URL, and returning either some expected content in the body (HTTP 200) or just no body (HTTP 204).

A DNS check is e.g. a DNS resolution query performed against a remote DNS server. For our case, this server could be one specified by the user or one of the servers reported by the network when the modem is connected.
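Both checks can be sketched as thin wrappers around tools commonly present on OpenWrt. This is an assumption-heavy illustration: the function names are hypothetical, the URL, name and server are supplied by the caller, and the result relies entirely on the exit status of `wget` and `nslookup`, which can differ between implementations (for instance in how an empty HTTP 204 reply or a failed lookup is reported).

```sh
#!/bin/sh
# HTTP check: succeed if the URL can be fetched within the timeout.
# $1 = URL to fetch, $2 = timeout in seconds (default 10)
http_check() {
    wget -q -T "${2:-10}" -O /dev/null "$1"
}

# DNS check: succeed if the given server resolves the given name.
# $1 = name to resolve, $2 = DNS server to query
dns_check() {
    nslookup "$1" "$2" >/dev/null 2>&1
}
```

A monitor could fall back to `http_check "$url"` or `dns_check "$name" "$server"` on networks where ICMP is blocked, with all three values coming from its settings.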

Is there a way I can auto-detect this or it should be a setting?

All those things should be settings, and I can think of quite some things that could be configurable.

The modem may be locked with a PIN and the PIN is provided in the interface settings, but there is no transition for that supported...

I don't understand.

I disabled the PIN lock in the SIM I use in my setup and this probably meant I didn't notice the issue you are mentioning.

If you explain I can fix that.

Re-enable the PIN lock, and when the modem is detected you'll see the state transitioning to "locked" in ModemManager. Then it can be unlocked either via mmcli or using the netifd ifup operation directly if the PIN is in the settings.

There are too many missing things to support a generic implementation :/

I can add them if there is interest in pursuing the general direction I'm going in.

That is fine, and if you're happy working on this I totally support you. It's just not a thing to go inside the MM package. Your monitoring script may be well serving other protocol handlers as well, like the ones using uqmi, umbim or other AT protocols. It doesn't need to be specific to ModemManager really, as you're monitoring the "network connection brought up by netifd".

The script as-is is more of a RFC and a "works for me" than a proper thing, I just won't invest the time into doing a more proper job if the only thing you guys will accept is a fix in netifd or protocol handler.

For the MM package, we should fix either netifd or the protocol handler, that is what my opinion is on this. I don't know what @nickberry17 thinks about it. If he's happy maintaining this script and if it's optional, I won't complain :)

Also, I don't see this kind of script inside the ModemManager package itself because the MM package just installs the protocol handler and MM support; if there is a limitation on this setup (e.g. because of how netifd auto support cannot monitor the real modem state), we should try to solve that in netifd or the protocol handler itself.

Don't get me wrong, I'm not pushing to merge this specific script, the following is a more general argument/rant about the priority about dealing with the issues that I had to work around with the script. Which I quite frankly didn't expect to find in MM too.

The priority issue to fix is the fact that MM can detect a disconnection, effectively disconnecting the modem and reporting the disconnection in mmcli; and netifd still thinking the modem is connected, because the protocol handler doesn't "monitor" the real state of the connection reported by MM. If that is fixed, the main integration would be enough for 95% of the users, and the netifd "auto" connection could be used right away.

You know better than me that MM without a way to babysit the modem in real time is useless bullshit just as OpenWrt's dumb qmi daemon (and half-baked mbim daemon) are.

That is not true for uqmi, I believe that with uqmi the setup tries to enable autoconnection of the modem (where the modem is the one responsible for keeping the connection up); in that case, there's not much need for monitoring connection by netifd. I don't know about umbim because there's no autoconnection support in the protocol. In the case of MM, autoconnection is explicitly disabled because MM doesn't expect autoconnected modems yet. You're seeing a need to "babysit" the modem in real time because of the lack of sync between netifd and MM once the modem is connected; as soon as that is solved, everything would be mostly fine for most setups in a generic way. Regarding the "useless bullshit" part... it's my fault, I may have asked for too much honesty this time, let's keep it more politically correct from now on ;)

I agree on the "we should try to solve that in netifd or the protocol handler itself", with emphasis on TRY. Solving the issue should have higher priority than using netifd, how much time do you want to wait for someone to come and do it "right"?

Don't take me wrong; I've been using similar monitoring scripts myself for MM in openwrt for a very long time, and that's been because (1) it's been a quicker way to integrate the stuff right away and (2) because each setup I've worked on had additional monitoring requirements than the default ones.

MM has been merged a year ago, and before that you have maintained the openwrt package for a while in your own repo.

Thanks for reminding me of the amount of time I haven't spent checking the netifd integration properly...

For me it's not a huge deal, I've long since been compiling OpenWrt from source so I could add my hacks and workarounds for these kinds of... let's say "purist" decisions by this or that dev, but what about most users?

They will have to buy a standalone modem (I mean a full device with ethernet ports and web interface), is that really better than dealing with the issue in a sub-optimal way?

See, I can buy that reasoning. What I'm afraid of is that the amount of time needed to develop or maintain the sub-optimal script may end up being much more than what's required to fix the netifd integration...

All this said. If you're up to working on this, hey, I'm not going to oppose, and I'm going to definitely help you reviewing and suggesting changes. I still think that it should go to a separate package, though, because it's going to have a ton of config options and maybe its own build options as well (e.g. HTTP checks that require curl/wget could be disabled in build...).

aleksander0m commented 3 years ago

I've opened this Flyspray ticket for the netifd integration thing: https://bugs.openwrt.org/index.php?do=details&task_id=3499

john-tho commented 3 years ago

For discussion: I do not know the components well (ModemManager, netifd, ubus…), but could we do something similar to what pppd does, with ppp-up and ppp-down and break the netifd interface up / down (and ModemManager mmcli parse) functions into their own scripts, separate from the proto setup?

Something along the lines of:

Just one idea, as if ModemManager is detecting the connection lost, we should be able to use that to automatically reconnect?

Cheers

aleksander0m commented 3 years ago

@john-tho the "something" watching dbus is the monitor app we're discussing here; that is what I'm trying to avoid for the generic usecase if possible. A much more integrated solution could be to modify ModemManager so that it runs post-up/post-down scripts itself, in a way that MM is the one running the explicit ifdown of the netifd interface when a network-initiated disconnection is found. That's an option we could take.

bobafetthotmail commented 3 years ago

You can reuse the mmcli helpers in the netifd protocol handler; which is a much easier way to use mmcli. E.g.:

that's nice, but I would only do that if it isn't a separate package, as someone can change the protocol handler and break my script otherwise.

Not even, that the searching state is temporary, and 20s is a very arbitrary value I'm afraid. The network registration may take a very long time under low signal quality conditions.

as I said, it was not supposed to call restart_iface, that's my error. So it's just waiting 20s, then checking status again. If status is searching it will wait 20s again and then check status, and so on.

With the script monitor that you wrote, netifd doesn't automatically do anything; the config should have been set to auto:false. Then, when you run ifup, you're telling netifd to bring up the connection, and netifd replies quickly to you, it could be way before the connection succeeds. E.g. the connection has a default timeout of 120s right now; that's the amount of time the mmcli command will wait for a reply. During those 120s, you should not run any other ifup/ifdown; and you can control that looking at the ifstatus of the interface, which will tell you whether "pending:true" or not.

Wait. I thought netifd was supposed to reply when the connection is up, not just for lulz. I'll have to add a check that netifd has actually brought up the connection then.
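A minimal sketch of such a check, assuming `ifstatus <iface>` prints JSON that contains a `"pending"` boolean as described in the quote above. The function names are made up for illustration, and a real script would probably parse the JSON with `jsonfilter` rather than `grep`.

```shell
#!/bin/sh
# Hypothetical helpers: these names are illustrative and are not part of
# netifd or ModemManager.

# Succeeds (exit 0) if the ifstatus JSON on stdin reports "pending": true.
iface_is_pending() {
    grep -q '"pending"[[:space:]]*:[[:space:]]*true'
}

# Run ifup only when no earlier ifup/ifdown is still in progress.
# Assumes the OpenWrt `ifstatus`, `ifup` and `logger` commands are available.
safe_ifup() {
    iface="$1"
    if ifstatus "$iface" | iface_is_pending; then
        logger "interface $iface still pending, not calling ifup"
        return 1
    fi
    ifup "$iface"
}
```

Only `safe_ifup` touches the system; `iface_is_pending` is a pure filter, so it can be tried against a saved copy of the `ifstatus` output.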

I don't know why I should set the config to auto:false and what that does. I thought the only "automatic" thing netifd does is start the interface on boot, loading the config from uci in the Modem/MM. And then we enter undefined land where the modem does whatever and netifd does not care, which is the area that I'm covering with this script.

If you were to do that, you need to look for the "bearer" object in mmcli, and get the "interface" listed there. You shouldn't look at the ports list looking for net interfaces because: (1) there may be more than 1 net interface listed while only 1 is connected and (2) the modem may use PPP for connection and the associated ppp interface is not listed there.

ok, will do that, thanks.
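For illustration, a rough sketch of reading the interface from the bearer instead of from the ports list. It assumes the key/value output of `mmcli -K` and the key names `bearer.status.connected` and `bearer.status.interface`; the helper names `mm_field` and `bearer_interface` are hypothetical.

```shell
#!/bin/sh
# Hypothetical helpers; the key names are assumed from `mmcli -K` output.

# Print the value of key $1 from "key : value" lines on stdin.
mm_field() {
    awk -v key="$1" '
    {
        pos = index($0, ":")
        if (pos == 0) next
        k = substr($0, 1, pos - 1)
        v = substr($0, pos + 1)
        gsub(/^[ \t]+|[ \t]+$/, "", k)
        gsub(/^[ \t]+|[ \t]+$/, "", v)
        if (k == key) { print v; exit }
    }'
}

# Print the net interface of bearer $1, but only if that bearer is connected.
bearer_interface() {
    status="$(mmcli -b "$1" -K)" || return 1
    [ "$(printf '%s\n' "$status" | mm_field bearer.status.connected)" = "yes" ] || return 1
    printf '%s\n' "$status" | mm_field bearer.status.interface
}
```

Splitting on the first colon only keeps values that themselves contain colons (D-Bus paths, IPv6 addresses) intact.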

I wasn't questioning that. I was questioning the fact that you're configuring N ping attempts to fail. Why not N ping attempts against one given IP, plus additionally M ping attempts to a different IP just in case the first one is not accessible? The fact that you can configure this kind of checks in multiple ways you could think of makes it very hard to maintain such a solution.

I can easily have the script read an infinite list of IPs, each with its own configurable timeout from its very own UCI config file, I'm just not adding bells and whistles like that unless there are decent chances of someone merging it, or I'm preparing my very own mm-babysitter package for submission to package repo.
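To make the "N attempts against one IP, M against another" idea concrete, here is a small sketch. The `target:attempts` argument format, the `PROBE` variable and both function names are assumptions for illustration; reading the list from UCI is left out.

```shell
#!/bin/sh
# Hypothetical sketch: PROBE, probe_target and any_target_reachable are
# illustrative names, not an existing tool.

# Command used to probe one target; override PROBE to test without a network.
PROBE="${PROBE:-ping -c 1 -W 2}"

# Try target $1 up to $2 times; succeed as soon as one attempt works.
probe_target() {
    target="$1"
    attempts="$2"
    i=0
    while [ "$i" -lt "$attempts" ]; do
        # PROBE is intentionally unquoted so its options split into words.
        $PROBE "$target" >/dev/null 2>&1 && return 0
        i=$((i + 1))
    done
    return 1
}

# Arguments are "target:attempts" pairs, e.g. "8.8.8.8:3" "1.1.1.1:2".
# Succeeds if any target answers; fails only when every target is exhausted.
any_target_reachable() {
    for spec in "$@"; do
        # Split on the last colon so IPv6 targets keep their own colons.
        probe_target "${spec%:*}" "${spec##*:}" && return 0
    done
    return 1
}
```

A monitor loop could then call something like `any_target_reachable "8.8.8.8:3" "1.1.1.1:2" || ifup <netifd interface>`.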

You could also not only disconnect and reconnect, but also additionally disable the modem and/or put it in low-power mode, although those things are also configurable right now in openwrt IIRC?

I'm not sure my modem agrees with that, I think I can't disable it or put it in low power, but I need to check this evening.

Anyway, as you said, maybe the setup supports powering off the modem completely and powering it up again, and that may be done with a GPIO, or i2c, or.... quite some options there. Are we going to make all those things configurable for the user as well?

Yeah sure, why not. That's just dumb boring auxiliary stuff, reading options from UCI and deciding what command to do if "restart modem" action is necessary does not sound difficult. More functions for the function god, no problem.

And if someone needs something special for an embedded modem in an OpenWrt device it can be added too. For example I can just hardcode a list of "supported devices" with pre-configured actions for the onboard modem, similar to what the slide-switch package does https://github.com/jefferyto/openwrt-slide-switch

if we want to provide something like that in the ModemManager package itself, it would need to be extremely generic, and that means a huge ton of work...

I guess it's better to ship a package that is unusable by everybody then

Well, sorry for that.

Yeah, I was just pointing out that people installing a package that does not mention any specific limitation in the documentation/README have some expectations that it works somewhat reliably and does not require user-specific scripting. At least mention this in the documentation (wiki) or the package description or something.

I don't have anything against keeping the script out of the MM package though.

And even if you would, you can't stop me from doing it, lolololol :-P

Because sometimes ICMP is blocked by the networks out of your control.

I've not encountered a firewall that blocks ICMP traffic towards a public IP, but I guess it's possible.

Also what is a "http check" or a "dns check" in this context?

An HTTP check is e.g. what Android phones (and others I assume) do to check connectivity against a server on the Internet. It could be a simple wget/curl call to a remote HTTP server asking for a specific URL, and returning either some expected content in the body (HTTP 200) or just no body (HTTP 204).

ok, can do that.
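A possible shape for that check, assuming `curl` is installed. The function names are hypothetical, the default URL is only an example of an endpoint that answers with 204, and the sketch looks at the status code only, not at the body.

```shell
#!/bin/sh
# Hypothetical sketch; the function names and the default URL are assumptions.

# Succeed if the HTTP status code in $1 counts as "online" (200 or 204).
http_code_ok() {
    case "$1" in
        200|204) return 0 ;;
        *) return 1 ;;
    esac
}

# Fetch URL $1 with curl and judge the status code; needs curl installed.
http_check() {
    url="${1:-http://connectivitycheck.gstatic.com/generate_204}"
    code="$(curl -s -o /dev/null -w '%{http_code}' --max-time 10 "$url")" || return 1
    http_code_ok "$code"
}
```

Treating redirects as a failure is deliberate here: a captive portal or operator landing page usually answers with a 30x code rather than 200/204.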

A DNS check is e.g. a DNS resolution query performed against a remote DNS server. For our case, this server could be one specified by the user or one of the servers reported by the network when the modem is connected.

ok, i can use nslookup tool for that.
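Roughly, that could look like the sketch below. It assumes that the exit status of `nslookup` reflects whether the lookup worked, which may not hold for every build, so that should be verified on the target. `RESOLVER` and `dns_check` are illustrative names.

```shell
#!/bin/sh
# Hypothetical sketch; RESOLVER and dns_check are illustrative names.

# Resolver command; override RESOLVER to test without a network.
RESOLVER="${RESOLVER:-nslookup}"

# Succeed if name $1 resolves, optionally asking DNS server $2.
# Assumes the resolver command exits non-zero when the lookup fails.
dns_check() {
    name="$1"
    server="${2:-}"
    if [ -n "$server" ]; then
        $RESOLVER "$name" "$server" >/dev/null 2>&1
    else
        $RESOLVER "$name" >/dev/null 2>&1
    fi
}
```

The server argument could be a user setting or one of the DNS servers reported by the network once the modem is connected.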

Is there a way I can auto-detect this or it should be a setting?

All those things should be settings, and I can think of quite some things that could be configurable.

Tell me what else you would want as a setting and why. I'm not a networking pro so I might have missed something obvious.

Re-enable the PIN lock, and when the modem is detected you'll see the state transitioning to "locked" in ModemManager. Then it can be unlocked either via mmcli or using the netifd ifup operation directly if the PIN is in the settings.

ok, can add that too.

There are too many missing things to support a generic implementation :/

I can add them if there is interest in pursuing the general direction I'm going in.

That is fine, and if you're happy working on this I totally support you. It's just not a thing to go inside the MM package. Your monitoring script may be well serving other protocol handlers as well, like the ones using uqmi, umbim or other AT protocols. It doesn't need to be specific to ModemManager really, as you're monitoring the "network connection brought up by netifd".

Lol no they are far too broken. Been there, done that a year ago, bought a fully autonomous Huawei LTE modem device instead (and now a Mikrotik modem antenna thing with PoE). I mean not a modem card, a whole device with its own GUI and firmware

Fixing the situations they manage to get themselves into requires giving specific commands to the modem (or forcing reboot, which is bad), it's not as simple as calling "ifup" and letting netifd do it. For example umbim is garbage on both modem cards (can't even connect when they are set to mbim mode and Windows can use them fine) and uqmi reliably locks up one of the two cards (requiring a power cycle) after a day because of some bug that was already in the tracker and now may or may not be fixed but I'm not touching it again. I don't use AT protocols because it would throttle the LTE connection so I don't know.

The priority issue to fix is the fact that MM can detect a disconnection, effectively disconnecting the modem and reporting the disconnection in mmcli; and netifd still thinking the modem is connected, because the protocol handler doesn't "monitor" the real state of the connection reported by MM. If that is fixed, the main integration would be enough for 95% of the users, and the netifd "auto" connection could be used right away.

yeah agreed. If the modem itself is from a decent brand like Sierra Wireless this should be enough.

That is not true for uqmi, I believe that with uqmi the setup tries to enable autoconnection of the modem (where the modem is the one responsible for keeping the connection up);

yeah and it does not work at all on some modems (the other card I had, a Quectel E-25), while this works on the Sierra modem I have, uqmi locks up with it because of the bug.

Regarding the "useless bullshit" part...

As I said, I had already dealt with other daemons in the past, was very disappointed, and I stand beside that statement.

Don't get me wrong; I've been using similar monitoring scripts myself for MM in openwrt for a very long time, and that's been because (1) it's been a quicker way to integrate the stuff right away and (2) because each setup I've worked on had monitoring requirements beyond the default ones.

Yes, but we return to the (imho reasonable) expectations I mentioned above. People installing a package have reasonable expectations that (1) the package works reliably for the most common usecases (2) the description or documentation ( https://github.com/openwrt/packages/blob/master/net/modemmanager/README.md and https://openwrt.org/docs/guide-user/network/wan/wwan/modemmanager ) mentions known limitations and/or examples of ways to deal with them.

Or, in other words, I would be much less pissed off if somebody told me of this limitation BEFORE I wasted two evenings to troubleshoot it and then troubleshoot why the suggested solution of the other similar issue in the issue list (mentioned above) also fails (nothing calls hotplug scripts if the modem does whatever, so a hotplug script is pointless).

See, I can buy that reasoning. What I'm afraid of is that the amount of time needed to develop or maintain the sub-optimal script may end up being much more than what's required to fix the netifd integration...

Oh I'm sure it is, but I'm also confident that nobody actually knows or cares enough about fixing the netifd implementation for this usecase or the protocol or the script, just because it is a core OpenWrt thing (i.e. only core OpenWrt devs know anything about it) and documentation on it is like 4 sentences https://openwrt.org/docs/techref/netifd

While making a sub-optimal script can be done in a weekend by anyone with a basic understanding and access to StackExchange/Overflow and then contributed and fixed by anyone else.

All this said. If you're up to working on this, hey, I'm not going to oppose, and I'm going to definitely help you reviewing and suggesting changes.

thanks

I still think that it should go to a separate package, though, because it's going to have a ton of config options and maybe its own build options as well (e.g. HTTP checks that require curl/wget could be disabled in build...).

imho most of that is optional stuff that should at best go into documentation, comments and error messages. "Default setup" is just pinging an IP as that is the lightest setup and would work for most people.

The script only needs to check if these dependencies are indeed there and print errors on the log otherwise (and obviously not break).
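Something like this would be enough for the dependency check; `missing_deps` is a made-up name and the list of tools is only an example.

```shell
#!/bin/sh
# Hypothetical helper; the name missing_deps is illustrative.

# Print each command from the argument list that is not installed.
# Returns non-zero if at least one is missing.
missing_deps() {
    rc=0
    for cmd in "$@"; do
        if ! command -v "$cmd" >/dev/null 2>&1; then
            echo "$cmd"
            rc=1
        fi
    done
    return "$rc"
}
```

The monitor could run `for dep in $(missing_deps curl nslookup); do logger "$dep not found, related check disabled"; done` at startup and simply skip the checks whose tool is absent.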

bobafetthotmail commented 3 years ago

@john-tho the "something" watching dbus is the monitor app we're discussing here; that is what I'm trying to avoid for the generic usecase if possible. A much more integrated solution could be to modify ModemManager so that it runs post-up/post-down scripts itself, in a way that MM is the one running the explicit ifdown of the netifd interface when a network-initiated disconnection is found. That's an option we could take.

eh. It would surely solve the issue and move all the burden on you, I'm not complaining.

But I personally find it distasteful to have each daemon call stuff on its own, as that requires it to be running as root or at the very least with shell access, which is a problem for the near future in OpenWrt as a couple of guys are working on the hardening of services (services are moved to non-root users with limited privileges).

push-gh commented 2 years ago

I'm having the same issue (at least it should be the same, according to this discussion). Instead of pinging external destinations, I created a simple script and added it to /etc/rc.local. It watches for the unique log entry corresponding to this state change and restarts the network interface. So far it has worked for me. I think it also saves CPU cycles, as it stays blocked until a new log entry arrives.

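#!/bin/sh
# Watch the system log and re-run ifup whenever the matching state change
# is logged. Usage: <this script> <netifd interface>

interface="$1"
string_to_match='state changed (searching -> registered)'

# logread -f blocks until a new log line arrives, so the loop does not poll.
logread -f | while read -r line; do
    # case does the substring match in plain POSIX sh (no [[ ]] needed)
    case "$line" in
        *"$string_to_match"*)
            logger "matching string found, restarting the interface $interface"
            ifup "$interface"
            logger "interface $interface restarted"
            ;;
    esac
done
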
arkanoid87 commented 2 years ago

Well this is a serious issue. I've just found this in OpenWrt snapshot on my router.

I had the same problem every 4h using the uqmi protocol, and I solved it using watchcat to reboot the router on ping failure. It's overkill that works, but it leaves me with some downtime every 240 minutes.

I switched to the ModemManager protocol just to solve this (I use it extensively and successfully on Debian-based distros), but apparently the issue is exactly the same, and it is not due to ModemManager, as mmcli shows a coherent state of the modem/SIM and other Linux-based systems handle this correctly.

So the solution probably has to be found somewhere between the "logical" interface managed by OpenWrt and the real one from Linux.

aleksander0m commented 2 years ago

I believe we have a way forward to fix this issue, introducing support in MM to run dispatcher scripts on certain events (e.g. connection up, connection down...). For context, https://bugs.openwrt.org/index.php?do=details&task_id=3499 and https://lists.freedesktop.org/archives/modemmanager-devel/2022-January/009075.html

arkanoid87 commented 2 years ago

How does NetworkManager do it? Can't ModemManager take care of "keeping up" the connection without external intervention?

Afaik nmcli -t -f GENERAL.STATE con show mygsmconnection detects the change

aleksander0m commented 2 years ago

How does NetworkManager do it?

NM monitors the connection state reported by MM. If MM reports the connection is down, NM will trigger the autoconnection logic (if it was configured to autoconnect).

Can't ModemManager take care of "keeping up" the connection without external intervention?

No, because that process may involve getting new IP settings to use after a reconnection, and MM doesn't touch the network interfaces, it only gathers the info from the modem and exposes it to upper layers (NM or netifd) so that they're the ones configuring the network interface.

Afaik nmcli -t -f GENERAL.STATE con show mygsmconnection detects the change

Yes, as per above, NM monitors the connection state reported by MM.

What we're suggesting now is a way to solve the "missing piece", which is how netifd can detect that the connection managed by MM is actually disconnected. The netifd protocol handler doesn't have any built-in way to e.g. "poll" the status from MM, so the only thing left to do is to have a way to notify netifd that the connection is really down. That way will be dispatcher scripts run by MM itself on certain events, as per https://lists.freedesktop.org/archives/modemmanager-devel/2022-January/009075.html I think it's the easiest way forward, even if it involves adding new features in MM. The dispatcher scripts could also be used for other purposes as well, e.g. if you need to bring up or down firewall rules depending on the connection state, and things like that.
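As a rough sketch of what such a dispatcher script might look like: the argument order used here (modem path, bearer path, network interface, state) and the `connected`/`disconnected` state strings are assumptions about the proposed feature, not a confirmed interface. `NETIFD_IFACE`, `dispatch_action` and `handle_event` are hypothetical names.

```shell
#!/bin/sh
# Hypothetical dispatcher sketch. The argument order (modem path, bearer
# path, interface, state) is an assumption about the proposed MM feature.

# Print the netifd action for a connection event: "ifdown" or "none".
dispatch_action() {
    state="${4:-}"
    case "$state" in
        disconnected) echo "ifdown" ;;
        *) echo "none" ;;
    esac
}

# Act on an event. NETIFD_IFACE (the netifd interface name) is an assumed
# setting; a real script would have to look it up instead.
handle_event() {
    [ "$(dispatch_action "$@")" = "ifdown" ] || return 0
    logger "modem interface ${3:-unknown} disconnected, taking down ${NETIFD_IFACE:-wwan}"
    ifdown "${NETIFD_IFACE:-wwan}"
}
```

Keeping the decision (`dispatch_action`) apart from the side effect (`ifdown`) also leaves room for the other uses mentioned above, such as toggling firewall rules on the same events.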

arkanoid87 commented 2 years ago

It would be really nice!

There's an army of routers waiting for this.

I can test this patch if you want, or possibly do some scripting but I'm not really into ModemManager or netifd machinery right now (except for the dbus message passing I've recently debugged)

arkanoid87 commented 2 years ago

btw, I've tested that a simple ifup <netifd interface> on an interface controlled by the modemmanager protocol is sufficient to restore the Internet connection when this problem happens, but it still closes all existing connections and ping fails for a couple of seconds.

Possibly something like this could be suggested as a cron-based workaround in the meantime, until a better solution is found.
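A sketch of what that periodic check could look like, assuming the `modem.generic.state` key in the `mmcli -K` output; the function names and the idea of keying on the "registered" state are illustrative, taken from the symptom described in this issue.

```shell
#!/bin/sh
# Hypothetical cron helper; the function names are illustrative.

# Succeed if the modem state in $1 means "registered but not connected".
needs_reconnect() {
    [ "$1" = "registered" ]
}

# Read the state of modem $1 from mmcli key/value output and run ifup on
# netifd interface $2 if needed. The key name is an assumption.
check_and_reconnect() {
    state="$(mmcli -m "$1" -K | sed -n 's/^modem\.generic\.state[[:space:]]*:[[:space:]]*//p')"
    if needs_reconnect "$state"; then
        logger "modem $1 is $state, running ifup $2"
        ifup "$2"
    fi
}
```

Saved as a script that ends with `check_and_reconnect 0 <netifd interface>`, it could be run from a crontab entry every few minutes.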

EDIT: I actually see that's the proposed approach of the existing watchcat-modemmanager integration: https://github.com/openwrt/packages/blob/75933e73f2964e22ced24286041df34005200629/utils/watchcat/files/watchcat.sh#L55

I have a question. Will the dispatcher scripts solution be capable of not interrupting existing TCP connections? I guess not, as it's apparently a bearer-initiated disconnection; I'm just asking because I know there are proprietary firmwares that succeed in this with the same hardware and SIM card.

grapewheel commented 4 months ago

Hey guys, I found that this can fix the bug:

mmcli -m 0 -d
mmcli -m 0 --set-power-state-on  # Set full power state in the modem
mmcli -m 0 -e

And then restart your device