IPv6 network stops working after a while

openwrt / odhcp6c

This repository is a mirror of https://git.openwrt.org/?p=project/odhcp6c.git. It is for reference only and is not active for check-ins or for reporting issues; issues should be reported at https://bugs.openwrt.org. Pull requests will be accepted and merged into odhcp6c.git.

GNU General Public License v2.0


IPv6 network stops working after a while #61

Open hartmark opened 4 years ago

hartmark commented 4 years ago

I have a Teltonika RUTX11 mobile internet router. It has an issue that makes it lose IPv6 connectivity after a while.

root@Teltonika-RUTX11:~# ping6 ftp.sunet.se
PING ftp.sunet.se (2001:6b0:19::165): 56 data bytes
ping6: sendto: Permission denied

If I kill the odhcp6c process, a new one is spawned and IPv6 connectivity is restored for a while, but after 1-3 days it dies again.

I have filed a ticket with Teltonika, but they haven't been able to pinpoint the problem. I have tried starting the process with -v, but that does not seem to add any more detail to the logs.

The process is started with these arguments: odhcp6c -v -s /lib/netifd/dhcpv6.script -P0 -t120 qmimux0

Is there anything I can do to enable more logging to pinpoint the issue?

The /lib/netifd/dhcpv6.script file is quite massive:

#!/bin/sh
[ -z "$2" ] && echo "Error: should be run by odhcp6c" && exit 1
. /lib/functions.sh
. /lib/netifd/netifd-proto.sh

setup_interface () {
    local device="$1"
    local prefsig=""
    local addrsig=""

    # Apply IPv6 / ND configuration
    HOPLIMIT=$(cat /proc/sys/net/ipv6/conf/$device/hop_limit)
    [ -n "$RA_HOPLIMIT" -a -n "$HOPLIMIT" ] && [ "$RA_HOPLIMIT" -gt "$HOPLIMIT" ] && echo "$RA_HOPLIMIT" > /proc/sys/net/ipv6/conf/$device/hop_limit
    [ -n "$RA_MTU" ] && [ "$RA_MTU" -ge 1280 ] && echo "$RA_MTU" > /proc/sys/net/ipv6/conf/$device/mtu 2>/dev/null
    [ -n "$RA_REACHABLE" ] && [ "$RA_REACHABLE" -gt 0 ] && echo "$RA_REACHABLE" > /proc/sys/net/ipv6/neigh/$device/base_reachable_time_ms
    [ -n "$RA_RETRANSMIT" ] && [ "$RA_RETRANSMIT" -gt 0 ] && echo "$RA_RETRANSMIT" > /proc/sys/net/ipv6/neigh/$device/retrans_time_ms

    proto_init_update "*" 1

    # Merge RA-DNS
    for radns in $RA_DNS; do
        local duplicate=0
        for dns in $RDNSS; do
            [ "$radns" = "$dns" ] && duplicate=1
        done
        [ "$duplicate" = 0 ] && RDNSS="$RDNSS $radns"
    done

    for dns in $RDNSS; do
        proto_add_dns_server "$dns"
    done

    for radomain in $RA_DOMAINS; do
        local duplicate=0
        for domain in $DOMAINS; do
            [ "$radomain" = "$domain" ] && duplicate=1
        done
        [ "$duplicate" = 0 ] && DOMAINS="$DOMAINS $radomain"
    done

    for domain in $DOMAINS; do
        proto_add_dns_search "$domain"
    done

    for prefix in $PREFIXES; do
        proto_add_ipv6_prefix "$prefix"
        prefsig="$prefsig ${prefix%%,*}"
        local entry="${prefix#*/}"
        entry="${entry#*,}"
        entry="${entry#*,}"
        local valid="${entry%%,*}"

        if [ -z "$RA_ADDRESSES" -a -z "$RA_ROUTES" -a \
                -z "$RA_DNS" -a "$FAKE_ROUTES" = 1 ]; then
            RA_ROUTES="::/0,$SERVER,$valid,4096"
        fi
    done

    for prefix in $USERPREFIX; do
        proto_add_ipv6_prefix "$prefix"
    done

    # Merge addresses
    for entry in $RA_ADDRESSES; do
        local duplicate=0
        local addr="${entry%%/*}"
        for dentry in $ADDRESSES; do
            local daddr="${dentry%%/*}"
            [ "$addr" = "$daddr" ] && duplicate=1
        done
        [ "$duplicate" = "0" ] && ADDRESSES="$ADDRESSES $entry"
    done

    for entry in $ADDRESSES; do
        local addr="${entry%%/*}"
        entry="${entry#*/}"
        local mask="${entry%%,*}"
        entry="${entry#*,}"
        local preferred="${entry%%,*}"
        entry="${entry#*,}"
        local valid="${entry%%,*}"

        proto_add_ipv6_address "$addr" "$mask" "$preferred" "$valid" 1
        addrsig="$addrsig $addr/$mask"

        if [ -z "$RA_ADDRESSES" -a -z "$RA_ROUTES" -a \
                -z "$RA_DNS" -a "$FAKE_ROUTES" = 1 ]; then
            RA_ROUTES="::/0,$SERVER,$valid,4096"
        fi

        # RFC 7278
        if [ "$mask" -eq 64 -a -z "$PREFIXES" -a -n "$EXTENDPREFIX" ]; then
            proto_add_ipv6_prefix "$addr/$mask,$preferred,$valid"

            local raroutes=""
            for route in $RA_ROUTES; do
                local prefix="${route%%/*}"
                local entry="${route#*/}"
                local pmask="${entry%%,*}"
                entry="${entry#*,}"
                local gw="${entry%%,*}"

                [ -z "$gw" -a "$mask" = "$pmask" ] && {
                    case "$addr" in
                        "${prefix%*::}"*) continue;;
                    esac
                }
                raroutes="$raroutes $route"
            done
            RA_ROUTES="$raroutes"
        fi
    done

    for entry in $RA_ROUTES; do
        local duplicate=$NOSOURCEFILTER
        local addr="${entry%%/*}"
        entry="${entry#*/}"
        local mask="${entry%%,*}"
        entry="${entry#*,}"
        local gw="${entry%%,*}"
        entry="${entry#*,}"
        local valid="${entry%%,*}"
        entry="${entry#*,}"
        local metric="${entry%%,*}"

        for xentry in $RA_ROUTES; do
            local xprefix="${xentry%%,*}"
            xentry="${xentry#*,}"
            local xgw="${xentry%%,*}"

            [ -n "$gw" -a -z "$xgw" -a "$addr/$mask" = "$xprefix" ] && duplicate=1
        done

        if [ -z "$gw" -o "$duplicate" = 1 ]; then
            proto_add_ipv6_route "$addr" "$mask" "$gw" "$metric" "$valid"
        else
            for prefix in $PREFIXES $ADDRESSES; do
                local paddr="${prefix%%,*}"
                proto_add_ipv6_route "$addr" "$mask" "$gw" "$metric" "$valid" "$paddr"
            done
        fi
    done

    proto_add_data
    [ -n "$CER" ] && json_add_string cer "$CER"
    [ -n "$PASSTHRU" ] && json_add_string passthru "$PASSTHRU"
    [ -n "$ZONE" ] && json_add_string zone "$ZONE"
    proto_close_data

    proto_send_update "$INTERFACE"

    MAPTYPE=""
    MAPRULE=""

    if [ -n "$MAPE" -a -f /lib/netifd/proto/map.sh ]; then
        MAPTYPE="map-e"
        MAPRULE="$MAPE"
    elif [ -n "$MAPT" -a -f /lib/netifd/proto/map.sh -a -f /proc/net/nat46/control ]; then
        MAPTYPE="map-t"
        MAPRULE="$MAPT"
    elif [ -n "$LW4O6" -a -f /lib/netifd/proto/map.sh ]; then
        MAPTYPE="lw4o6"
        MAPRULE="$LW4O6"
    fi

    [ -n "$ZONE" ] || ZONE=$(fw3 -q network $INTERFACE 2>/dev/null)

    if [ "$IFACE_MAP" != 0 -a -n "$MAPTYPE" -a -n "$MAPRULE" ]; then
        [ -z "$IFACE_MAP" -o "$IFACE_MAP" = 1 ] && IFACE_MAP=${INTERFACE}_4
        json_init
        json_add_string name "$IFACE_MAP"
        json_add_string ifname "@$INTERFACE"
        json_add_string proto map
        json_add_string type "$MAPTYPE"
        json_add_string _prefsig "$prefsig"
        [ "$MAPTYPE" = lw4o6 ] && json_add_string _addrsig "$addrsig"
        json_add_string rule "$MAPRULE"
        json_add_string tunlink "$INTERFACE"
        [ -n "$ZONE_MAP" ] || ZONE_MAP=$ZONE
        [ -n "$ZONE_MAP" ] && json_add_string zone "$ZONE_MAP"
        [ -n "$ENCAPLIMIT_MAP" ] && json_add_string encaplimit "$ENCAPLIMIT_MAP"
        [ -n "$IFACE_MAP_DELEGATE" ] && json_add_boolean delegate "$IFACE_MAP_DELEGATE"
        json_close_object
        ubus call network add_dynamic "$(json_dump)"
    elif [ -n "$AFTR" -a "$IFACE_DSLITE" != 0 -a -f /lib/netifd/proto/dslite.sh ]; then
        [ -z "$IFACE_DSLITE" -o "$IFACE_DSLITE" = 1 ] && IFACE_DSLITE=${INTERFACE}_4
        json_init
        json_add_string name "$IFACE_DSLITE"
        json_add_string ifname "@$INTERFACE"
        json_add_string proto "dslite"
        json_add_string peeraddr "$AFTR"
        json_add_string tunlink "$INTERFACE"
        [ -n "$ZONE_DSLITE" ] || ZONE_DSLITE=$ZONE
        [ -n "$ZONE_DSLITE" ] && json_add_string zone "$ZONE_DSLITE"
        [ -n "$ENCAPLIMIT_DSLITE" ] && json_add_string encaplimit "$ENCAPLIMIT_DSLITE"
        [ -n "$IFACE_DSLITE_DELEGATE" ] && json_add_boolean delegate "$IFACE_DSLITE_DELEGATE"
        json_close_object
        ubus call network add_dynamic "$(json_dump)"
    elif [ "$IFACE_464XLAT" != 0 -a -f /lib/netifd/proto/464xlat.sh ]; then
        [ -z "$IFACE_464XLAT" -o "$IFACE_464XLAT" = 1 ] && IFACE_464XLAT=${INTERFACE}_4
        json_init
        json_add_string name "$IFACE_464XLAT"
        json_add_string ifname "@$INTERFACE"
        json_add_string proto "464xlat"
        json_add_string tunlink "$INTERFACE"
        json_add_string _addrsig "$addrsig"
        [ -n "$ZONE_464XLAT" ] || ZONE_464XLAT=$ZONE
        [ -n "$ZONE_464XLAT" ] && json_add_string zone "$ZONE_464XLAT"
        [ -n "$IFACE_464XLAT_DELEGATE" ] && json_add_boolean delegate "$IFACE_464XLAT_DELEGATE"
        json_close_object
        ubus call network add_dynamic "$(json_dump)"
    fi

    # TODO: $SNTP_IP $SIP_IP $SNTP_FQDN $SIP_DOMAIN
}

teardown_interface() {
    proto_init_update "*" 0
    proto_send_update "$INTERFACE"
}

case "$2" in
    bound)
        teardown_interface "$1"
        setup_interface "$1"
    ;;
    informed|updated|rebound)
        setup_interface "$1"
    ;;
    ra-updated)
        [ -n "$ADDRESSES$RA_ADDRESSES$PREFIXES$USERPREFIX" ] && setup_interface "$1"
    ;;
    started|stopped|unbound)
        teardown_interface "$1"
    ;;
esac

# user rules
[ -f /etc/odhcp6c.user ] && . /etc/odhcp6c.user "$@"

exit 0

kenjiuno commented 3 years ago

I also have the same problem: IPv6 stops working after a while.

In my case, I'm using OpenWrt as a company's broadband router. It runs dhcp6d for the LAN in stateless+stateful mode.

My ISP provides IPv6 connectivity via DHCPv6 on the WAN side.

I suspect that odhcp6c does not handle DHCP lease expiry once it enters this code block:

https://github.com/openwrt/odhcp6c/blob/53f07e90b7f1da6977143a488dd5cb73a33b233b/src/odhcp6c.c#L524-L569

From a quick review of the source code, it looks like sending a signal with kill -SIGUSR1 <PID> (or perhaps SIGUSR2) is the only way to update the DHCPv6 lease.

Question for the odhcp6c developers: is there a recommended way to mitigate this on the user side?

kenjiuno commented 3 years ago

Sorry, I may have been wrong in my previous post.

I noticed that the DHCPv6 reply from the server contains four time values (captured with tcpdump -n -vv -i eth0.2 udp portrange 546-547):

00:15:21.535949 IP6 (class 0xb8, hlim 255, next-header UDP (17) payload length: 206) fe80::226:bff:fe49:c2c0.547 > fe80::a451:abff:fe7e:5d18.546: [udp sum ok] dhcp6 reply (xid=17778b (client-ID hwaddr type 1 a651ab7e5d18) (server-ID hwaddr type 1 000d5ec4c34c) (SIP-servers-address XXXX:XXXX:XXXX:XXXX::X) (DNS-server XXXX:XXXX:XXXX::X XXXX:XXXX:XXXX:X::X) (DNS-search-list flets-west.jp. iptvf.jp.) (IA_PD IAID:1 T1:7200 T2:10800 (IA_PD-prefix XXXX:XXX:XXXX:XXXX::/56 pltime:12600 vltime:14400)) (SNTP-servers XXXX:XXXX:XXX::X XXXX:XXXX:XXX::X))

Item                 Period
T1 (renew)           7,200 s (2 h)
T2 (rebind)          10,800 s (3 h)
Preferred lifetime   12,600 s (3.5 h)
Valid lifetime       14,400 s (4 h)
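To pull these four values out of a capture without reading the whole packet dump, a small shell helper is enough. This is a minimal sketch that assumes tcpdump's `-vv` text format exactly as in the reply above (`T1:N T2:N ... pltime:N vltime:N`); the function name `show_lifetimes` is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: extract T1/T2 and the IA_PD prefix lifetimes from one
# tcpdump "-vv" DHCPv6 line and print each as: name seconds hours.
# Assumes tcpdump's "T1:N T2:N ... pltime:N vltime:N" text format.

show_lifetimes() {
    line=$1
    for key in T1 T2 pltime vltime; do
        # Greedy ".*" picks the last occurrence of "<space or (>key:"
        secs=$(printf '%s\n' "$line" | sed -n "s/.*[( ]$key:\([0-9]*\).*/\1/p")
        [ -n "$secs" ] || continue
        # Hours with one decimal place, using integer arithmetic only
        printf '%s %s %d.%d\n' "$key" "$secs" $((secs / 3600)) $((secs % 3600 * 10 / 3600))
    done
}

# Example (the IA_PD part of the reply above):
show_lifetimes "(IA_PD IAID:1 T1:7200 T2:10800 (IA_PD-prefix XXXX:XXX:XXXX:XXXX::/56 pltime:12600 vltime:14400))"
# T1 7200 2.0
# T2 10800 3.0
# pltime 12600 3.5
# vltime 14400 4.0
```

In this reply T1 (2 h) and T2 (3 h) both come well before the 4 h valid lifetime, so a client that renews at T1 should refresh the lease long before it expires.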

It seems that odhcp6c uses the valid lifetime instead of T1/T2.

hartmark commented 3 years ago

I also reported the issue directly to Teltonika. According to them, it appears that my ISP didn't send out periodic RAs, so my default route, which had a lifetime of 65535 seconds, expired.

They also referred to this old workaround: https://github.com/openwrt/openwrt/commit/8691d75917d91a39f2011d4ddd0713b8562e5e3a

So it seems the workaround is to use the fakeroute option somehow.
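For reference, the fake-route logic in the dhcpv6.script from the first post (the `FAKE_ROUTES` checks inside the `$PREFIXES` and `$ADDRESSES` loops) boils down to the following. This is a standalone sketch that copies that condition into a hypothetical `fake_default_route` function; the server address `fe80::1` and the lifetime are example values, not taken from a real lease.

```shell
#!/bin/sh
# Sketch of the FAKE_ROUTES condition from /lib/netifd/dhcpv6.script above.
# fake_default_route is a hypothetical name; the logic mirrors the script:
# with no RA addresses, routes or DNS, synthesize a default route via the
# DHCPv6 server ($SERVER) whose lifetime is the lease's valid lifetime.

fake_default_route() {
    valid=$1   # valid lifetime of the prefix/address being processed
    if [ -z "$RA_ADDRESSES" ] && [ -z "$RA_ROUTES" ] && \
       [ -z "$RA_DNS" ] && [ "$FAKE_ROUTES" = 1 ]; then
        # Same format the script uses: prefix/len,gateway,lifetime,metric
        RA_ROUTES="::/0,$SERVER,$valid,4096"
    fi
}

# Example with hypothetical values: no RA data, lease valid for 14400 s
RA_ADDRESSES="" RA_ROUTES="" RA_DNS="" FAKE_ROUTES=1 SERVER="fe80::1"
fake_default_route 14400
echo "$RA_ROUTES"   # prints ::/0,fe80::1,14400,4096
```

Because the synthesized default route gets the lease's valid lifetime, it is presumably only refreshed when odhcp6c runs the script again (on bound, updated or rebound); if that does not happen in time, the route would expire just like an RA-learned one.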

eriktews commented 3 years ago

Hi, I'm affected by an issue with at least the same symptom (after a day or so, my IPv6 access breaks). In my logs I see messages like Server returned IA_PD status 'No Binding (Who are you? Do I know you?)'. Do you think this is the same bug as the one described here?

hartmark commented 3 years ago

Hi, I'm not very knowledgeable in this area, but it seems to be the same issue.

eriktews commented 2 years ago

I investigated this a bit further and found that my IPv6 access now works for 5-10 seconds, then breaks for a few seconds, and then the cycle repeats. When that happens, the error message above appears in my log.

kenjiuno commented 2 years ago

In my case, I decided to use a workaround until a proper fix is available.

It uses a crontab entry like this:

45 * * * * kill -SIGUSR2 `pidof odhcp6c`

At minute 45 of every hour, the odhcp6c process receives SIGUSR2 and then performs an IPv6 release/renew transaction.

This only helps if the IPv6 connection is guaranteed to keep working until its lease expires.

My problem is that the IPv6 route's expiry isn't refreshed after a DHCPv6 renew transaction.

root@OpenWrt:~# ip -6 r
default from 2409:250:XXXX:YYYY::/56 via fe80::226:bff:fe49:c2c0 dev wan  metric 512
2409:250:XXXX:YYYY::/64 dev br-lan  metric 1024
2409:250:XXXX:YYYY::/60 dev br-lan  metric 256  expires 2035sec
unreachable 2409:250:XXXX:YYYY::/56 dev lo  metric 2147483647

The route 2409:250:XXXX:YYYY::/60 dev br-lan metric 256 does expire after 2035 s without being renewed.
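To check whether a renew really refreshes route lifetimes, it helps to log the remaining `expires` time of each route over time. A minimal sketch, assuming iproute2's `ip -6 route` output format as shown above; `expiring_routes` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: print "prefix seconds-left" for every IPv6 route that
# carries an "expires" timer. Reads `ip -6 route` output on stdin.

expiring_routes() {
    awk '{
        for (i = 1; i < NF; i++)
            if ($i == "expires") {
                secs = $(i + 1)
                sub(/sec$/, "", secs)   # "2035sec" -> "2035"
                print $1, secs
            }
    }'
}

# Usage on the router, e.g. once a minute:
#   ip -6 route | expiring_routes
```

Running it before and shortly after a renew should show the counter jump back up if the route was refreshed; a value that keeps counting down matches the behavior described here.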

PF4Public commented 2 years ago

> With quick source code review, I have noticed that kill -SIGUSR1 <PID> or such (SIGUSR2 too?) will be only way to update dhcpv6 lease.

For me only SIGUSR2 worked that way. SIGUSR1 did nothing for some reason.

But still, why doesn't it renew on T1?

emss-github commented 1 year ago

Same here: T1 and T2 are 150 and 240, and the preferred lifetime of the prefix is 300. What further information is required to solve this issue?

More details in https://github.com/openwrt/openwrt/issues/13086#issuecomment-1638673926

missing233 commented 1 year ago

I've come across the same issue on NTT FLET'S CROSS. The lifetimes are:

T1: 7200
T2: 10800
Preferred Lifetime: 12600
Valid Lifetime: 14400

Every 8 hours, I get the following log: 'daemon.notice netifd: Interface 'wan6' has lost the connection'. Are you running into the same thing?

emss-github commented 1 year ago

> Every 8 hours, I get the following log: 'daemon.notice netifd: Interface 'wan6' has lost the connection'. Are you running into the same thing?

I can't tell for sure; I've been running 23.05.0-rc3 for 6 hours now with no apparent issues.

kenjiuno commented 1 year ago

> Every 8 hours, I get the following log: 'daemon.notice netifd: Interface 'wan6' has lost the connection'. Are you running into the same thing?

What I found in my case is that there was another DHCPv6-enabled client (or proxy) on the same LAN.

So this is not a problem with the DHCPv6 client on the OpenWrt device.

Check whether another router-like device, such as a business phone unit, is sending DHCPv6 Solicit packets.
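One way to check for such a device is to capture DHCPv6 traffic for a while and count the distinct client-IDs in Solicit messages. A rough sketch, assuming tcpdump `-n -vv` text output with one packet per line (as in the captures in this thread); the helper name and the capture command in the comment (WAN device `eth1`) are assumptions.

```shell
#!/bin/sh
# Hypothetical helper: list the distinct DHCPv6 client-IDs seen in Solicit
# messages. More than one line of output suggests more than one DHCPv6
# client on the segment. Input: tcpdump -n -vv text on stdin.

solicit_client_ids() {
    grep 'dhcp6 solicit' |
        sed -n 's/.*(client-ID \([^)]*\)).*/\1/p' |
        sort -u
}

# Usage (capture 200 packets on the WAN device, then analyse):
#   tcpdump -n -vv -i eth1 -c 200 udp port 547 > /tmp/dhcp6.txt
#   solicit_client_ids < /tmp/dhcp6.txt
```

A second router or phone unit would show up with its own hardware address or DUID next to the OpenWrt router's.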

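If you are interested, see:

Exploring coexistence between SAXA's Hikari Denwa Office accommodation unit and IPv6 (サクサのひかり電話オフィス収容ユニットとIPv6の共存模索) | diary of mixi user (id:2416887)

While investigating the SAXA accommodation unit, I found that even though it is IPv4-only, for some reason it goes out and obtains IPv6. The IPv6 it obtains is apparently never used anywhere; all of its communication goes over IPv4... When I told a SAXA engineer about this, they said, "Ah, maybe that's the reason it can't coexist with IPv6."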

At a basic level, everything here happens on one shared Layer 2 segment:

↓ OpenWrt acts as a DHCPv6 client.

↓ Then, after X hours, the SAXA unit also acts as a DHCPv6 client and ends up taking all incoming packets from the WAN.

As a workaround for this issue, one idea is to implement a ping client and a watchdog timer that restarts the DHCPv6 client.

https://github.com/HiraokaHyperTools/openwrt-watchngn
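The ping-plus-watchdog idea can be sketched in a few lines of shell. This is not the code from the openwrt-watchngn repository linked above; it is an independent, hypothetical example that assumes OpenWrt's `ifup` command, an interface named `wan6`, `ping6` and `logger` on the router, and a reachable probe target (the default is the ftp.sunet.se address from the first post).

```shell
#!/bin/sh
# Hypothetical IPv6 watchdog: after MAX_FAILS consecutive failed probes,
# restart the wan6 interface so that netifd starts a fresh DHCPv6 client.
# Assumptions: OpenWrt ifup, interface "wan6", ping6 and logger available.

TARGET="${TARGET:-2001:6b0:19::165}"   # probe address (ftp.sunet.se, see above)
MAX_FAILS="${MAX_FAILS:-3}"
INTERVAL="${INTERVAL:-60}"             # seconds between probes

# count_failures COUNT STATUS: print the new consecutive-failure count
count_failures() {
    if [ "$2" -eq 0 ]; then echo 0; else echo $(($1 + 1)); fi
}

# should_restart COUNT MAX: succeed once the failure threshold is reached
should_restart() {
    [ "$1" -ge "$2" ]
}

main() {
    fails=0
    while :; do
        if ping6 -c 1 -w 5 "$TARGET" >/dev/null 2>&1; then status=0; else status=1; fi
        fails=$(count_failures "$fails" "$status")
        if should_restart "$fails" "$MAX_FAILS"; then
            logger -t v6watchdog "$fails failed IPv6 probes, restarting wan6"
            ifup wan6
            fails=0
        fi
        sleep "$INTERVAL"
    done
}

# Only loop when started as "v6watchdog.sh run"
if [ "${1:-}" = run ]; then
    main
fi
```

It could be started in the background with `sh v6watchdog.sh run &` (the file name is arbitrary). Restarting the whole interface is heavier than sending a signal, but, like killing odhcp6c in the first post, it starts a completely fresh client.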

missing233 commented 1 year ago

> What I understood in my case is that: there is another DHCPv6 enabled client (or proxy) on the same LAN.

No, that's not the case in my home network. There is no other DHCPv6 client besides the OpenWrt router, and I don't have a Hikari Denwa contract either. Even without a Hikari Denwa contract, NTT still delegates a /56 IPv6 prefix (IA_PD) to me.

My home network looks like this: NTT 10G-ONU -> OpenWrt router -> switch hub -> AP/PC/NAS...

I've noticed that this issue only occurs with OpenWrt devices connected to FLET'S CROSS, while FLET'S NEXT doesn't seem to have a similar problem, perhaps because its valid lifetime is as long as a month. Additionally, NEC routers do not exhibit this issue, whether it's the regular retail model or NTT's XG-100NE (HGW).

JesusArmy commented 7 months ago

I just subscribed to NTT 10G Cross with Plala and I'm seeing similar behavior with OpenWrt...

Has anyone found a good fix or workaround since then? It recovers by itself in less than a minute, but everything is interrupted in the meantime, which is quite inconvenient, especially in the middle of a meeting... :D

I see these logs every time I get an outage:
Sun Mar 17 16:33:29 2024 daemon.warn odhcp6c[19403]: Server returned IA_PD status 'No Binding '
Sun Mar 17 16:55:05 2024 daemon.warn odhcpd[1460]: No default route present, overriding ra_lifetime!

That seems to be the same problem, doesn't it?

PF4Public commented 7 months ago

For me, sending SIGUSR2 works as mentioned above. I had the impression that this behaviour was supposedly fixed in a newer OpenWrt version, but I cannot verify that because my hardware is too old.

JesusArmy commented 7 months ago

Thanks for your reply, PF4Public. I will set that up in crontab then and see if it helps. For info, I'm running a pretty recent version of OpenWrt (23.05.2), since I'm just using a regular x86 PC to run it. So it seems that it may not be fully fixed yet...

JesusArmy commented 7 months ago

Running "kill -SIGUSR2 `pidof odhcp6c`" on my OpenWrt version brought the internet connection down, and this time it actually did not recover by itself... I may try the alternative with SIGUSR1 then... :)

missing233 commented 7 months ago

> Running "kill -SIGUSR2 `pidof odhcp6c`" on my open-wrt version did bring internet down and it did not recovered by itself this time actually... I may try the alternative option with SIGUSR1 then... :)

Sending SIGUSR2 to odhcp6c makes it send a RELEASE and restart the state machine by sending SOLICIT messages, while SIGUSR1 makes it send a RENEW message. Try this script: https://gist.github.com/missing233/3dafb6ee549ed2271c20bd700b88a9cd
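Based on that description of the two signals, a small wrapper can make the intent explicit and signal every running instance. This is a hypothetical helper, not the linked gist; it assumes `pgrep` is available (BusyBox or procps), and the names `signal_for` and `signal_odhcp6c` are made up.

```shell
#!/bin/sh
# Hypothetical wrapper around the odhcp6c signals described above:
#   renew   -> SIGUSR1 (send RENEW for the current lease)
#   restart -> SIGUSR2 (send RELEASE, then start over with SOLICIT)

signal_for() {
    case "$1" in
        renew)   echo USR1 ;;
        restart) echo USR2 ;;
        *)       return 1 ;;
    esac
}

signal_odhcp6c() {
    sig=$(signal_for "$1") || { echo "usage: $0 renew|restart" >&2; return 2; }
    pids=$(pgrep odhcp6c) || { echo "odhcp6c is not running" >&2; return 1; }
    # Signal every instance, in case more than one is running
    for pid in $pids; do
        kill -"$sig" "$pid"
    done
}

# Only act when called with an argument, e.g. "odhcp6c-signal.sh renew"
if [ -n "${1:-}" ]; then
    signal_odhcp6c "$1"
fi
```

For an hourly renew, a crontab line such as `0 * * * * /root/odhcp6c-signal.sh renew` would behave like the SIGUSR1 one-liners in this thread (the path is only an example).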

JesusArmy commented 7 months ago

Thanks a lot for the script. Just to be sure I'm doing things right: I would just drop the script file in "/etc/hotplug.d/iface/", maybe set the execute bit, and that's it? It will be executed (at next boot? on interface status change?) and will add "kill -SIGUSR1 $(pgrep odhcp6c)" to crontab to run every hour? Is that it?

cre8ivejp commented 7 months ago

> Thanks a lot for script. Just to be sure I am doing things right: I would just drop the script file in "/etc/hotplug.d/iface/", maybe enable the execute bit and that's it?

@JesusArmy, or you can just add the following line under Scheduled Tasks in the System menu. I've been running this for 2 months now with no issues. It renews every hour.

0 * * * * kill -SIGUSR1 $(pgrep odhcp6c)

You will see this log every hour.

Mon Mar 18 21:00:00 2024 cron.err crond[919]: USER root pid 220201 cmd kill -SIGUSR1 $(pgrep odhcp6c)

JesusArmy commented 7 months ago

Thank you both, that seems to be working great so far. I have not noticed any disconnection for the last 8h :)

missing233 commented 7 months ago

I found that the OpenWrt package repository has isc-dhcp-client. Perhaps we could use it to replace odhcp6c. However, its size is close to 1 MB, which is much larger than odhcp6c.

cre8ivejp commented 7 months ago

> I found that OpenWrt package repo has isc-dhcp-client. Perhaps we can use it to replace odhcp6c. However its size is close to 1MB, which is much larger than odhcp6c.

Thanks for sharing this. I'm using a mini-pc with a lot of storage and memory, so the size is not a problem. I'll try that and see how it goes.

missing233 commented 7 months ago

> Thanks for sharing this. I'm using a mini-pc with a lot of storage and memory, so the size is not a problem. I'll try that and see how it goes.

Looking forward to hearing some good news :)

JesusArmy commented 7 months ago

> @JesusArmy, or you can just add the following line in the Scheduled Tasks from the System menu. I've been running this for 2 months now with no issues. It will renew every hour.
> 0 * * * * kill -SIGUSR1 $(pgrep odhcp6c)

I have been running that since last week with quite some success, but I still get occasional issues. It used to be many outages a day; now I seem to get one or two at most.

If I want to investigate what's happening, should I switch to a more verbose log level, or do I need to run tcpdump 24/7 to capture what happens at that moment?

Today, the only thing I see in the logs when it happens is a bunch of "No default route present" messages like these:

Mon Mar 25 13:28:11 2024 daemon.warn odhcpd[1471]: No default route present, overriding ra_lifetime!
Mon Mar 25 13:28:20 2024 daemon.warn odhcpd[1471]: No default route present, overriding ra_lifetime!
Mon Mar 25 13:28:21 2024 daemon.warn odhcpd[1471]: No default route present, overriding ra_lifetime!
Mon Mar 25 13:28:22 2024 daemon.warn odhcpd[1471]: No default route present, overriding ra_lifetime!
Mon Mar 25 13:28:23 2024 daemon.info dnsmasq-dhcp[1]: DHCPREQUEST(eth1) x.x.x.136 aa:aa:aa:aa:aa:aa
Mon Mar 25 13:28:23 2024 daemon.info dnsmasq-dhcp[1]: DHCPACK(eth1) x.x.x.136 aa:aa:aa:aa:aa:aa SOPAAD-PW04SSWC
Mon Mar 25 13:28:41 2024 daemon.warn odhcpd[1471]: No default route present, overriding ra_lifetime!
Mon Mar 25 13:28:42 2024 daemon.warn odhcpd[1471]: No default route present, overriding ra_lifetime!
Mon Mar 25 13:28:43 2024 daemon.warn odhcpd[1471]: No default route present, overriding ra_lifetime!
Mon Mar 25 13:28:43 2024 daemon.info dnsmasq-dhcp[1]: DHCPREQUEST(eth1) x.x.x.136 aa:aa:aa:aa:aa:aa
Mon Mar 25 13:28:43 2024 daemon.info dnsmasq-dhcp[1]: DHCPACK(eth1) x.x.x.136 aa:aa:aa:aa:aa:aa SOPAAD-PW04SSWC
Mon Mar 25 13:28:55 2024 daemon.info dnsmasq-dhcp[1]: DHCPREQUEST(eth1) x.x.x.90 bb:bb:bb:bb:bb:bb
Mon Mar 25 13:28:55 2024 daemon.info dnsmasq-dhcp[1]: DHCPACK(eth1) x.x.x.90 cc:cc:cc:cc:cc:cc Pixel-7a
Mon Mar 25 13:28:55 2024 daemon.warn odhcpd[1471]: No default route present, overriding ra_lifetime!
Mon Mar 25 13:30:45 2024 daemon.info dnsmasq-dhcp[1]: DHCPREQUEST(eth1) x.x.x.136 aa:aa:aa:aa:aa:aa
Mon Mar 25 13:30:45 2024 daemon.info dnsmasq-dhcp[1]: DHCPACK(eth1) x.x.x.136 aa:aa:aa:aa:aa:aa SOPAAD-PW04SSWC
Mon Mar 25 13:30:45 2024 daemon.warn odhcpd[1471]: No default route present, overriding ra_lifetime!
Mon Mar 25 13:30:46 2024 daemon.warn odhcpd[1471]: No default route present, overriding ra_lifetime!
Mon Mar 25 13:30:47 2024 daemon.warn odhcpd[1471]: No default route present, overriding ra_lifetime!
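For catching the moment of an outage, a ring-buffer capture is usually more practical than one ever-growing file: old data is overwritten, so it can run around the clock. A hedged sketch, assuming the full tcpdump package (for its `-C`/`-W` options), WAN device `eth1`, and enough RAM in /tmp for about 20 MB of capture; the script and file names are examples.

```shell
#!/bin/sh
# Hypothetical long-running capture of DHCPv6 and ICMPv6 (RA, NS/NA)
# traffic into a ring buffer, so packets around an outage can be
# inspected afterwards. Assumes full tcpdump and WAN device eth1.

WAN_DEV="${WAN_DEV:-eth1}"
OUT="${OUT:-/tmp/v6debug.pcap}"

# DHCPv6 uses UDP 546 (client) and 547 (server); RAs are ICMPv6.
CAPTURE_FILTER="udp port 546 or udp port 547 or icmp6"

if [ "${1:-}" = start ]; then
    # -C 2: start a new file after ~2 million bytes
    # -W 10: keep at most 10 files, overwriting the oldest (ring buffer)
    exec tcpdump -n -i "$WAN_DEV" -w "$OUT" -C 2 -W 10 "$CAPTURE_FILTER"
fi
```

Start it with `sh v6capture.sh start &`. tcpdump appends a number to each file name; after an outage, copy the files to a PC and look at the RAs and DHCPv6 exchanges around the timestamp from the log.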

JesusArmy commented 7 months ago

Just wondering, are you also getting the rather low MTU of 1280 on your map-e interface? I have also manually increased the TX queue length of the eth and map-e interfaces from 1000 to 10000, since I had noticed some drops. Not sure whether that was the cause or not...

JesusArmy commented 7 months ago

FYI, it's been 48 hours since I increased the TX queue length from 1,000 to 10,000 on both physical eth interfaces (from LuCI) and on the map-e interface (from the terminal), and so far so good: zero disconnections in 2 days. :)

cre8ivejp commented 7 months ago

> FYI, it's been 48h that I have increased tx length from 1,000 to 10,000 on both eth physical interfaces (from luci) and the map-e interface (from terminal) and so far so good. Zero disconnection in 2 days. :)

Glad to hear it. I didn't need to change the TX queue length here, though. Did you check if the cronjob runs correctly every hour without errors?

JesusArmy commented 7 months ago

Yep, it was running every hour, but I was still getting some additional disconnections. At one point, before changing the TX queue length, I even got up to 3 within the same hour.

It is still possible that just restarting the network stack would have been enough to fix the issue, and that the TX queue change is actually completely unrelated to getting a stable link...

missing233 commented 7 months ago

@cre8ivejp @JesusArmy

Not sure but I've never come across "No default route present, overriding ra_lifetime," and my network connection is solid as a rock. I'm on the V6Plus Fixed IP, maybe you could take a look at my config file for reference:

/etc/config/network:

config interface 'lan'
        option device 'br-lan'
        option proto 'static'
        option ipaddr '192.168.1.1'
        option netmask '255.255.255.0'
        option ip6assign '64'
        option ip6hint '01'
        option ip6ifaceid 'eui64'

config interface 'wan6'
        option device 'eth1'
        option proto 'dhcpv6'
        option reqaddress 'try'
        option reqprefix 'auto'
        option noclientfqdn '1'

config interface 'wan'
        option proto 'ipip6'
        option peeraddr '<secret>'
        option ip4ifaddr '<secret>'
        option ip6addr '<secret>'
        option tunlink 'wan6'
        option encaplimit 'ignore'
        option mtu '1460'
        option ip6assign '64'
        option ip6ifaceid '::<secret>'
        option ip6weight '1'

/etc/config/dhcp:

config dnsmasq
        option domainneeded '1'
        option localise_queries '1'
        option rebind_protection '1'
        option rebind_localhost '1'
        option local '/lan/'
        option domain 'lan'
        option expandhosts '1'
        option cachesize '0'
        option authoritative '1'
        option readethers '1'
        option leasefile '/tmp/dhcp.leases'
        option localservice '1'
        option ednspacket_max '1232'
        option noresolv '1'
        option localuse '1'
        list server '127.0.0.1#7874'
        option sequential_ip '1'

config dhcp 'lan'
        option interface 'lan'
        option start '2'
        option limit '255'
        option leasetime '12h'
        option dhcpv4 'server'
        option dhcpv6 'server'
        option ra 'server'
        list ra_flags 'managed-config'
        list ra_flags 'other-config'

config odhcpd 'odhcpd'
        option maindhcp '0'
        option leasefile '/tmp/hosts/odhcpd'
        option leasetrigger '/usr/sbin/odhcpd-update'
        option loglevel '4'

JesusArmy commented 6 months ago

Hi missing233. Thanks a lot for sharing your config. The connection has been very stable, without a single disconnect, for the last 2 weeks, so I think I'll keep it as is for now. :)

achims311 commented 1 month ago

I have the same issue running the latest version (OpenWrt 23.05.4 r24012-d8dd03c46f), but my refresh values are much lower (T1:600 T2:960 (IA_PD-prefix 2a02:678:640:e800::/56 pltime:1200 vltime:3600)).

But what I noticed is that I've got 2 odhcp6c processes running:

root@OpenWrt:~# ps | grep odhcp6c
 5682 root 848 S odhcp6c -s /lib/netifd/dhcpv6.script -Ntry -P56 -t120 pppoe-wan
 6256 root 1144 R grep odhcp6c
29379 root 848 S odhcp6c -s /lib/netifd/dhcpv6.script -P0 -t120 pppoe-wan
root@OpenWrt:~#

I can also see that I've got 3 wan* interfaces: the normal wan and wan6, plus a wan_6 with "Protocol: Virtual dynamic interface (DHCPv6 client)".

I configured wan6 to request a /56 prefix (the size my ISP hands out by default). As you can see above, only process 5682 uses this value (-P56), while the other one (29379) uses the default (-P0).
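To spot this situation quickly, the `ps` output can be scanned for interfaces that have more than one odhcp6c attached. A minimal sketch, assuming BusyBox `ps` columns (PID USER VSZ STAT COMMAND) as in the output above and that the interface name is the last argument on the odhcp6c command line; the function name `find_duplicate_odhcp6c` is hypothetical.

```shell
#!/bin/sh
# Print "<interface>: <pid> <pid> ..." for every interface that has more
# than one odhcp6c client. Reads BusyBox-style `ps` output on stdin.
# Assumption: the interface is the last word of the odhcp6c command line.
find_duplicate_odhcp6c() {
    awk '
        # Column 5 is the command; this skips the "grep odhcp6c" line
        # and the ps header, and also accepts a full path like /usr/sbin/odhcp6c
        $5 ~ /(^|\/)odhcp6c$/ {
            count[$NF]++
            pids[$NF] = pids[$NF] " " $1
        }
        END {
            for (ifname in count)
                if (count[ifname] > 1)
                    print ifname ":" pids[ifname]
        }
    '
}
```

Running `ps | find_duplicate_odhcp6c` against the process list above would print `pppoe-wan: 5682 29379`; no output means each interface has at most one client.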

I can also see 2 different dhcp6 solicit & advertise pairs (the difference is that the first solicit also contains an IA_NA option):
20:54:52.411887 IP6 (flowlabel 0x580de, hlim 1, next-header UDP (17) payload length: 139) fe80::1c66:a4e5:4466:d020.546 > ff02::1:2.547: [udp sum ok] dhcp6 solicit (xid=c0e771 (elapsed-time 0) (option-request SIP-servers-domain SIP-servers-address DNS-server DNS-search-list SNTP-servers NTP-server AFTR-Name opt_67 opt_94 opt_95 opt_96 opt_82) (client-ID hwaddr type 1 02f4b76f716c) (reconfigure-accept) (Client-FQDN) (IA_NA IAID:1 T1:0 T2:0) (IA_PD IAID:1 T1:0 T2:0 (IA_PD-prefix ::/56 pltime:0 vltime:0)))
20:54:52.412789 IP6 (class 0xe0, hlim 255, next-header UDP (17) payload length: 121) fe80::4201:7aff:fe01:3d80.547 > fe80::1c66:a4e5:4466:d020.546: [udp sum ok] dhcp6 advertise (xid=c0e771 (server-ID hwaddr type 1 40017a013d80) (client-ID hwaddr type 1 02f4b76f716c) (IA_PD IAID:1 T1:600 T2:960 (IA_PD-prefix 2a02:678:640:e800::/56 pltime:1200 vltime:3600)) (DNS-server 2a02:678:0:195:218:2:32:38 2a02:678:0:195:218:24:0:2))
20:54:54.371258 IP6 (flowlabel 0x580de, hlim 1, next-header UDP (17) payload length: 123) fe80::1c66:a4e5:4466:d020.546 > ff02::1:2.547: [udp sum ok] dhcp6 solicit (xid=ad4f4b (elapsed-time 0) (option-request SIP-servers-domain SIP-servers-address DNS-server DNS-search-list SNTP-servers NTP-server AFTR-Name opt_67 opt_94 opt_95 opt_96 opt_82) (client-ID hwaddr type 1 02f4b76f716c) (reconfigure-accept) (Client-FQDN) (IA_PD IAID:1 T1:0 T2:0 (IA_PD-prefix ::/56 pltime:0 vltime:0)))
20:54:54.372147 IP6 (class 0xe0, hlim 255, next-header UDP (17) payload length: 121) fe80::4201:7aff:fe01:3d80.547 > fe80::1c66:a4e5:4466:d020.546: [udp sum ok] dhcp6 advertise (xid=ad4f4b (server-ID hwaddr type 1 40017a013d80) (client-ID hwaddr type 1 02f4b76f716c) (IA_PD IAID:1 T1:600 T2:960 (IA_PD-prefix 2a02:678:640:e800::/56 pltime:1200 vltime:3600)) (DNS-server 2a02:678:0:195:218:2:32:38 2a02:678:0:195:218:24:0:2))

I also get only one dhcp6 request & reply:
20:54:55.911228 IP6 (flowlabel 0x580de, hlim 1, next-header UDP (17) payload length: 135) fe80::1c66:a4e5:4466:d020.546 > ff02::1:2.547: [udp sum ok] dhcp6 request (xid=fafe3f (elapsed-time 0) (option-request SIP-servers-domain SIP-servers-address DNS-server DNS-search-list SNTP-servers NTP-server AFTR-Name opt_67 opt_94 opt_95 opt_96) (client-ID hwaddr type 1 02f4b76f716c) (server-ID hwaddr type 1 40017a013d80) (reconfigure-accept) (Client-FQDN) (IA_PD IAID:1 T1:0 T2:0 (IA_PD-prefix 2a02:678:640:xxxx::/56 pltime:1200 vltime:3600)))
20:54:55.913809 IP6 (class 0xe0, hlim 255, next-header UDP (17) payload length: 121) fe80::4201:7aff:fe01:3d80.547 > fe80::1c66:a4e5:4466:d020.546: [udp sum ok] dhcp6 reply (xid=fafe3f (server-ID hwaddr type 1 40017a013d80) (client-ID hwaddr type 1 02f4b76f716c) (IA_PD IAID:1 T1:600 T2:960 (IA_PD-prefix 2a02:678:640:xxxx::/56 pltime:1200 vltime:3600)) (DNS-server 2a02:678:0:195:218:2:32:38 2a02:678:0:195:218:24:0:2))

and after some time (600 s after the reply, which matches T1 rather than T2):
21:04:55.990104 IP6 (flowlabel 0x580de, hlim 1, next-header UDP (17) payload length: 131) fe80::1c66:a4e5:4466:d020.546 > ff02::1:2.547: [udp sum ok] dhcp6 renew (xid=edd84a (elapsed-time 0) (option-request SIP-servers-domain SIP-servers-address DNS-server DNS-search-list SNTP-servers NTP-server AFTR-Name opt_67 opt_94 opt_95 opt_96) (client-ID hwaddr type 1 02f4b76f716c) (server-ID hwaddr type 1 40017a013d80) (Client-FQDN) (IA_PD IAID:1 T1:0 T2:0 (IA_PD-prefix 2a02:678:640:xxxx::/56 pltime:0 vltime:0)))
21:04:55.991030 IP6 (class 0xe0, hlim 255, next-header UDP (17) payload length: 72) fe80::4201:7aff:fe01:3d80.547 > fe80::1c66:a4e5:4466:d020.546: [udp sum ok] dhcp6 reply (xid=edd84a (server-ID hwaddr type 1 40017a013d80) (client-ID hwaddr type 1 02f4b76f716c) (IA_PD IAID:1 T1:60 T2:120 (status-code NoBinding)))

and the corresponding log entry:
Mon Sep 9 21:04:55 2024 daemon.warn odhcp6c[5682]: Server returned IA_PD status 'No Binding (NO-BINDING)'
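Per RFC 8415, a client sends Renew when T1 expires and Rebind when T2 expires, so the gap between the Reply and the next client message shows which timer fired. A minimal sketch for checking this from tcpdump timestamps, assuming both packets fall on the same day (no midnight wrap); the helper names `ts_to_secs` and `classify_gap` and the 5-second tolerance are hypothetical choices.

```shell
#!/bin/sh
# Convert a tcpdump timestamp (HH:MM:SS[.usec]) to seconds since midnight.
ts_to_secs() {
    echo "$1" | awk -F: '{ printf "%.6f\n", $1 * 3600 + $2 * 60 + $3 }'
}

# Report whether the gap between two timestamps matches T1, T2, or neither.
# $1 = time of the Reply, $2 = time of the next client message,
# $3 = T1 and $4 = T2 from that Reply (seconds).
classify_gap() {
    t_start=$(ts_to_secs "$1")
    t_end=$(ts_to_secs "$2")
    awk -v s="$t_start" -v e="$t_end" -v t1="$3" -v t2="$4" 'BEGIN {
        gap = e - s
        tol = 5   # allow a few seconds of jitter
        if (gap - t1 <= tol && t1 - gap <= tol) label = "T1 (renew)"
        else if (gap - t2 <= tol && t2 - gap <= tol) label = "T2 (rebind)"
        else label = "neither"
        printf "%.0f s -> %s\n", gap, label
    }'
}
```

With the capture above, `classify_gap 20:54:55.913809 21:04:55.990104 600 960` prints `600 s -> T1 (renew)`, consistent with the packet being a renew rather than a rebind.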

After this I get another dhcp request & reply as above.

It looks to me like I lose my IPv6 connection after the second request & reply, but I'm still trying to confirm this (it takes more time to validate).
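Since two odhcp6c instances run on the same interface, it can help to see which PID is receiving the NoBinding status over a longer period. A minimal sketch, assuming syslog lines in the format of the log entry quoted above (for example from `logread`); the function name `count_nobinding_by_pid` is hypothetical.

```shell
#!/bin/sh
# Count "No Binding" warnings per odhcp6c PID in syslog text read from stdin.
# Assumes lines like:
#   ... daemon.warn odhcp6c[5682]: Server returned IA_PD status 'No Binding (NO-BINDING)'
# Output: one "<pid>: <count>" line per PID.
count_nobinding_by_pid() {
    # Keep only matching lines and extract the PID between the brackets
    sed -n "s/.*odhcp6c\[\([0-9]*\)]: Server returned IA_[A-Z]* status 'No Binding.*/\1/p" |
        sort | uniq -c | awk '{ print $2 ": " $1 }'
}
```

Usage on the router would be `logread | count_nobinding_by_pid`; comparing the PIDs against the `ps` output shows whether the -P56 or the -P0 client is the one losing its binding.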

missing233 commented 3 weeks ago

@cre8ivejp @JesusArmy have you checked the wan6 status after removing the forced-renew crontab command? I recently removed the command, and wan6 hasn't lost its connection since (it's been connected for over a day now). Maybe NTT has fixed their DHCPv6 server?
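One way to track how long wan6 has stayed up is to read its uptime from netifd and print it in a readable form. A minimal sketch; the helper `fmt_uptime` is hypothetical, and the usage line assumes that `ifstatus wan6` returns JSON with an `uptime` field (in seconds) that `jsonfilter` can extract.

```shell
#!/bin/sh
# Format an interface uptime in seconds as "<days>d <HH>h <MM>m".
# Prints "not up" and returns 1 if no uptime was given (interface down).
fmt_uptime() {
    if [ -z "$1" ]; then
        echo "not up"
        return 1
    fi
    secs=$1
    days=$((secs / 86400))
    hours=$((secs % 86400 / 3600))
    mins=$((secs % 3600 / 60))
    printf '%dd %02dh %02dm\n' "$days" "$hours" "$mins"
}
```

On the router this could be called as `fmt_uptime "$(ifstatus wan6 | jsonfilter -e '@.uptime')"`; if the value ever drops back to a few minutes, wan6 was restarted in between.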

BombardierBeetle commented 2 weeks ago

I'm using odhcp6c in an NTT environment (FLET'S Hikari Cross, フレッツ光クロス). Because of the problem above, I was triggering a renew (SIGUSR1) via crontab. It has been about 24 hours since I removed the crontab entry, and the problem has not reappeared so far.

As missing233 said, it's possible that NTT has fixed something.
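For reference, the forced-renew workaround discussed in this thread amounts to signalling odhcp6c with SIGUSR1 (which triggers a renew) on a schedule. A minimal sketch of such a crontab entry, assuming the stock OpenWrt crontab at /etc/crontabs/root and BusyBox `killall`; the hourly schedule and the `force-renew` syslog tag are hypothetical choices. Note that it signals every running odhcp6c, including a duplicate instance like the one reported above.

```
# /etc/crontabs/root (hypothetical entry)
# At minute 0 of every hour, ask every running odhcp6c to renew (SIGUSR1)
# and write a marker to the syslog so the job can be verified later.
0 * * * * killall -USR1 odhcp6c && logger -t force-renew "sent SIGUSR1 to odhcp6c"
```

Whether the job actually fires every hour can then be checked with `logread | grep force-renew`. To drop the workaround, remove the line and run `/etc/init.d/cron restart`.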