sebbacon / kindle-dashboard

Dashboard for Kindle Paperwhite 3 (KUAL extension)
0 stars 1 forks source link

Weird failure mode #1

Open sebbacon opened 1 year ago

sebbacon commented 1 year ago

Sometimes, there's a network error.

The sleep screen freezes at a certain time.

If I press the wake button, I see the last active screen (KUAL buttons), but it doesn't respond to gestures.

I press the wake button again and the screensaver goes on.

I press it again, and I'm in normal mode.

Example logs:

ue Sep 27 22:56:29 BST 2022: Awake; pausing for 30s
Tue Sep 27 22:56:59 BST 2022: Pause finished
Tue Sep 27 22:56:59 BST 2022: Refreshing calendar
Tue Sep 27 22:56:59 BST 2022: before run.py
2022-09-27 22:57:03.998008: Getting data
2022-09-27 22:57:05.126554: Got data
2022-09-27 22:57:05.134733: Made SVG
Tue Sep 27 22:57:05 BST 2022: after run.py
Tue Sep 27 22:57:12 BST 2022: now deep sleeping for one hour
Tue Sep 27 23:57:12 BST 2022: ; standby mem
Tue Sep 27 23:57:12 BST 2022: Awake; pausing for 30s
Tue Sep 27 23:57:42 BST 2022: Pause finished
Tue Sep 27 23:57:42 BST 2022: Refreshing calendar
Tue Sep 27 23:57:42 BST 2022: before run.py
Traceback (most recent call last):
  File "/mnt/us/python3/lib/python3.9/site-packages/urllib3/connection.py", line 175, in _new_conn
    conn = connection.create_connection(
  File "/mnt/us/python3/lib/python3.9/site-packages/urllib3/util/connection.py", line 72, in create_connection
    for res in socket.getaddrinfo(host, port, family, socket.SOCK_STREAM):
  File "/mnt/us/python3/lib/python3.9/socket.py", line 954, in getaddrinfo
    for res in _socket.getaddrinfo(host, port, family, type, proto, flags):
socket.gaierror: [Errno -3] Temporary failure in name resolution

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/mnt/us/python3/lib/python3.9/site-packages/urllib3/connectionpool.py", line 700, in urlopen
    httplib_response = self._make_request(
  File "/mnt/us/python3/lib/python3.9/site-packages/urllib3/connectionpool.py", line 383, in _make_request
    self._validate_conn(conn)
  File "/mnt/us/python3/lib/python3.9/site-packages/urllib3/connectionpool.py", line 1012, in _validate_conn
    conn.connect()
  File "/mnt/us/python3/lib/python3.9/site-packages/urllib3/connection.py", line 359, in connect
    conn = self._new_conn()
  File "/mnt/us/python3/lib/python3.9/site-packages/urllib3/connection.py", line 187, in _new_conn
    raise NewConnectionError(
urllib3.exceptions.NewConnectionError: <urllib3.connection.HTTPSConnection object at 0x419e5178>: Failed to establish a new connection: [Errno -3] Temporary failure in name resolution

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/mnt/us/python3/lib/python3.9/site-packages/requests/adapters.py", line 440, in send
    resp = conn.urlopen(
  File "/mnt/us/python3/lib/python3.9/site-packages/urllib3/connectionpool.py", line 757, in urlopen
    retries = retries.increment(
  File "/mnt/us/python3/lib/python3.9/site-packages/urllib3/util/retry.py", line 574, in increment
    raise MaxRetryError(_pool, url, error or ResponseError(cause))
urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='oauth2.googleapis.com', port=443): Max retries exceeded with url: /token (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x419e5178>: Failed to establish a new connection: [Errno -3] Temporary failure in name resolution'))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/mnt/us/python3/lib/python3.9/site-packages/google/auth/transport/requests.py", line 193, in __call__
    response = self.session.request(
  File "/mnt/us/python3/lib/python3.9/site-packages/requests/sessions.py", line 542, in request
    resp = self.send(prep, **send_kwargs)
  File "/mnt/us/python3/lib/python3.9/site-packages/requests/sessions.py", line 655, in send
    r = adapter.send(request, **kwargs)
  File "/mnt/us/python3/lib/python3.9/site-packages/requests/adapters.py", line 517, in send
    raise ConnectionError(e, request=request)
requests.exceptions.ConnectionError: HTTPSConnectionPool(host='oauth2.googleapis.com', port=443): Max retries exceeded with url: /token (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x419e5178>: Failed to establish a new connection: [Errno -3] Temporary failure in name resolution'))

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/mnt/base-us/extensions/dashboard/./bin/run.py", line 69, in render_calendar
    svg_data = get_data()
  File "/mnt/base-us/extensions/dashboard/./bin/run.py", line 37, in get_data
    events = get_events()
  File "/mnt/base-us/extensions/dashboard/bin/cal.py", line 48, in get_events
    service = get_service()
  File "/mnt/base-us/extensions/dashboard/bin/cal.py", line 115, in get_service
    creds.refresh(Request())
  File "/mnt/us/python3/lib/python3.9/site-packages/google/oauth2/credentials.py", line 302, in refresh
    ) = reauth.refresh_grant(
  File "/mnt/us/python3/lib/python3.9/site-packages/google/oauth2/reauth.py", line 322, in refresh_grant
    response_status_ok, response_data = _client._token_endpoint_request_no_throw(
  File "/mnt/us/python3/lib/python3.9/site-packages/google/oauth2/_client.py", line 124, in _token_endpoint_request_no_throw
    response = request(
  File "/mnt/us/python3/lib/python3.9/site-packages/google/auth/transport/requests.py", line 199, in __call__
    six.raise_from(new_exc, caught_exc)
  File "<string>", line 3, in raise_from
google.auth.exceptions.TransportError: HTTPSConnectionPool(host='oauth2.googleapis.com', port=443): Max retries exceeded with url: /token (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x419e5178>: Failed to establish a new connection: [Errno -3] Temporary failure in name resolution'))
2022-09-27 23:57:47.477680: Getting data
2022-09-27 23:57:47.532928: Caught exception
2022-09-27 23:57:47.571067: Continuing...
Tue Sep 27 23:57:48 BST 2022: after run.py
Wed Sep 28 18:50:32 BST 2022: now deep sleeping for one hour
Wed Sep 28 18:53:05 BST 2022: 1664391031; standby mem
Wed Sep 28 18:53:05 BST 2022: Awake; pausing for 30s
sebbacon commented 1 year ago

Shorter version

Tue Sep 27 23:57:12 BST 2022: Awake; pausing for 30s
Tue Sep 27 23:57:42 BST 2022: Pause finished
Tue Sep 27 23:57:42 BST 2022: Refreshing calendar                 <- final bit in start.sh
Tue Sep 27 23:57:42 BST 2022: before run.py                       <- in common.sh
<error happens>
2022-09-27 23:57:47.477680: Getting data                          <- first line of try block
2022-09-27 23:57:47.532928: Caught exception                      <- exception handler, it then outputs the the traceback (even thought it's appearing earlier)
2022-09-27 23:57:47.571067: Continuing...                         <- last line of try block
Tue Sep 27 23:57:48 BST 2022: after run.py                        <- first line after python is executed
<press wake and it appears to unset the screen, but stay frozen>
Wed Sep 28 18:50:32 BST 2022: now deep sleeping for one hour      <- this is after 5s sleep that is immediately after common.sh
<press wake again, it actually wakes>
Wed Sep 28 18:53:05 BST 2022: 1664391031; standby mem             <- first thing it does on wake 
sebbacon commented 1 year ago

I added more logging.

Starting from the first log line on 25 Oct:

2022-10-25 00:45:45.184972: Getting data
2022-10-25 00:45:45.240291: Caught exception
2022-10-25 00:45:45.278740: Continuing...
Tue Oct 25 00:45:45 BST 2022: after run.py
Tue Oct 25 00:45:46 BST 2022: wrote calendar to screen
Tue Oct 25 00:45:46 BST 2022: Finished running common.sh
Tue Oct 25 00:45:51 BST 2022: now deep sleeping for one hour

Tue Oct 25 01:45:52 BST 2022: ; standby mem
Tue Oct 25 01:45:52 BST 2022: Awake; pausing for 30s
Tue Oct 25 01:46:22 BST 2022: Pause finished
Tue Oct 25 01:46:22 BST 2022: Refreshing calendar
Tue Oct 25 01:46:22 BST 2022: before run.py
  --
  socket.gaierror: [Errno -3] Temporary failure in name resolution
  etc 
  --
2022-10-25 01:46:26.928099: Getting data
2022-10-25 01:46:26.983242: Caught exception
2022-10-25 01:46:27.021445: Continuing...
Tue Oct 25 01:46:27 BST 2022: after run.py
Tue Oct 25 01:46:28 BST 2022: wrote calendar to screen
Tue Oct 25 01:46:28 BST 2022: Finished running common.sh
Tue Oct 25 01:46:33 BST 2022: now deep sleeping for one hour

Tue Oct 25 02:46:33 BST 2022: ; standby mem
Tue Oct 25 02:46:33 BST 2022: Awake; pausing for 30s
Tue Oct 25 02:47:03 BST 2022: Pause finished
Tue Oct 25 02:47:03 BST 2022: Refreshing calendar
Tue Oct 25 02:47:03 BST 2022: before run.py
  socket.gaierror: [Errno -3] Temporary failure in name resolution
  etc
2022-10-25 02:47:07.647423: Getting data
2022-10-25 02:47:07.706288: Caught exception
2022-10-25 02:47:07.744864: Continuing...
Tue Oct 25 02:47:08 BST 2022: after run.py
Tue Oct 25 02:47:09 BST 2022: wrote calendar to screen
Tue Oct 25 02:47:09 BST 2022: Finished running common.sh
Tue Oct 25 02:47:14 BST 2022: now deep sleeping for one hour

Tue Oct 25 03:47:14 BST 2022: ; standby mem
Tue Oct 25 03:47:14 BST 2022: Awake; pausing for 30s

and that's the last line on 25th - followed by nothing for 2 whole days, until:

Thu Oct 27 00:45:23 BST 2022: Pause finished
Thu Oct 27 00:45:23 BST 2022: Refreshing calendar
Thu Oct 27 00:45:23 BST 2022: before run.py
  socket.gaierror: [Errno -3] Temporary failure in name resolution
  etc
sebbacon commented 1 year ago

So it seems like the kindle is getting into a state where it's unable to resolve DNS; and this state is reset by me pressing the restart button.

Some online have found it's a firewall issue Others suggest it's to do with HTTP connections not being closed (discussions)

Here's a good description of the underlying error, which is as you'd think; fundamentally DNS seems to have disappeared.

sebbacon commented 1 year ago

Interestingly, resolv.conf appears to be rewritten on a soft start:

[root@kindle root]# cat /etc/resolv.conf 
domain home
nameserver 208.67.222.222
nameserver 208.67.220.220
[root@kindle root]# ls -l /etc/resolv.conf 
lrwxrwxrwx    2 root     root            20 Mar  2  2021 /etc/resolv.conf -> /var/run/resolv.conf
[root@kindle root]# ls -l /var/run/resolv.conf 
-rw-rw-r--    1 root     javauser        64 Oct 31 08:42 /var/run/resolv.conf
sebbacon commented 1 year ago

So there's an upstart script chroot.conf which is marked as start on framework_ready; it mounts the root filesystem as rw, makes a bunch of folders in /chroot/, and then mounts them at the usual locations:

chroot_mount()
{ 
    # .... stuff ...                   

    mount -o bind /etc/resolv.d /chroot/etc/resolv.d || true                    

}

However, note that this makes a folder and doesn't touch resolv.conf itself. At the end of the chroot.conf upstart config there is a line ln /etc/resolv.conf /chroot/etc/resolv.conf || true but the currently-activated config is not

Other candidates to examine:

ppp/ip-up
ppp/ip-down
upstart/chroot.conf
upstart/firsttime

It's quite complicated, but I think it's probably all happening in /etc/ppp/ip-up; full source here for future ref:

#!/bin/sh

_FUNCTIONS=/etc/rc.d/functions
[ -f ${_FUNCTIONS} ] && . ${_FUNCTIONS}

. /var/local/wan/info

PPP_IP_START=/var/run/ppp/ip_start

PPP_RESOLV_CONF=$(readlink /etc/resolv.conf)
PPP_RESOLV_SAVE=/var/run/resolv.conf.save
PPP_RESOLV_NEW="/etc/resolv.d/resolv.conf.${WAN_PEER}"
RESOLV_OPTIONS="options timeout:5 attempts:2 rotate\n"

LIPC_HASH_PROP=/usr/bin/lipc-hash-prop
LIPC_GET_PROP=/usr/bin/lipc-get-prop

INTERFACE="$1"
TTY_DEVICE="$2"
SPEED="$3"
LOCAL_IP="$4"
GATEWAY_IP="$5"

DNS1_NAMESERVER=""
DNS2_NAMESERVER=""

if [ -z "${USEPEERDNS}" -a ! -r ${PPP_RESOLV_NEW} ]; then
    msg "unknown WAN peer (${WAN_PEER})" E
fi

if [ 0 -ne "$(${LIPC_GET_PROP} -i com.lab126.wan shouldRoute)" ]; then
        msg "PPP adding route to default gw and resolv.conf file" I
    cp ${PPP_RESOLV_CONF} ${PPP_RESOLV_SAVE}
    cp ${PPP_RESOLV_NEW} ${PPP_RESOLV_CONF}

    route add default gw "${GATEWAY_IP}" "${INTERFACE}"
fi

#
# ppp will call ip-up with USEPEERDNS defined if the usepeerdns option is set.
# In whispernet2.0 sims where we use VPN, our nameservers are hardwired and
# based on /etc/resolv.d files.
# In whispernet3.0 sims we use the dns server as provided by our peer via ipcp
# so the usepeerdns option is set.  See /platform/etc/ppp/peers/peer-wn3-3 as
# an example.
#
if [ -n "${USEPEERDNS}" ]; then
    msg "Using Peer DNS (xxx.xxx.$(echo ${DNS1} | cut -d '.' -f 3,4)) and (xxx.xxx.$(echo ${DNS2} | cut -d '.' -f 3,4))" I
else
    msg "PPP Using DNS from (${PPP_RESOLV_NEW})" I
    DNSSERVER=1
    for DNS in $(cat ${PPP_RESOLV_NEW} \
                | sed -e 's/^nameserver[ \t]\+\([0-9]\+\.[0-9]\+\.[0-9]\+\.[0-9]\+\).*$/\1/; t; d'); do
        if [ "${DNSSERVER}" == "1" ]; then
            DNS1=${DNS}
        elif [ "${DNSSERVER}" == "2" ]; then
            DNS2=${DNS}
        fi
        let "DNSSERVER += 1"
    done
    msg "Using ${PPP_RESOLV_NEW} nameservers xxx.xxx.$(echo ${DNS1} | cut -d '.' -f 3,4) and xxx.xxx.$(echo ${DNS2} | cut -d '.' -f 3,4)" I
fi

if [ -n "{DNS1}" -a "${DNS1}" != "0.0.0.0" ]; then
    route add -host ${DNS1} gw ${GATEWAY_IP} ${INTERFACE}
    DNS1_NAMESERVER="nameserver ${DNS1}\n"
fi
if [ -n "{DNS2}" -a "${DNS2}" != "0.0.0.0" ]; then
    route add -host ${DNS2} gw ${GATEWAY_IP} ${INTERFACE}
    DNS2_NAMESERVER="nameserver ${DNS2}\n"
fi

# /usr/bin/lipc-send-event com.lab126.hal wanPPPUp

echo -e "{ cmIpAddr = \"${LOCAL_IP}\",
        cmDefaultGW = \"${GATEWAY_IP}\",
        cmNetMask = \"255.255.255.255\",
        cmResolvConf = \"${DNS1_NAMESERVER}${DNS2_NAMESERVER}${RESOLV_OPTIONS}\" }" \
            | ${LIPC_HASH_PROP} com.lab126.wan cmIntfInfo

echo PPP interface up ${INTERFACE} ${TTY_DEVICE} ${SPEED} ${LOCAL_IP} ${GATEWAY_IP} > ${PPP_IP_START}

msg "PPP interface up ${INTERFACE} ${TTY_DEVICE} ${SPEED} xxx.xxx.$(echo ${LOCAL_IP} | cut -d '.' -f 3,4) xxx.xxx.$(echo ${GATEWAY_IP} | cut -d
 '.' -f 3,4) nameserver xxx.xxx.$(echo ${DNS1} | cut -d '.' -f 3,4) nameserver xxx.xxx.$(echo ${DNS2} | cut -d '.' -f 3,4)" I

For now, I've just added a line that logs /etc/resolv.conf in each loop of start.sh, to see if that changes when things break; if not, I could also add some DNS debug commands in there too.

sebbacon commented 1 year ago

I'm trying out a reboot after a wakeup, to see if that fixes the network.

There then seems to be a problem with mounted files being available (which is weird as the file doing the including is

Tue Mar 14 10:23:46 GMT 2023: nslookup check is Server:    208.67.222.222
Address 1: 208.67.222.222
Tue Mar 14 10:23:46 GMT 2023: Refreshing calendar
/mnt/base-us/extensions/dashboard/bin/start.sh: source: line 26: can't open '/mnt/base-us/extensions/dashboard/bin/common.sh': No such file or directory

Trying to work out the upstart events that are triggered when available - breadcrumbs here look useful

https://www.timelesssky.com/blog/create-new-partition-on-kindle-touch