Feature/try fixing connection related problem

thematrixdev commented 2 years ago

I have forced curl to make its requests over IPv4 and to write the HTTP status code to the log. Until now, when the ping curl failed it produced no output, so we could not tell what had happened.

I have separated the official-zip and custom-zip updates to avoid confusion.

mnakada commented 2 years ago

I understand the curl message.

Both the custom update button and the official update button in the WebUI have the same process. I think it's better to have one because if you separate them, people will think they work differently.

thematrixdev commented 2 years ago

How about changing the wording on the red button to make clear which version is being updated?

When custom-zip is turned on and custom-zip-url is not empty, the red button could read "custom update"; otherwise it would read "update" or "official update".

I am not familiar with Vue. If you think this is a good idea, could you make the change for me, please?

mnakada commented 2 years ago

OK. I will merge and fix the WebUI.

thematrixdev commented 2 years ago

Even with all the fixes we have made so far applied, my cameras still sometimes disconnect.

May I know why you put a wifi-restart in this hack? I think the camera OS itself should handle this. I am afraid that doing a wifi-restart may interfere with the OS's own one.

mnakada commented 2 years ago

> May I know why you put a wifi-restart in this hack? I think the camera OS itself should handle this. I am afraid that doing a wifi-restart may interfere with the OS's own one.

I put in the countermeasure because sometimes, after I turned off the router, the camera would not reconnect. I know that a WiFi reconnect process runs inside iCamera_app as well. However, I think the camera will eventually reconnect even if both are running.

thematrixdev commented 2 years ago

I have tried modifying the 2.4 GHz wifi settings at home. I set bluetooth-coexistence to preemptive and changed the channel width from 40 MHz to 20 MHz, but the camera still sometimes disconnects and reconnects. Could you share your wifi settings?

mnakada commented 2 years ago

There are no special settings. I am using tp-link's AX5400 default settings as they are. The channel bandwidth is 20 MHz.

thematrixdev commented 2 years ago

Within the last 10 minutes, the health-check requests have started timing out.

[root@atom-entrance:mmc]# tail -f healthcheck.log 
2022/04/12 23:38:00 : 200
2022/04/12 23:39:00 : 200
2022/04/12 23:40:00 : 000
2022/04/12 23:41:00 : 000
2022/04/12 23:42:00 : 000
2022/04/12 23:43:00 : 000
2022/04/12 23:44:00 : 000
2022/04/12 23:45:00 : 000
2022/04/12 23:46:00 : 000
2022/04/12 23:47:00 : 000
2022/04/12 23:48:00 : 000
2022/04/12 23:49:00 : 000
2022/04/12 23:50:00 : 000

Running the same request by hand succeeds.

[root@atom-entrance:mmc]# curl https://hc-ping.com/xxxxxxxxxx

Internet access is there.

[root@atom-entrance:~]# ip route
default via 192.168.50.1 dev wlan0 
192.168.50.0/24 dev wlan0  src 192.168.50.61 
[root@atom-entrance:~]# ping 192.168.50.1
PING 192.168.50.1 (192.168.50.1): 56 data bytes
64 bytes from 192.168.50.1: seq=0 ttl=64 time=1.482 ms
64 bytes from 192.168.50.1: seq=1 ttl=64 time=5.853 ms
64 bytes from 192.168.50.1: seq=2 ttl=64 time=1.448 ms
64 bytes from 192.168.50.1: seq=3 ttl=64 time=6.080 ms
^C
--- 192.168.50.1 ping statistics ---
4 packets transmitted, 4 packets received, 0% packet loss
round-trip min/avg/max = 1.448/3.715/6.080 ms

Is the curl used in health_check.sh the same as the one used in the SSH terminal? It is strange that the behaviour is different.
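To see at a glance how many checks failed in a healthcheck.log like the excerpt above, the status codes can be tallied. This is a minimal sketch that assumes every line ends with the status code, as in the excerpt; count_codes is a hypothetical helper name, not part of the hack.

```shell
#!/bin/sh
# count_codes LOGFILE
# Prints "<code> <count>" for each status code found in a
# healthcheck.log-style file whose lines look like
# "2022/04/12 23:40:00 : 000" (the code is the last field).
count_codes() {
  awk '{ n[$NF]++ } END { for (c in n) print c, n[c] }' "$1" | sort
}

# Usage on the camera:
#   count_codes /media/mmc/healthcheck.log
```

A large count for 000 means curl received no HTTP response at all, which points at the network path rather than at the healthchecks.io server.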

mnakada commented 2 years ago

> Is the curl used in health_check.sh the same as the one used in the SSH terminal? It is strange that the behaviour is different.

It should be the same. Does the server side receive the packets?

thematrixdev commented 2 years ago

No. The server side (healthchecks.io) did not receive a ping for more than 2 minutes, so it raised an alarm.

[root@atom-entrance:mmc]# traceroute hc-ping.com
traceroute to hc-ping.com (159.69.66.229), 30 hops max, 38 byte packets
 1  192.168.50.1 (192.168.50.1)  0.748 ms  0.580 ms  0.813 ms
 2  *  *  *
 3  10.30.54.121 (10.30.54.121)  5.466 ms  5.648 ms  0.845 ms
 4  10.30.28.81 (10.30.28.81)  5.771 ms  7.768 ms  2.169 ms
 5  10.28.21.25 (10.28.21.25)  19.064 ms  35.460 ms  2.437 ms
 6  218.188.28.73 (218.188.28.73)  8.712 ms  3.695 ms  3.302 ms
 7  *  *  *
 8  ix-ae-15-0.tcore1.hk2-hongkong.as6453.net (116.0.67.65)  17.836 ms  7.189 ms  6.427 ms
 9  *traceroute: sendto: Network is unreachable

[root@atom-entrance:mmc]# traceroute hc-ping.com
traceroute to hc-ping.com (178.63.26.145), 30 hops max, 38 byte packets
 1  192.168.50.1 (192.168.50.1)  0.058 ms  0.035 ms  0.498 ms
 2  *  *traceroute: sendto: Network is unreachable

[root@atom-entrance:mmc]# traceroute hc-ping.com
traceroute to hc-ping.com (159.69.66.229), 30 hops max, 38 byte packets
 1traceroute: sendto: Network is unreachable

[root@atom-entrance:mmc]# traceroute hc-ping.com
traceroute to hc-ping.com (159.69.66.229), 30 hops max, 38 byte packets
 1traceroute: sendto: Network is unreachable

[root@atom-entrance:mmc]# traceroute hc-ping.com
traceroute to hc-ping.com (159.69.66.229), 30 hops max, 38 byte packets
 1  192.168.50.1 (192.168.50.1)  1.061 ms  0.031 ms  0.506 ms
 2  *  *  *
 3  10.30.54.121 (10.30.54.121)  21.266 ms  19.596 ms  4.654 ms
 4  10.30.29.81 (10.30.29.81)  1.413 ms  3.487 ms  2.976 ms
 5  10.28.21.29 (10.28.21.29)  6.294 ms  11.143 ms  2.441 ms
 6  218.188.28.66 (218.188.28.66)  4.037 ms  5.604 ms  2.888 ms
 7  *  *  *
 8  ix-ae-15-0.tcore1.hk2-hongkong.as6453.net (116.0.67.65)  26.745 ms  5.285 ms  4.949 ms
 9  if-ae-37-8.tcore2.hk2-hongkong.as6453.net (116.0.93.138)  196.399 ms  193.114 ms  if-ae-7-2.thar1.hk2-hongkong.as6453.net (180.87.112.142)  193.684 ms
10  if-ae-32-6.tcore2.svw-singapore.as6453.net (116.0.93.153)  193.276 ms  *  *
11  if-ae-2-2.tcore1.svw-singapore.as6453.net (180.87.12.1)  195.229 ms  *traceroute: sendto: Network is unreachable

By the way, I see that /tmp/resolv.conf is created in atom_init.sh. Does something like network-manager update it, or does the hack update it somewhere? I wonder where the default 8.8.8.8 comes from.

mnakada commented 2 years ago

> By the way, I see that /tmp/resolv.conf is created in atom_init.sh. Does something like network-manager update it, or does the hack update it somewhere? I wonder where the default 8.8.8.8 comes from.

This setting is made by iCamera_app.

thematrixdev commented 2 years ago

I cannot find 8.8.8.8 in https://gist.github.com/bakueikozo/b345884e3683e6399949f267a6ab4b3f

I just found that the camera has kept reconnecting to wifi today: https://pastebin.com/u5wsgubv Even now it keeps reconnecting. However, I am still connected via SSH and tailing atomhack.log.

mnakada commented 2 years ago

less /tmp/log/atom.log

[exec-iCame,0468](no.100000) cmd:[tf_prepare --blkdev /dev/mmcblk0p1 --strategy=0 --samplecnt=32]
[exec-iCame,0428](no.100001) cmd:[wpa_cli -p /var/run/wpa_supplicant -i wlan0 STATUS | grep ip_address]
[exec-iCame,0433](no.100001) msgque ret:[ip_address=192.168.0.36]
[netServ.c,1256]dbg: wifi dhcp ok...
[exec-iCame,0428](no.100001) cmd:[echo "nameserver 8.8.8.8" >> /etc/resolv.conf]

[exec-iCame,0437](no.100001) msgque ret:[0]
[exec-iCame,0428](no.100001) cmd:[echo "nameserver 8.8.4.4" >> /etc/resolv.conf]

[exec-iCame,0437](no.100001) msgque ret:[0]
[init.c,0564]Current network dhcp ok...
[netServ.c,1586]dbg: gateway: 192.168.0.1
[netServ.c,1587]dbg: ip addr: 192.168.0.36

It is present in the output of iCamera_app.

The camera can access up to the LAN, but does not appear to be routing beyond that point.

thematrixdev commented 2 years ago

Do you see something like "Preamble Type" in your router settings? There are two options: long and short. Which one do you use?

mnakada commented 2 years ago

Unfortunately, this router is not configurable in detail.

mnakada commented 2 years ago

I don't think WiFi is the cause, because the link between the LAN and AtomCam was up. Please send me the output of the following commands, run on AtomCam while the problem is occurring.

# /scripts/health_check.sh

# route -ne

# ping 10.30.54.121

# ping hc-ping.com

# nslookup hc-ping
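Because the problem is intermittent, it may help to capture all of these checks in one go when it occurs. The following is a minimal sketch, not part of the hack: DIAG_LOG, run_logged and collect_diagnostics are hypothetical names, the `-c 4` option is added only so that each ping terminates, and hc-ping.com is assumed as the nslookup target.

```shell
#!/bin/sh
# Sketch only: DIAG_LOG is a hypothetical path chosen for illustration.
DIAG_LOG=${DIAG_LOG:-/media/mmc/diag.log}

# run_logged CMD [ARGS...]
# Appends a timestamped header, CMD's stdout and stderr, and CMD's
# exit status to DIAG_LOG.
run_logged() {
  {
    echo "=== $(date +"%Y/%m/%d %H:%M:%S") : $*"
    "$@" 2>&1
    echo "--- exit status: $?"
  } >> "$DIAG_LOG"
}

# collect_diagnostics
# Runs the checks requested above, one after another.
collect_diagnostics() {
  run_logged /scripts/health_check.sh
  run_logged route -ne
  run_logged ping -c 4 10.30.54.121
  run_logged ping -c 4 hc-ping.com
  run_logged nslookup hc-ping.com
}
```

Calling collect_diagnostics from an SSH session while the health check is failing leaves a single file that can be attached to the thread.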

thematrixdev commented 2 years ago

I have 3 cameras showing "NO PING" for hours. I can still use SSH, the web UI and the atom app.

This is one of them: ping hc-ping.com worked once, but only several minutes after I entered the command. In the video it does not work. nslookup was very slow to show each IP. I had run it once before taking this video, so it is fast in the video. https://www.youtube.com/watch?v=_acSfnJTX78

traceroute when it is normal

[root@atom-desk:~]# traceroute hc-ping.com
traceroute to hc-ping.com (159.69.66.229), 30 hops max, 38 byte packets
 1  RT-AX3000 (192.168.50.1)  3.048 ms  6.859 ms  6.118 ms
 2  *  *  *
 3  10.30.54.121 (10.30.54.121)  53.665 ms  31.691 ms  24.890 ms
 4  10.30.28.81 (10.30.28.81)  19.179 ms  2.895 ms  11.047 ms
 5  10.28.21.25 (10.28.21.25)  9.538 ms  3.722 ms  3.048 ms
 6  218.188.28.73 (218.188.28.73)  9.408 ms  4.460 ms  7.953 ms
 7  *  *  *
 8  ix-ae-15-0.tcore1.hk2-hongkong.as6453.net (116.0.67.65)  15.365 ms  9.647 ms  11.484 ms
 9  if-ae-37-4.tcore2.hk2-hongkong.as6453.net (116.0.93.146)  199.663 ms  if-ae-37-6.tcore2.hk2-hongkong.as6453.net (116.0.93.136)  195.120 ms  if-ae-37-4.tcore2.hk2-hongkong.as6453.net (116.0.93.146)  194.484 ms
10  *  *  *
11  *  if-ae-2-2.tcore1.svw-singapore.as6453.net (180.87.12.1)  198.912 ms  *
12  *  *  *
13  *  if-ae-2-4.tcore2.wyn-marseille.as6453.net (80.231.217.53)  198.559 ms  *
14  if-be-42-2.ecore2.emrs2-marseille.as6453.net (80.231.200.17)  223.144 ms  if-ae-2-3.tcore2.wyn-marseille.as6453.net (80.231.217.51)  196.802 ms  if-be-42-2.ecore2.emrs2-marseille.as6453.net (80.231.200.17)  197.237 ms
15  if-ae-50-2.tcore2.fnm-frankfurt.as6453.net (195.219.87.214)  192.723 ms  203.318 ms  195.803 ms
16  *  *  *
17  static.229.66.69.159.clients.your-server.de (159.69.66.229)  200.881 ms  *  201.278 ms

traceroute just now

[root@atom-desk:~]# traceroute hc-ping.com
traceroute to hc-ping.com (159.69.66.229), 30 hops max, 38 byte packets
 1  192.168.50.1 (192.168.50.1)  4.982 ms  *  *
 2  *  *  *
 3  *  *  *
 4  *  *  *
 5  *  *  *
 6  *  *  *
 7  *  *  *
 8  ix-ae-15-0.tcore1.hk2-hongkong.as6453.net (116.0.67.65)  13.643 ms  *  *
 9  *  *  *
10  *  *  *
11  *  *  *
12  *  *  *
13  *  *  *
14  *  *  *
15  *  *  if-ae-50-2.tcore2.fnm-frankfurt.as6453.net (195.219.87.214)  199.854 ms
16  if-ae-4-2.tcore1.fr0-frankfurt.as6453.net (195.219.87.18)  209.489 ms  *  *
17  *  *  *
18  *  *  *
19  *  *  *
20  *  *  *
21  *  *  *
22  *  *  *
23  *  *  *
24  *  *  *
25  *  *  *
26  *  *  *
27  *  *  *
28  *  *  *
29  *  *  *
30  *  *  *

mnakada commented 2 years ago

I thought the DNS might not be connected, but that doesn't seem to be the case. In the meantime, could you please see if there is a time difference when you run the following command?

# ping hc-ping.com

# ping 159.69.66.229

thematrixdev commented 2 years ago

> I thought the DNS might not be connected, but that doesn't seem to be the case. In the meantime, could you please see if there is a time difference when you run the following command?
>
> # ping hc-ping.com
>
> # ping 159.69.66.229

Yes. Pinging the IP address responds instantly. Pinging the domain name appears to get no response; it may just need minutes, but I pressed CTRL+C. nslookup is slow. nslookup is using the default 8.8.8.8.
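The difference between name-based and address-based requests can be put into numbers by timing each command. This is a minimal sketch; elapsed is a hypothetical helper, and it assumes the camera's date supports `+%s`.

```shell
#!/bin/sh
# elapsed CMD [ARGS...]
# Prints how many whole seconds CMD took to run.
# CMD's own output is discarded.
elapsed() {
  start=$(date +%s)
  "$@" >/dev/null 2>&1
  end=$(date +%s)
  echo $((end - start))
}

# Usage on the camera (host and address are the ones from this thread):
#   elapsed nslookup hc-ping.com      # name resolution only
#   elapsed ping -c 1 159.69.66.229   # no DNS lookup involved
#   elapsed ping -c 1 hc-ping.com     # DNS lookup followed by the ping
```

If the first and third figures are large while the second stays near zero, the delay is in name resolution rather than in routing.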

mnakada commented 2 years ago

It is very slow even from my AtomCam. Only 'nslookup hc-ping.com' is slow. 'nslookup google.com' responds immediately.

thematrixdev commented 2 years ago

Oh, really? I was just trying to route the traffic through a VPN in Japan to see whether this is an ISP routing problem. 😂 Pinging from a computer is normal, right?

mnakada commented 2 years ago

Yes. From a PC, I see that nslookup hc-ping.com also responds immediately.

thematrixdev commented 2 years ago

After rebooting the camera, it behaves normally, and it uses my own DNS.

[root@atom-desk:~]# nslookup hc-ping.com
Server:    192.168.50.2
Address 1: 192.168.50.2 telephone-booth

Name:      hc-ping.com
Address 1: 159.69.66.229 static.229.66.69.159.clients.your-server.de
Address 2: 178.63.26.145 static.145.26.63.178.clients.your-server.de
Address 3: 176.9.71.146 static.146.71.9.176.clients.your-server.de
Address 4: 188.40.122.95 static.95.122.40.188.clients.your-server.de
Address 5: 2a01:4f8:141:4258::2
Address 6: 2a01:4f8:231:1214::2
Address 7: 2a01:4f8:151:18c::2
Address 8: 2a01:4f8:221:2d1b::2

It looks like the problem appears some time after boot. I think we can conclude it is not a problem with our ISP, router, DNS or wifi?

mnakada commented 2 years ago

Run the following on the camera that is having the problem.

# killall udhcpc
# udhcpc -i wlan0 -p /var/run/udhcpc.pid
# nslookup hc-ping.com

mnakada commented 2 years ago

If restarting udhcpc doesn't bring back the response time, we may need to add a reboot.

thematrixdev commented 2 years ago

> If restarting udhcpc doesn't bring back the response time, we may need to add a reboot.

I have tried running the wifi-reset code from watchdog.sh and it brought back the Internet. If we add a wifi-reset, we need to move the watchdog to cron as well, or both of them may turn wifi off and on at the same time.

It seems I am the only one having this problem? Maybe I should try to find the real cause.

By the way, I think I have found the reason why wifi disconnects. I think it is because of the two 2.4 GHz antennas (wifi and bluetooth) and the bluetooth (LDAC) speaker very near that particular camera (atom-desk). So the remaining problem is Internet connectivity.

mnakada commented 2 years ago

> I have tried running the wifi-reset code from watchdog.sh and it brought back the Internet. If we add a wifi-reset, we need to move the watchdog to cron as well, or both of them may turn wifi off and on at the same time.

I guess we can counter this by adding a reset process for the interface. I would have no problem with a double reset process. How about the following modification?

health_check.sh

#!/bin/sh

HACK_INI=/tmp/hack.ini
# Read the ping URL from hack.ini; an empty value disables the health check.
HEALTHCHECK_PING_URL=$(awk -F "=" '/HEALTHCHECK_PING_URL *=/ {print $2}' $HACK_INI)
HTTPCODE=200
if [ "$HEALTHCHECK_PING_URL" != "" ] ; then
  # curl prints only the HTTP status code; 000 means no HTTP response was received.
  HTTPCODE=$(curl --ipv4 --max-time 10 --retry 5 --location --silent --show-error --output /dev/null --write-out "%{http_code}" $HEALTHCHECK_PING_URL)
  echo $(TZ=JST-9 date +"%Y/%m/%d %H:%M:%S : ") $HTTPCODE  >> /media/mmc/healthcheck.log

  # Any status other than 200 restarts the WiFi interface and renews the DHCP lease.
  if [ "$HTTPCODE" -ne 200 ]; then
    echo $(TZ=JST-9 date +"%Y/%m/%d %H:%M:%S : WiFi restart(health_check)") >> /media/mmc/atomhack.log
    ifconfig wlan0 down
    ifconfig wlan0 up
    killall -USR1 udhcpc || udhcpc -i wlan0 -p /var/run/udhcpc.pid
  fi
fi

thematrixdev commented 2 years ago

I think we had better check for HTTP code 000 (timeout) or something related. 3xx, 4xx and 5xx responses should not trigger a reset. And maybe it would be better to tolerate several HTTP errors before resetting.

Thank you very much!
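The two suggestions above (reset only on 000, and tolerate a few failures first) could be combined roughly as follows. This is a minimal sketch under stated assumptions, not the project's implementation: FAIL_COUNT_FILE, MAX_FAILURES and should_reset are hypothetical names chosen for illustration.

```shell
#!/bin/sh
# Sketch only: the counter file path and the threshold are hypothetical.
FAIL_COUNT_FILE=${FAIL_COUNT_FILE:-/tmp/healthcheck_failures}
MAX_FAILURES=${MAX_FAILURES:-3}

# should_reset HTTPCODE
# Succeeds (returns 0) only after MAX_FAILURES consecutive "000" results.
# Any real HTTP status (2xx to 5xx) shows that the server was reached,
# so it clears the counter instead of counting as a failure.
should_reset() {
  code=$1
  if [ "$code" != "000" ]; then
    rm -f "$FAIL_COUNT_FILE"
    return 1
  fi
  count=$(cat "$FAIL_COUNT_FILE" 2>/dev/null || echo 0)
  count=$((count + 1))
  if [ "$count" -ge "$MAX_FAILURES" ]; then
    rm -f "$FAIL_COUNT_FILE"   # start counting afresh after a reset
    return 0
  fi
  echo "$count" > "$FAIL_COUNT_FILE"
  return 1
}

# Usage inside a health check, after HTTPCODE has been set by curl:
#   if should_reset "$HTTPCODE"; then
#     ifconfig wlan0 down
#     ifconfig wlan0 up
#   fi
```

The counter lives in a file because each cron run is a separate process; note that this only works while the filesystem holding the file has free space.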

mnakada commented 2 years ago

Let me think about it for a day. Multiple error checks and other conditions can make a run take so long that several runs overlap. I will consider a good way to do this in conjunction with the pinging in watchdog.sh.
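One common way to keep overlapping runs apart is a lock directory, since mkdir either creates the directory or fails atomically. The following is a minimal sketch, not taken from the hack: LOCK_DIR, acquire_lock, release_lock and run_exclusive are hypothetical names.

```shell
#!/bin/sh
# Sketch only: LOCK_DIR is a hypothetical path, not one used by the hack.
LOCK_DIR=${LOCK_DIR:-/tmp/health_check.lock}

# mkdir is atomic: only one caller can create the directory,
# so it can serve as a lock without needing flock(1).
acquire_lock() {
  mkdir "$LOCK_DIR" 2>/dev/null
}

release_lock() {
  rmdir "$LOCK_DIR" 2>/dev/null
}

# run_exclusive CMD [ARGS...]
# Runs CMD only if no other run holds the lock; otherwise skips CMD
# and returns 1. CMD's exit status is passed through.
run_exclusive() {
  acquire_lock || return 1
  "$@"
  status=$?
  release_lock
  return $status
}

# Usage from cron:
#   run_exclusive /scripts/health_check.sh
```

A run that is killed before release_lock leaves the directory behind, so a real script would also need some way to clear a stale lock, for example at boot.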

thematrixdev commented 2 years ago

It seems any code other than 000 should be valid: https://developer.mozilla.org/en-US/docs/Web/HTTP/Status#information_responses Hence only 000 should trigger the wifi reset.

Anyway, thanks. I am trying to figure out the reason for the Internet outage as well.

thematrixdev commented 2 years ago

I have just realised that I have hc-ping running on all my computers in the same network, so it should not be an ISP or router problem.

You said you noticed a slowdown when pinging hc-ping.com on the camera, right? Could you try getting an account on healthchecks.io, setting both the schedule period and the grace time to 1 minute, and putting the ping URL into the camera's health-check function? I want to make sure it is not related to an ISP or router problem.

And it seems you can connect to the camera via serial port? Without running the hack, is it possible to run curl from crontab, to figure out whether it is a camera hardware problem?

Thank you very much for your help!

mnakada commented 2 years ago

I have committed the health_check related fixes.

> You said you noticed a slowdown when pinging hc-ping.com on the camera, right?

No, I said it slowed down in the case of nslookup hc-ping.com.

> Could you try getting an account on healthchecks.io, setting both the schedule period and the grace time to 1 minute, and putting the ping URL into the camera's health-check function? I want to make sure it is not related to an ISP or router problem.

I registered AtomCam2 and AtomSwing today at 13:00 JST and have not yet encountered any errors. Both the cycle and grace time are set to 1 minute.

> And it seems you can connect to the camera via serial port? Without running the hack, is it possible to run curl from crontab, to figure out whether it is a camera hardware problem?

The reason I connect via serial is to hack the boot and kernel. It can be used for debugging when WiFi is disconnected, though.

thematrixdev commented 2 years ago

Oh, I missed the email. Let me try the fixes today. Thanks.

thematrixdev commented 2 years ago

Seems to be pretty stable in the last 24 hours ❤️

mnakada commented 2 years ago

iCamera_app wrote so many logs that after a few days the space used in /tmp was at 100%. As a result, the retry count in health_check.sh could not be written. Ver. 1.2.20 addresses this issue.
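A full /tmp like this can be detected before it causes trouble by reading the usage figure from df. This is a minimal sketch and not what Ver. 1.2.20 does: parse_use_percent and tmp_usage are hypothetical helpers, and it assumes the camera's df accepts `-P` (POSIX output format).

```shell
#!/bin/sh
# parse_use_percent
# Reads `df -P` output on stdin and prints the usage of the first
# listed filesystem as a bare number (for example 100).
parse_use_percent() {
  awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# tmp_usage [PATH]
# Prints the usage percentage of the filesystem holding PATH
# (default: /tmp).
tmp_usage() {
  df -P "${1:-/tmp}" | parse_use_percent
}

# Usage on the camera:
#   if [ "$(tmp_usage)" -ge 90 ]; then
#     echo "/tmp is nearly full"
#   fi
```

The parsing is kept separate from the df call so that it can be checked against captured df output without touching the real filesystem.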

thematrixdev commented 2 years ago

> iCamera_app wrote so many logs that after a few days the space used in /tmp was at 100%. As a result, the retry count in health_check.sh could not be written. Ver. 1.2.20 addresses this issue.

My cameras are pretty stable these days, though. No idea why.

Thanks.

mnakada / atomcam_tools

Feature/try fixing connection related problem #15