marcone / teslausb

A smart USB drive for Tesla Dashcam - extended storage, auto archive, web viewer
MIT License
1.92k stars 354 forks source link

Pi fails to respond to SSH when copying music from CIFS share #263

Open fatfurrycat opened 5 years ago

fatfurrycat commented 5 years ago

I've just setup teslausb using the latest image and config script. I've attached my config script.

Upon initial reboot following install, I am able to SSH to the Pi. I was also able to change password for Pi user. The /boot directory contained TESLAUSB_SETUP_FINISHED and WIFI_ENABLED files.

However, I was unable to see the Pi as a USB drive, so I issued a reboot from the command line.

When the Pi had rebooted, I can see from my network monitor that it is transferring music files from my CIFS share on my Synology NAS, which I'm assuming is why I can't see the device as a USB drive on my PC.

However, I'm also no longer able to SSH into the Pi to check on progress of the copy. I've tried to SSH from Fedora, Cygwin and Putty. All tools simply hang and no login prompt is presented. I also see no timeouts.

Debug output from ssh client: $ ssh -v pi@192.168.0.216 OpenSSH_8.0p1, OpenSSL 1.1.1c FIPS 28 May 2019 debug1: Reading configuration data /etc/ssh/ssh_config debug1: Reading configuration data /etc/crypto-policies/back-ends/openssh.config debug1: configuration requests final Match pass debug1: re-parsing configuration debug1: Reading configuration data /etc/ssh/ssh_config debug1: Reading configuration data /etc/crypto-policies/back-ends/openssh.config debug1: Connecting to 192.168.0.216 [192.168.0.216] port 22. debug1: Connection established. debug1: identity file /home/teslausb/.ssh/id_rsa type 0 debug1: identity file /home/teslausb/.ssh/id_rsa-cert type -1 debug1: identity file /home/teslausb/.ssh/id_dsa type -1 debug1: identity file /home/teslausb/.ssh/id_dsa-cert type -1 debug1: identity file /home/teslausb/.ssh/id_ecdsa type -1 debug1: identity file /home/teslausb/.ssh/id_ecdsa-cert type -1 debug1: identity file /home/teslausb/.ssh/id_ed25519 type -1 debug1: identity file /home/teslausb/.ssh/id_ed25519-cert type -1 debug1: identity file /home/teslausb/.ssh/id_xmss type -1 debug1: identity file /home/teslausb/.ssh/id_xmss-cert type -1 debug1: Local version string SSH-2.0-OpenSSH_8.0

If I telnet to port 22, then I see: telnet> open teslausb.local 22 Trying 192.168.0.216... Connected to teslausb.local. Escape character is '^]'.

I get no further response.

I can get a response from the Pi using ping, but that's about it: $ ping 192.168.0.216 PING 192.168.0.216 (192.168.0.216) 56(84) bytes of data. 64 bytes from 192.168.0.216: icmp_seq=7 ttl=64 time=4.92 ms 64 bytes from 192.168.0.216: icmp_seq=8 ttl=64 time=5.13 ms 64 bytes from 192.168.0.216: icmp_seq=9 ttl=64 time=5.06 ms 64 bytes from 192.168.0.216: icmp_seq=10 ttl=64 time=5.14 ms

If I try to connect to the 'TESLAUSB WIFI' AP, then I get the following in the system journal:

Sep 17 15:18:27 lenovo-p50 NetworkManager[1960]: [1568729907.4470] device (wlp4s0): Activation: starting connection 'TESLAUSB WIFI' (2291cc3c-cf7e-47b5-bd8a-575ec391aaa2) Sep 17 15:18:27 lenovo-p50 NetworkManager[1960]: [1568729907.4507] sup-iface[0x55f30479e0e0,wlp4s0]: connection disconnected (reason -3) Sep 17 15:18:27 lenovo-p50 NetworkManager[1960]: [1568729907.4508] device (wlp4s0): supplicant interface state: completed -> disconnected Sep 17 15:18:27 lenovo-p50 NetworkManager[1960]: [1568729907.4508] device (p2p-dev-wlp4s0): supplicant management interface state: completed -> disconnected Sep 17 15:18:27 lenovo-p50 systemd[1]: Starting Network Manager Script Dispatcher Service... Sep 17 15:18:27 lenovo-p50 NetworkManager[1960]: [1568729907.4581] device (wlp4s0): state change: disconnected -> prepare (reason 'none', sys-iface-state: 'managed') Sep 17 15:18:27 lenovo-p50 systemd[1]: Started Network Manager Script Dispatcher Service. Sep 17 15:18:27 lenovo-p50 audit[1]: SERVICE_START pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=NetworkManager-dispatcher comm="systemd" exe="/usr> Sep 17 15:18:27 lenovo-p50 nm-dispatcher[15044]: req:1 'down' [wlp4s0]: new request (3 scripts) Sep 17 15:18:27 lenovo-p50 nm-dispatcher[15044]: req:1 'down' [wlp4s0]: start running ordered scripts... Sep 17 15:18:27 lenovo-p50 wpa_supplicant[2148]: wlp4s0: CTRL-EVENT-REGDOM-CHANGE init=DRIVER type=WORLD Sep 17 15:18:27 lenovo-p50 NetworkManager[1960]: [1568729907.5679] device (wlp4s0): set-hw-addr: reset MAC address to 7C:76:35:E9:2A:91 (preserve) Sep 17 15:18:27 lenovo-p50 NetworkManager[1960]: [1568729907.5756] device (wlp4s0): supplicant interface state: disconnected -> disabled Sep 17 15:18:27 lenovo-p50 NetworkManager[1960]: [1568729907.5756] device (p2p-dev-wlp4s0): supplicant management interface state: disconnected -> disabled Sep 17 15:18:27 lenovo-p50 NetworkManager[1960]: [1568729907.5762] device (wlp4s0): state change: prepare -> config (reason 'none', sys-iface-state: 'managed') Sep 17 15:18:27 lenovo-p50 NetworkManager[1960]: [1568729907.5775] device (wlp4s0): Activation: (wifi) access point 'TESLAUSB WIFI' has security, but secrets are required. Sep 17 15:18:27 lenovo-p50 NetworkManager[1960]: [1568729907.5776] device (wlp4s0): state change: config -> need-auth (reason 'none', sys-iface-state: 'managed') Sep 17 15:18:27 lenovo-p50 NetworkManager[1960]: [1568729907.5973] device (wlp4s0): supplicant interface state: disabled -> disconnected Sep 17 15:18:27 lenovo-p50 NetworkManager[1960]: [1568729907.5974] device (p2p-dev-wlp4s0): supplicant management interface state: disabled -> disconnected

and in dmesg, I see:

[1751658.370892] wlp4s0: deauthenticating from 2c:56:dc:da:6e:64 by local choice (Reason: 3=DEAUTH_LEAVING) [1751659.667021] wlp4s0: authenticate with b8:27:eb:a1:62:4e [1751659.675268] wlp4s0: send auth to b8:27:eb:a1:62:4e (try 1/3) [1751659.682742] wlp4s0: authenticated [1751659.683134] iwlwifi 0000:04:00.0 wlp4s0: disabling HT as WMM/QoS is not supported by the AP [1751659.683144] iwlwifi 0000:04:00.0 wlp4s0: disabling VHT as WMM/QoS is not supported by the AP [1751659.684508] wlp4s0: associate with b8:27:eb:a1:62:4e (try 1/3) [1751659.698658] wlp4s0: RX AssocResp from b8:27:eb:a1:62:4e (capab=0x411 status=0 aid=1) [1751659.703841] wlp4s0: associated [1751659.722097] wlp4s0: Limiting TX power to 30 (30 - 0) dBm as advertised by b8:27:eb:a1:62:4e [1751669.704522] wlp4s0: deauthenticating from b8:27:eb:a1:62:4e by local choice (Reason: 3=DEAUTH_LEAVING) [1751671.041099] wlp4s0: authenticate with b8:27:eb:a1:62:4e [1751671.049531] wlp4s0: send auth to b8:27:eb:a1:62:4e (try 1/3) [1751671.056143] wlp4s0: authenticated [1751671.056498] iwlwifi 0000:04:00.0 wlp4s0: disabling HT as WMM/QoS is not supported by the AP [1751671.056508] iwlwifi 0000:04:00.0 wlp4s0: disabling VHT as WMM/QoS is not supported by the AP [1751671.057503] wlp4s0: associate with b8:27:eb:a1:62:4e (try 1/3) [1751671.071512] wlp4s0: RX AssocResp from b8:27:eb:a1:62:4e (capab=0x411 status=0 aid=1) [1751671.073930] wlp4s0: associated [1751671.088389] wlp4s0: Limiting TX power to 30 (30 - 0) dBm as advertised by b8:27:eb:a1:62:4e [1751681.075014] wlp4s0: deauthenticating from b8:27:eb:a1:62:4e by local choice (Reason: 3=DEAUTH_LEAVING)

The connection to the AP appears to timeout, and presents me with a prompt for the security key.

I'm just wondering if this is normal behaviour when copying music?

teslausb_setup_variables.conf.txt

marcone commented 5 years ago

I've seen the nonresponsive ssh before, and always attributed it to slow wifi being saturated by archiving, because it always started working again after archiving finished. Haven't noticed it during music copying though.

fatfurrycat commented 5 years ago

Thanks for the quick response, marcone.

I'm transferring about 123GB of Music to the Pi, and figured it might be quicker over the CIFS connection that copying via USB. Maybe, that was a mistake. It's still running, so I'll leave it going for now.

marcone commented 5 years ago

Yeah that's going to take a looong time. You should seriously ask yourself whether you really need that much music in you car. If you the answer is yes, I would recommend using a separate dedicated usb drive for that, or even a small ssd.

marcone commented 5 years ago

Twice now I've also seen the network going up and down repeatedly, resulting in a very sluggish connection, to the point of unusability. It looks like this in dmesg:

[18721.398246] IPv6: ADDRCONF(NETDEV_CHANGE): wlan0: link becomes ready
[18731.144154] IPv6: ADDRCONF(NETDEV_UP): wlan0: link is not ready
[18731.144198] brcmfmac: power management disabled
[18733.828410] IPv6: ADDRCONF(NETDEV_CHANGE): wlan0: link becomes ready
[18739.944577] IPv6: ADDRCONF(NETDEV_UP): wlan0: link is not ready
[18739.946434] brcmfmac: power management disabled

etcetera, over and over until I reboot the Pi, and then it's fine. I have no idea what causes this.

fatfurrycat commented 5 years ago

I'm beginning to think you're right on the music front. I think I'll turn syncing off and try to find another way.

I'm guessing I could mount the partition and the .bin file on a Linux system and copy the files using a card reader?

Any thoughts on what might work?

fatfurrycat commented 5 years ago

So, I've put the SD card into my Fedora laptop, and done the following:

$ mkdir /mnt/teslausb $ mount /dev/mmcblk0p3 /mnt/teslausb $ file music_disk.bin music_disk.bin: DOS/MBR boot sector; partition 1 : ID=0xc, start-CHS (0x0,32,33), end-CHS (0x145,85,16), startsector 2048, 268433408 sectors, extended partition table (last)

which gives me the partition start sector. To calculate the offset for losetup, you need to multiply that number by 512, so the offset is 2048*512=1048576

$ losetup This will show what loopback filesystems you've currently got (I've currently got 14). Pick a loopback number that's unused. In my case, I just arbitrarily chose 20.

$ losetup -o 1048576 loop20 /backingfiles/music_disk.bin $ mkdir /mnt/music $ mount /dev/loop20 /mnt/music

I'm now copying my music into the mounted /mnt/music filesystem at around 25MB/s. A huge improvement.

marcone commented 5 years ago

If you also mount the /root partition of the sd card, you can look up the offset in the Pi's fstab, and possibly even do something like: mount --fstab /media/fatfurrycat/root/etc/fstab /mnt/music (untested)

curiousgrge commented 5 years ago

I've redone my teslausb multiple times now and here are some other tips if you still plan on using music sync.

I moved my music out of the share into a temporary folder outside the share to avoid it syncing the first time.

Connect to PC via USB by using TWO USB cables to supply power as one cable wasn't enough to bring up the partitions.

Try this to speed up your USB copying https://github.com/marcone/teslausb/issues/239#issuecomment-529830171

sTevefor3 commented 5 years ago

Twice now I've also seen the network going up and down repeatedly, resulting in a very sluggish connection, to the point of unusability. It looks like this in dmesg:

[18721.398246] IPv6: ADDRCONF(NETDEV_CHANGE): wlan0: link becomes ready
[18731.144154] IPv6: ADDRCONF(NETDEV_UP): wlan0: link is not ready
[18731.144198] brcmfmac: power management disabled
[18733.828410] IPv6: ADDRCONF(NETDEV_CHANGE): wlan0: link becomes ready
[18739.944577] IPv6: ADDRCONF(NETDEV_UP): wlan0: link is not ready
[18739.946434] brcmfmac: power management disabled

etcetera, over and over until I reboot the Pi, and then it's fine. I have no idea what causes this.

I've seen this same behaviour, although rarely. In addition, it appears that the Pi grabs 2 IP addresses from my home network DHCP server. A reboot fixed it, but it has appeared again.

The issue with data transfer stalling out ssh attempts is also present in my setup. Doesn't seem to interfere with normal non-data transferring operations (i.e., recording cam video & music playback), but it does prevent me from logging-in via ssh until the transfer operation is completed. No biggee, I guess...

AngusThermo-Pyle commented 5 years ago

I noticed this issue as well recently when updating my OS and recopying music. It would leave existing sessions open, respond to new ones, but not respond to the session at all until the copy was done. Not much of an issue for my use case, but odd.

sboesebeck commented 5 years ago

my guess is, this is not only happening when syncing music - i have the same issue, that SSH fails to respond sometimes. But I do not have any music that is to be transferred.

Could it be, that the transfer "eats up" all bandwith? Maybe we could fix it using some QOS-Rules in network/firewall settings?

if that is really a thing that needs fixing. Usually I do not connect to the PI if all works fine :-)

githubsean commented 4 years ago

Somewhat related, I'm finding that while transferring clips onto my NAS, the connection checker is killing the archive process. This may be because, as mentioned above, the network is saturated and nothing is getting through.

Another reason I have thought of is that the ping (using hping3) isn't actually properly closing the connection - all it does is a SYN packet and waits for the SYN/ACK packet from the NAS. This leaves the connection state open until it times out - effectively a SYN flood. I haven't checked this yet though. I'm going to try changing the connection checker to use some different methods - maybe nc or nmap - to see if that makes a difference in my situation. I'll also see if there are any "security" settings on the NAS (Synology) which might be causing this.

stauffenberg2020 commented 5 months ago

UPDATE : I confirm that the below work around doesnt work after a couple of tests

May be a late comment. I have seen this network throttling issues and disconnections especially while testing how music copy works. If you have a (Synology) NAS, You can easily set the upload and download rates for the NAS user. This probably can get rid of network disconnections. (I tested once and it seems to work...Need to test a couple of more times)