pyinfra-dev / pyinfra

pyinfra turns Python code into shell commands and runs them on your servers. Execute ad-hoc commands and write declarative operations. Target SSH servers, local machine and Docker containers. Fast and scales from one server to thousands.
https://pyinfra.com
MIT License
3.91k stars 382 forks source link

files.put not updating (on r/w filesystem) #936

Open drewp opened 1 year ago

drewp commented 1 year ago

Kind of like #878, here's another files.put silently failing to update the file. Before:

bang% ls -l /tmp/net_traffic.aarch64
-rwxrwxr-x 1 drewp drewp 8361192 Dec 30 15:22 /tmp/net_traffic.aarch64

pipe% % ls -l /opt/net_traffic/  
total 8132
-rwxr-xr-x 1 root root 8326736 Dec  2 10:11 net_traffic.aarch64

bang% sudo ssh root@pipe touch /opt/net_traffic/not_a_readonly_fs

pipe% ls -l /opt/net_traffic/
total 8132
-rwxr-xr-x 1 root root 8326736 Dec  2 10:11 net_traffic.aarch64
-rw-r--r-- 1 root root       0 Dec 30 15:48 not_a_readonly_fs

deploy.py follows:

from pyinfra.operations import files, systemd

target = '/opt/net_traffic'

files.directory(target)
files.put(src="/tmp/net_traffic.aarch64", dest=target, mode="a+rx")

files.put(src='./net_traffic.service',
          dest='/etc/systemd/system/net_traffic.service')
systemd.service(service='net_traffic.service',
                daemon_reload=True,
                enabled=True,
                running=True,
                restarted=True)

I ran pyinfra -vvv pipe deploy.py.

pyinfra logs:

      6 --> Connecting to hosts...
      7     [pipe] Connected
      8 
      9 --> Preparing Operations...
     10     Loading: deploy.py
     11 [pipe] >>> sh -c '! (test -e /opt/net_traffic || test -L /opt/net_traffic ) || ( stat -c '"'"'user=%U group=%G mode=%A atime=%X mtime=%Y ctime=%Z size=%s %N'"'"' /opt/net_traffic 2> /dev/null || stat -f '"'"'user=%Su group=%Sg mode=%Sp atime=%a mtime=%m ctime=%c s     11 ize=%z %N%SY'"'"' /opt/net_traffic )'
     12 [pipe] user=drewp group=drewp mode=drwxrwxr-x atime=1672441894 mtime=1670004715 ctime=1670004715 size=4096 '/opt/net_traffic'
     13     [pipe] Loaded fact files.Directory (path=/opt/net_traffic)
     14     [pipe] noop: directory /opt/net_traffic already exists
     15 [pipe] >>> sh -c '! (test -e /opt/net_traffic || test -L /opt/net_traffic ) || ( stat -c '"'"'user=%U group=%G mode=%A atime=%X mtime=%Y ctime=%Z size=%s %N'"'"' /opt/net_traffic 2> /dev/null || stat -f '"'"'user=%Su group=%Sg mode=%Sp atime=%a mtime=%m ctime=%c s     15 ize=%z %N%SY'"'"' /opt/net_traffic )'
     16 [pipe] user=drewp group=drewp mode=drwxrwxr-x atime=1672441894 mtime=1670004715 ctime=1670004715 size=4096 '/opt/net_traffic'
     17     [pipe] Loaded fact files.File (path=/opt/net_traffic)
     18 [pipe] >>> sh -c '! (test -e /opt/net_traffic/net_traffic.aarch64 || test -L /opt/net_traffic/net_traffic.aarch64 ) || ( stat -c '"'"'user=%U group=%G mode=%A atime=%X mtime=%Y ctime=%Z size=%s %N'"'"' /opt/net_traffic/net_traffic.aarch64 2> /dev/null || stat -f '     18 "'"'user=%Su group=%Sg mode=%Sp atime=%a mtime=%m ctime=%c size=%z %N%SY'"'"' /opt/net_traffic/net_traffic.aarch64 )'
     19 [pipe] user=root group=root mode=-rwxr-xr-x atime=1672442515 mtime=1670004716 ctime=1672442515 size=8326736 '/opt/net_traffic/net_traffic.aarch64'
     20     [pipe] Loaded fact files.File (path=/opt/net_traffic/net_traffic.aarch64)
     21 [pipe] >>> sh -c '! (test -e /etc/systemd/system/net_traffic.service || test -L /etc/systemd/system/net_traffic.service ) || ( stat -c '"'"'user=%U group=%G mode=%A atime=%X mtime=%Y ctime=%Z size=%s %N'"'"' /etc/systemd/system/net_traffic.service 2> /dev/null ||      21 stat -f '"'"'user=%Su group=%Sg mode=%Sp atime=%a mtime=%m ctime=%c size=%z %N%SY'"'"' /etc/systemd/system/net_traffic.service )'
     22 [pipe] user=root group=root mode=-rw-r--r-- atime=1672438834 mtime=1670004649 ctime=1670004649 size=273 '/etc/systemd/system/net_traffic.service'
     23     [pipe] Loaded fact files.File (path=/etc/systemd/system/net_traffic.service)
     24 [pipe] >>> sh -c '! (test -e /etc/systemd/system || test -L /etc/systemd/system ) || ( stat -c '"'"'user=%U group=%G mode=%A atime=%X mtime=%Y ctime=%Z size=%s %N'"'"' /etc/systemd/system 2> /dev/null || stat -f '"'"'user=%Su group=%Sg mode=%Sp atime=%a mtime=%m c     24 time=%c size=%z %N%SY'"'"' /etc/systemd/system )'
     25 [pipe] user=root group=root mode=drwxr-xr-x atime=1672438834 mtime=1671657241 ctime=1671657241 size=4096 '/etc/systemd/system'
     26     [pipe] Loaded fact files.Directory (path=/etc/systemd/system)
     27 [pipe] >>> sh -c 'test -e /etc/systemd/system/net_traffic.service && ( sha1sum /etc/systemd/system/net_traffic.service 2> /dev/null || shasum /etc/systemd/system/net_traffic.service 2> /dev/null || sha1 /etc/systemd/system/net_traffic.service ) || true'
     28 [pipe] ba56cf7b6203fb99310c395dd490bd86c090090f  /etc/systemd/system/net_traffic.service
     29     [pipe] Loaded fact files.Sha1File (path=/etc/systemd/system/net_traffic.service)
     30     [pipe] noop: file /etc/systemd/system/net_traffic.service is already uploaded
     31 [pipe] >>> sh -c '! command -v systemctl >/dev/null || systemctl show --all --property Id --property SubState '"'"'*'"'"''

(service lines removed)

   1989     [pipe] Loaded fact systemd.SystemdEnabled (machine=None, user_mode=False, user_name=None)
   1990     [pipe] Ready: deploy.py
   1991 
   1992 --> Proposed changes:
   1993     Ungrouped:
   1994     [pipe]   Operations: 4   Change: 2   No change: 2   
   1995 
   1996 
   1997 --> Beginning operation run...
   1998 --> Starting operation: Files/Directory (/opt/net_traffic)
   1999     [pipe] No changes
   2000 
   2001 --> Starting operation: Files/Put (src=/tmp/net_traffic.aarch64, dest=/opt/net_traffic, mode=a+rx, force=True)
   2002     Failed to upload file, retrying: Failure
   2003 [pipe] file uploaded: /opt/net_traffic/net_traffic.aarch64
   2004 [pipe] >>> sh -c 'chmod a+rx /opt/net_traffic/net_traffic.aarch64'
   2005     [pipe] Success
   2006 
   2007 --> Starting operation: Files/Put (src=./net_traffic.service, dest=/etc/systemd/system/net_traffic.service)
   2008     [pipe] No changes
   2009 
   2010 --> Starting operation: Systemd/Service (service=net_traffic.service, daemon_reload=True, enabled=True, running=True, restarted=True)
   2011 [pipe] >>> sh -c 'systemctl daemon-reload'
   2012 [pipe] >>> sh -c 'systemctl restart net_traffic.service'
   2013     [pipe] Success
   2014 
   2015 
   2016 --> Results:
   2017     Ungrouped:
   2018     [pipe]   Changed: 2   No change: 2   Errors: 0   

2003,2005 suggest success, but the new file content has not actually been put:

% ls -l /opt/net_traffic/
total 8132
-rwxr-xr-x 1 root root 8326736 Dec  2 10:11 net_traffic.aarch64
-rw-r--r-- 1 root root       0 Dec 30 15:48 not_a_readonly_fs

Retrying with put(...,force=True) didn't help.

pyinfra 2.6

Fizzadar commented 1 year ago

Thank you for raising this @drewp, concerning given the similar #878 - at this point I'm not sure I trust paramiko's checking of the uploaded size to properly validate an upload. Worst thing is the false positive so I think a solution to that is to calculate a checksum of the file locally and after uploading to compare which should prevent any such issues.

Regarding this issue specifically - does this happen every time or only on some occaisions?

Fizzadar commented 1 year ago

Had a second pass at the retry code and found a pyinfra bug which may well be the cause of both false-positives here! Won't fix any underlying issues but will raise the errors properly. I believe in this case it would now retry correctly and likely succeed (note paramiko does identify the failure "Failed to upload file, retrying: Failure").

Fix: https://github.com/Fizzadar/pyinfra/commit/277cb6fedd7b025940b254817b302ab1149bee76 Released just now in v2.6.1.

drewp commented 1 year ago

With 2.6.1:

--> Starting operation: Files/Put (src=/tmp/net_traffic.aarch64, dest=/opt/net_traffic, mode=a+rx, force=True)
    Failed to upload file, retrying: Failure
    Failed to upload file, retrying: Failure
    Failed to upload file, retrying: Failure
    [pipe] Command socket/SSH error: OSError('Failure',)
    [pipe] Error: executed 0/2 commands
--> pyinfra error: No hosts remaining!

:)

drewp commented 1 year ago
--> Starting operation: Files/Put (src=/tmp/net_traffic.aarch64, dest=/opt/net_traffic, mode=a+rx, force=True)
Traceback (most recent call last):
  File "/stor6/my/proj/infra/env/lib/python3.10/site-packages/pyinfra/connectors/ssh.py", line 480, in _put_file
    sftp.putfo(file_io, remote_location)
  File "/stor6/my/proj/infra/env/lib/python3.10/site-packages/paramiko/sftp_client.py", line 714, in putfo
    with self.file(remotepath, "wb") as fr:
  File "/stor6/my/proj/infra/env/lib/python3.10/site-packages/paramiko/sftp_client.py", line 372, in open
    t, msg = self._request(CMD_OPEN, filename, imode, attrblock)
  File "/stor6/my/proj/infra/env/lib/python3.10/site-packages/paramiko/sftp_client.py", line 822, in _request
    return self._read_response(num)
  File "/stor6/my/proj/infra/env/lib/python3.10/site-packages/paramiko/sftp_client.py", line 874, in _read_response
    self._convert_status(msg)
  File "/stor6/my/proj/infra/env/lib/python3.10/site-packages/paramiko/sftp_client.py", line 907, in _convert_status
    raise IOError(text)

code=4 and text=Failure

Fizzadar commented 1 year ago

Hm, rather annoyingly per this table code 4 is an unknown/generic failure and the text should explain more. It is interesting that the issue seems to be opening the remote file, rather than part of the upload. @drewp would you mind testing a few things here as this seems specific to your setup?:

This should give us a bit more debugging direction I think 🙏!