sot / kadi

Chandra commands and events
https://sot.github.io/kadi
BSD 3-Clause "New" or "Revised" License

Allow for multiple tries in ftp operations #44

Open taldcroft opened 10 years ago

taldcroft commented 10 years ago

Update ftp_put_to_lucky and ftp_get_from_lucky to run the file operations inside a context manager, so that an exception bails out cleanly and the ftp connection is closed. Retry up to n_tries times, with a wait time that starts at 1 minute and doubles after each failed attempt; n_tries is a new keyword arg that defaults to 3.
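
A minimal sketch of the retry pattern described above, not the actual kadi implementation. The helper name run_with_retries and the open_connection, first_wait, and sleep parameters are assumptions made for illustration; only n_tries (default 3) and the 1 minute starting wait come from this issue. The connection object is assumed to have a close() method.

```python
import contextlib
import time


def run_with_retries(operation, open_connection, n_tries=3, first_wait=60.0,
                     sleep=time.sleep):
    """Call operation(conn) up to n_tries times, doubling the wait each time.

    open_connection is a hypothetical factory returning an object with a
    close() method. The last failure is re-raised if every attempt fails.
    """
    wait = first_wait
    for attempt in range(1, n_tries + 1):
        try:
            # contextlib.closing calls conn.close() on exit, including when
            # operation raises, so no connection is left open between tries.
            with contextlib.closing(open_connection()) as conn:
                return operation(conn)
        except Exception:
            if attempt == n_tries:
                raise  # out of tries: let the caller see the real error
            sleep(wait)
            wait *= 2  # 60 s, then 120 s, ...
```

With the defaults this waits 60 s after the first failure and 120 s after the second, then re-raises if the third attempt also fails. Passing sleep as a parameter is only there so the waits can be replaced in tests.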

The goal is to reduce errors like the one below:

Ska.ftp: log in to lucky as taldcroft
Ska.ftp: cd /home/taldcroft
Ska.ftp: ls .
Ska.ftp: put cmd_states.h5 as e7ffd3dc-53d8-4435-9b2c-dae1da4ddd57
Ska.ftp: rename e7ffd3dc-53d8-4435-9b2c-dae1da4ddd57 cmd_states/cmd_states.h5
Traceback (most recent call last):
  File "/proj/sot/ska/share/cmd_states/update_cmd_states.py", line 9, in <module>
    update_cmd_states.main()
  File "/proj/sot/ska/arch/x86_64-linux_CentOS-5/lib/python2.7/site-packages/Chandra.cmd_states-0.09-py2.7.egg/Chandra/cmd_states/update_cmd_states.py", line 463, in main
    occweb.ftp_put_to_lucky(ftp_dirname, [opt.h5file], logger=logging)
  File "/proj/sot/ska/arch/x86_64-linux_CentOS-5/lib/python2.7/site-packages/kadi-0.8-py2.7.egg/kadi/occweb.py", line 118, in ftp_put_to_lucky
    ftp.rename(ftp_file, '{}/{}'.format(ftp_dirname, file_base))
  File "/proj/sot/ska/arch/x86_64-linux_CentOS-5/lib/python2.7/site-packages/Ska.ftp-0.04-py2.7.egg/Ska/ftp.py", line 157, in rename
    self.ftp.rename(oldpath, newpath)
  File "/proj/sot/ska/arch/x86_64-linux_CentOS-5/lib/python2.7/site-packages/paramiko-1.12.0-py2.7.egg/paramiko/sftp_client.py", line 286, in rename
    self._request(CMD_RENAME, oldpath, newpath)
  File "/proj/sot/ska/arch/x86_64-linux_CentOS-5/lib/python2.7/site-packages/paramiko-1.12.0-py2.7.egg/paramiko/sftp_client.py", line 689, in _request
    return self._read_response(num)
  File "/proj/sot/ska/arch/x86_64-linux_CentOS-5/lib/python2.7/site-packages/paramiko-1.12.0-py2.7.egg/paramiko/sftp_client.py", line 736, in _read_response
    self._convert_status(msg)
  File "/proj/sot/ska/arch/x86_64-linux_CentOS-5/lib/python2.7/site-packages/paramiko-1.12.0-py2.7.egg/paramiko/sftp_client.py", line 766, in _convert_status
    raise IOError(text)
IOError: Failure
Closing remaining open files: /proj/sot/ska/data/cmd_states/cmd_states.h5... done
[chan 1] sftp session closed.

@jeanconn

jeanconn commented 10 years ago

I wonder if this multiple-try approach would be more or less valuable than having the cmd_states code fetch and check the remote h5 file on every run (or once a day, or something reasonable).

taldcroft commented 10 years ago

It's not possible to "check the remote h5 file" via lucky. Just pushing the cmd_states.h5 file (at a whopping 13 MB) every day would be reasonable though.

jeanconn commented 10 years ago

It's not possible to "check the remote h5 file" via lucky.

I meant fetch and compare, so I don't know why that wouldn't be possible except that the code doesn't exist.

taldcroft commented 10 years ago

I still don't understand precisely what you mean. Can you use words like HEAD, ftp, lucky, put, get, and OCC to describe what you are imagining?

jeanconn commented 10 years ago

Sure. I meant, from HEAD, get cmd_states.h5 from lucky, compare to SKA/HEAD cmd_states.h5, and on absence or difference put SKA/HEAD cmd_states.h5 on lucky.

I suppose this would be easier if there were an md5 or sha for the cmd_states.h5 in an accompanying checksum file (though that can cause its own problems).
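
A rough sketch of the compare step in that idea, assuming the remote copy has already been fetched from lucky to a local path (or is missing because the fetch found nothing). The function names file_sha256 and needs_put are made up for illustration and are not part of kadi or cmd_states.

```python
import hashlib
from pathlib import Path


def file_sha256(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def needs_put(local_path, fetched_path):
    """True if the fetched copy is absent or differs from the local file."""
    fetched = Path(fetched_path)
    if not fetched.exists():
        return True  # nothing was fetched, so the file has to be put
    return file_sha256(local_path) != file_sha256(fetched)
```

An accompanying checksum file would let the comparison skip fetching the full h5 file, at the cost of keeping the two files in sync.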

taldcroft commented 10 years ago

Ah, that's where we weren't on the same page. When the OCC process gets the file from lucky it is also deleted from lucky. This is the poor man's way of communicating to HEAD (and me) that the new file was successfully transferred to OCC.

https://github.com/sot/kadi/blob/master/kadi/occweb.py#L142
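
To make that handshake concrete, here is a toy model with an in-memory stand-in for the lucky directory. FakeLucky and transfer_pending are hypothetical names used only for this illustration; the real code talks to lucky through Ska.ftp.

```python
class FakeLucky:
    """In-memory stand-in for a directory on lucky (illustration only)."""

    def __init__(self):
        self.files = {}

    def put(self, name, data):
        # HEAD side: upload the file.
        self.files[name] = data

    def get_and_delete(self, name):
        # OCC side: fetch the file and remove it from lucky in one step.
        return self.files.pop(name)

    def ls(self):
        return sorted(self.files)


def transfer_pending(lucky, name):
    """True while the file is still on lucky, i.e. OCC has not fetched it."""
    return name in lucky.ls()


lucky = FakeLucky()
lucky.put("cmd_states.h5", b"...")
still_waiting = transfer_pending(lucky, "cmd_states.h5")   # file is on lucky
lucky.get_and_delete("cmd_states.h5")
picked_up = not transfer_pending(lucky, "cmd_states.h5")   # file is gone
```

Absence alone cannot tell "picked up by OCC" apart from "the put never succeeded", which is part of why a failed put is worth retrying.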

Hopefully this "try harder" approach will work to reduce situations like the one we're in.

jeanconn commented 10 years ago

Ah. Thanks! I think you'd mentioned that in one of our meetings, but I hadn't stored it. It might make sense to leave a comment someplace like:

https://github.com/sot/cmd_states/blob/31c391bd1445add2c12728ef6873f7df0b446ccb/Chandra/cmd_states/update_cmd_states.py#L463

to make it clear that the file doesn't actually stay where it is put!