Open taldcroft opened 10 years ago
I wonder if this multiple-try approach would be more or less valuable than having the cmd_states code fetch and check the remote h5 file on every run (or once a day, or something reasonable).
It's not possible to "check the remote h5 file" via lucky. Just pushing the cmd_states.h5 file (at a whopping 13 Mb) every day would be reasonable though.
> It's not possible to "check the remote h5 file" via lucky.
I meant fetch and compare, so I don't know why that wouldn't be possible except that the code doesn't exist.
I still don't understand precisely what you mean. Can you use words like HEAD, ftp, lucky, put, get, and OCC to describe what you are imagining?
Sure. I meant, from HEAD, get cmd_states.h5 from lucky, compare to SKA/HEAD cmd_states.h5, and on absence or difference put SKA/HEAD cmd_states.h5 on lucky.
I suppose this would be easier if there were an md5 or sha for the cmd_states.h5 in an accompanying checksum file (though that can cause its own problems).
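For illustration only, the "fetch and compare" idea above could be sketched as below. This is a minimal sketch, not code from cmd_states or kadi: `file_sha256` and `needs_push` are hypothetical names, and it assumes the remote copy has already been fetched to a local path by some other means.

```python
import hashlib
from pathlib import Path


def file_sha256(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def needs_push(local_path, fetched_path):
    """Return True if the fetched remote copy is absent or differs from the local file.

    Hypothetical helper: ``fetched_path`` is where a copy retrieved from the
    remote side would have been written (it may not exist at all).
    """
    fetched = Path(fetched_path)
    if not fetched.exists():
        return True  # nothing on the remote side, so push
    return file_sha256(local_path) != file_sha256(fetched)
```

Comparing digests rather than whole files is what would make an accompanying checksum file useful: only the small digest would need to be fetched.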
Ah, that's where we weren't on the same page. When the OCC process gets the file from lucky it is also deleted from lucky. This is the poor man's way of communicating to HEAD (and me) that the new file was successfully transferred to OCC.
https://github.com/sot/kadi/blob/master/kadi/occweb.py#L142
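The get-then-delete handshake described above might look roughly like the following. This is a sketch under assumptions, not the kadi `occweb` code: `get_and_delete` and `transfer_confirmed` are hypothetical names, and `ftp` is assumed to be an already-connected `ftplib.FTP`-like object positioned in the right directory.

```python
def get_and_delete(ftp, remote_name, local_path):
    """Receiver side: download ``remote_name``, then delete it on the server.

    The delete is only reached if the download raised no exception, so the
    file disappearing from the server signals a successful transfer.
    """
    with open(local_path, "wb") as fh:
        ftp.retrbinary(f"RETR {remote_name}", fh.write)
    ftp.delete(remote_name)


def transfer_confirmed(ftp, remote_name):
    """Sender side: the file being absent from the listing means it was picked up.

    Note: some FTP servers raise an error from NLST on an empty directory;
    that case is not handled in this sketch.
    """
    return remote_name not in ftp.nlst()
```

The consequence for any "fetch and compare" scheme is that a missing file is the normal success state, so absence alone cannot be used to decide that a new push is needed.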
Hopefully this "try harder" approach will work to reduce situations like the one we're in.
Ah. Thanks! I think you'd mentioned that in one of our meetings, but I hadn't stored it. It might make sense to leave a comment someplace like:
to make it clear that the file doesn't actually stay where it is put!
Update `ftp_put_to_lucky` and `ftp_get_from_lucky` to put file operations within a context manager to bail out on exception and close the ftp connection. Repeat up to `n_tries` times with a wait time that doubles starting from 1 minute, where `n_tries` is a new keyword arg that defaults to 3.

The goal is to reduce errors like below:
@jeanconn
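As a rough illustration of the retry scheme in the description (up to `n_tries` attempts, wait doubling from 1 minute, connection always closed), here is a minimal sketch. It is not the actual implementation in this PR: `retry_with_backoff`, `open_connection`, `first_wait` and the injectable `sleep` are hypothetical names chosen for the example.

```python
import contextlib
import time


def retry_with_backoff(operation, open_connection, n_tries=3,
                       first_wait=60.0, sleep=time.sleep):
    """Run ``operation(conn)`` up to ``n_tries`` times with a doubling wait.

    ``open_connection`` is a hypothetical factory returning an object with a
    ``close()`` method (for example an ``ftplib.FTP`` instance). The
    connection is closed after every attempt, whether it succeeds or fails.
    """
    wait = first_wait
    for attempt in range(1, n_tries + 1):
        try:
            # closing() guarantees conn.close() even if operation raises
            with contextlib.closing(open_connection()) as conn:
                return operation(conn)
        except Exception:
            if attempt == n_tries:
                raise  # out of tries: let the last error propagate
            sleep(wait)
            wait *= 2  # 1 min, 2 min, 4 min, ...
```

With the defaults this waits 60 s after the first failure and 120 s after the second, then gives up if the third attempt also fails. Passing `sleep` as an argument is only there so the waits can be checked without actually sleeping.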