radiocosmology / alpenhorn

Alpenhorn is a service for managing an archive of scientific data.
MIT License
2 stars 1 forks source link

Rewrite 9/14: I/O util (pull request inner functions) #152

Closed ketiltrout closed 1 year ago

ketiltrout commented 1 year ago

This PR primarily moves parts of the file transfer code out of update.py and into alpenhorn/io/ioutil.py. My original thought with these bits was that they would be useful to other I/O types other than the Default I/O, but that hasn't actually happened in this rewrite. Nevertheless, it's good to split up the unmanageably long update_node_requests.

Updates to previous rewrite PRs

This PR removes my ham-fisted attempt to create an auto-updating CurrentTimestampField (using MySQL's ON UPDATE CURRENT_TIMESTAMP clause) from way back in part 4 (#147) which wasn't going to work for a number of reasons. In it's place is a simple DateTimeField which gets explicitly updated when necessary.

The only potential downside here is that changes made to the ArchiveFileCopy table outside of alpenhorn won't benefit from automatic updates of this cell, but I don't think we're ever updating that table outside of alpenhorn anyways.

File transfer methods

There are three functions handling the various transfer methods moved into ioutil:

These three functions all return an "ioresult" dict with standard-ish keys indicating the result of the transfer.

Post transfer DB update

The final functionality moved to ioutil is a function that essentially takes the ioresult provided by the three transfer functions, determines whether the transfer succeeded or failed, and then and updates the database to reflect what happened. (This is copy_request_done.)

Updates to update.py

Even though it's still commented out, I've updated update.update_node_requests to use these new functions. In the next PR, update_node_requests will be converted to the new I/O system and these changes will be incorporated there.

Changes to util.py

ljgray commented 1 year ago

There are two constants defined in this file, PULL_TIMEOUT_BASE and PULL_BYTES_PER_SECOND which determine the length of time before the transfer is killed. I've indicated that these could be put in the alpenhorn config, but I'm not sure it's worth it. I'm looking for opinions.

Personally I lean towards having them in config, because if we ever end up wanting to change them for some reason it's more of a pain to hunt down where they're defined. Also, depending on how the deployment is done, a change would just have to involve redeploying the config (assuming it's an ansible play)

ketiltrout commented 1 year ago

There are two constants defined in this file, PULL_TIMEOUT_BASE and PULL_BYTES_PER_SECOND which determine the length of time before the transfer is killed. I've indicated that these could be put in the alpenhorn config, but I'm not sure it's worth it. I'm looking for opinions.

Personally I lean towards having them in config, because if we ever end up wanting to change them for some reason it's more of a pain to hunt down where they're defined. Also, depending on how the deployment is done, a change would just have to involve redeploying the config (assuming it's an ansible play)

Done