scarlehoff / pyHepGrid

Tool for distributed computing management geared towards HEP applications.
GNU General Public License v3.0

Copy protocols + Wait time #47

Closed by JBlack93 4 years ago

JBlack93 commented 4 years ago

Firstly: removal of protocols that do not work (suggestion from Adam and Paul):

Paul and Adam ask that xrootd be removed from the list of protocols (since it is not installed and will always fail), and that only one of root or xroot be tried (since they use the same backend).

Secondly: dramatically increase the wait time after a failed copy, so that the repeated copy attempts do not put extra load on the server. The wait time is also randomised, so that a set of jobs which failed on the same copy command does not retry in lockstep. See issue #44.

Waiting ~half a minute for a job that takes 4+ hours shouldn't be a problem.
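
For illustration, the two changes could be sketched roughly as below. This is a minimal sketch and not the actual pyHepGrid code: `copy_with_retries`, `copy_once`, the 20 to 40 second window and the protocol names other than xroot are assumptions made up for this example.

```python
import random
import time

# Hypothetical protocol list: "xrootd" is dropped and only one of
# root/xroot is kept. "gsiftp" and "srm" are illustrative placeholders.
PROTOCOLS = ["xroot", "gsiftp", "srm"]


def copy_with_retries(copy_once, protocols=PROTOCOLS, max_attempts=3,
                      min_wait=20.0, max_wait=40.0, sleep=time.sleep):
    """Try every protocol in turn; after a fully failed pass, wait and retry.

    copy_once(protocol) is a caller-supplied callable (an assumption of this
    sketch) that returns True when the copy succeeded with that protocol.
    Returns the protocol that worked, or None if every attempt failed.
    """
    for attempt in range(1, max_attempts + 1):
        for protocol in protocols:
            if copy_once(protocol):
                return protocol
        if attempt < max_attempts:
            # Randomised wait of roughly half a minute, so that jobs which
            # failed together do not all retry at the same moment.
            sleep(random.uniform(min_wait, max_wait))
    return None
```

Passing `sleep` as a parameter is only a convenience so the wait can be replaced with a stub when testing; a real job script would use the default.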

marianheil commented 4 years ago

Is there a reason to implement this? @jcwhitehead already added an (incremental) sleep time in #43, together with other changes to the copy. The only thing missing is removing root and xrootd from the protocol list.

JBlack93 commented 4 years ago

I would argue that this is a quick and dirty fix to the problem presented in #44.

Not ideal, and to be superseded by #43.

Given the nature of the issue I think a quick patch is worth implementing while the new copy is validated.

marianheil commented 4 years ago

#43 is basically done; it just needs some testing, which I'm doing at the moment.

marianheil commented 4 years ago

Redundant after #43