nils-braun / b2luigi

Task scheduling and batch running for basf2 jobs made simple
GNU General Public License v3.0

Persistent partial download directories for gbasf2 dataset downloads #67

Closed meliache closed 3 years ago

meliache commented 3 years ago

For gbasf2 tasks, the output is usually a directory which contains the dataset resulting from the gbasf2 project on the grid. It's really important to make sure that this directory contains a complete dataset identical to the one on the grid. Therefore I downloaded into a temporary directory and moved the dataset to the final directory only if it was complete. However, the temporary directories provided by the tempfile package are transient and disappear once the process exits, which in our case happens when the download fails. At first I thought that having atomic downloads was nice, but downloads with gbasf2 fail often, take a long time, and are a bottleneck time-wise.

Therefore, in this PR, I change the code to initially download the grid datasets into a normal directory with a .partial suffix. This directory persists if the download fails and only gets removed once the download was successful and the files have been moved to their final output location. I also considered using a directory in ~/.cache, but decided to put the .partial dir in the result_dir next to the final output dir, because this is where I expect the user to have enough storage for the output root files.
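The idea can be sketched roughly as follows. This is a minimal illustration and not the code from this PR: the function name `download_with_partial_dir` and the `fetch` callback (a stand-in for the actual gbasf2 download that reports whether the dataset is complete) are hypothetical, and for simplicity the sketch renames the whole `.partial` directory instead of moving the files one by one.

```python
import shutil
from pathlib import Path
from typing import Callable


def download_with_partial_dir(output_dir: Path, fetch: Callable[[Path], bool]) -> bool:
    """Download into '<output_dir>.partial' and move it to output_dir on success.

    `fetch` is a hypothetical stand-in for the real gbasf2 download. It receives
    the partial directory and must return True only if the dataset is complete.
    """
    output_dir = Path(output_dir)
    if output_dir.exists():
        # The final directory only ever appears after a complete download.
        return True

    # The partial directory lives next to the final output directory.
    partial_dir = output_dir.with_name(output_dir.name + ".partial")
    # exist_ok=True lets a retry continue with files from an earlier attempt.
    partial_dir.mkdir(parents=True, exist_ok=True)

    if not fetch(partial_dir):
        # Keep the partial directory so already downloaded files are not lost.
        return False

    # Only a complete dataset is moved to the final location.
    shutil.move(str(partial_dir), str(output_dir))
    return True
```

With this layout, a failed attempt leaves `<output_dir>.partial` behind, and the next attempt can skip files that are already there instead of starting from scratch.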

Also, while doing this fix, I refactored the download code into more separate functions and fixed a pretty nasty bug in 9b439d9ea94abbc9b5e41109b9b3a54ab1718015 which prevented other outputs from being downloaded if one output was found to already exist.
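The offending code is not shown here, so the following is only a guess at the general shape of such a bug, not the actual b2luigi code. A typical cause is returning from a loop over outputs as soon as one of them exists, where only that one output should be skipped. The names `download_missing_outputs` and `download` are hypothetical.

```python
from pathlib import Path
from typing import Callable, Iterable, List


def download_missing_outputs(
    output_dirs: Iterable[Path],
    download: Callable[[Path], None],
) -> List[Path]:
    """Download every output that does not exist yet and return those paths.

    `download` is a hypothetical stand-in for the real per-output download.
    """
    downloaded = []
    for output_dir in map(Path, output_dirs):
        if output_dir.exists():
            # Skip only this output. A `return` here would silently drop
            # all remaining outputs, which is the kind of bug described above.
            continue
        download(output_dir)
        downloaded.append(output_dir)
    return downloaded
```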

This PR is in preparation for a fix for #61, since working on that fix is much more comfortable with persistent downloads and a refactored output function. The branch for fixing that issue will start from the head of this PR.

So far I tested this only with my simple example project.

People who might be interested and might want to have a look if they feel like it: @bilokin @welschma