Closed jotech closed 1 year ago
The tar.gz is only an intermediate file.
In theory, once the tar is downloaded, a rule extract_gtdb should extract it and create a flag os.path.join(GTDBTK_DATA_PATH, "downloaded_success"). From there on, no more data should be downloaded.
It is quite probable that the download was only halfway done and left an incomplete tar.gz, don't you think? In these cases I prefer to remove the file and restart the download.
But thank you for suggesting improvements.
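For illustration, the trade-off between reusing an existing archive and restarting the download could be handled roughly as follows. This is only a sketch and not how the Atlas rules are actually written: the helper names `gzip_is_complete` and `needs_download` are hypothetical, and it assumes the flag file and the archive sit in the same directory. It reads the whole gzip stream once to detect a truncated download, which costs local I/O but no network traffic.

```python
import gzip
import os
import zlib


def gzip_is_complete(path, chunk_size=1024 * 1024):
    """Return True if the gzip stream at `path` can be read to its end.

    A truncated or corrupted download raises EOFError, OSError
    (including gzip.BadGzipFile) or zlib.error while reading.
    """
    try:
        with gzip.open(path, "rb") as handle:
            while handle.read(chunk_size):
                pass  # discard the data, we only care that reading succeeds
        return True
    except (EOFError, OSError, zlib.error):
        return False


def needs_download(db_dir, archive_name="gtdb_data.tar.gz",
                   flag_name="downloaded_success"):
    """Decide whether the archive has to be fetched (hypothetical helper).

    Assumes, for this sketch only, that flag and archive live in `db_dir`.
    """
    flag = os.path.join(db_dir, flag_name)
    archive = os.path.join(db_dir, archive_name)

    if os.path.exists(flag):
        return False  # data was already extracted successfully
    if os.path.exists(archive):
        if gzip_is_complete(archive):
            return False  # reuse the existing, intact download
        os.remove(archive)  # incomplete file: remove it and start over
    return True
```

With a check like this, an intact gtdb_data.tar.gz would be reused, while a partial file would still be removed and downloaded again.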
The rule localrule download_gtdb is executed when using atlas download, even though the file gtdb_data.tar.gz already exists in the folder specified by --db-dir. This leads to unnecessary traffic and runtime because the GTDB is quite large.

Background: I removed the conda environments manually for debugging reasons and found that the download started again, although all necessary files (atlas/GTDB_V08_R214/gtdb_data.tar.gz) were available.

Describe the solution you'd like
The rule download_gtdb should check whether the file is available and use the existing download whenever possible.

Additional context