kblin / ncbi-genome-download

Scripts to download genomes from the NCBI FTP servers
Apache License 2.0

Option for metadata file only #202

Open Ahmed-Shibl opened 1 year ago

Ahmed-Shibl commented 1 year ago

First I want to thank you for this great tool!

Is there an option to get the metadata file for a list of accessions without downloading the genomes? I would like to narrow down which genomes I download based on their metadata.

I tried this command in an attempt to generate the ALL_actino_accs_in_tree_metadata.txt file:

ncbi-genome-download --dry-run --section genbank --assembly-level all --assembly-accessions ALL_actino_accs_in_tree.txt --output-folder /Obesity_v2/actinobacteriota/anvio --metadata ALL_actino_accs_in_tree_metadata.txt --progress-bar --verbose bacteria

No file was generated, but this was the output:

GCA_007954505.1 Microbacterium sp. CBA3102         CBA3102
GCA_008122505.1 Agromyces mariniharenae                 NEAU-184
GCA_008123405.1 Nocardioides sp. BGMRC 2183         BGMRC 2183
GCA_009696325.1 Collinsella sp. WCA1-178-WT-3 (M1)  WCA1-178-WT-3 (M1)
GCA_009696315.1 Cutibacterium porci                         WCA-380-WT-3A

Please let me know if you need any other information. Thanks!

Thomieh73 commented 2 months ago

Hi, maybe you have already figured this out. ncbi-genome-download caches the assembly summary file, which contains the metadata you need. If you run:

ncbi-genome-download -h

the help output includes these lines:

  -N, --no-cache        Don't cache the assembly summary file in
                        /cluster/home/thhaverk/.cache/ncbi-genome-download.

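So if you do a dry run, the assembly summary file with the metadata will be in that cache folder (the path shown in the help text is specific to my account; yours will differ).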
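
Once the summary file is in the cache, it can be filtered down to the accessions of interest. The following is only a rough sketch, assuming the cached file is in NCBI's tab-separated assembly summary layout (comment lines starting with `#`, one of which names the columns, and the accession in the first column). The function name `filter_summary`, the three-column sample, and the accession `GCA_000000000.1` are made up for illustration and are not part of ncbi-genome-download.

```python
import io


def filter_summary(summary_lines, wanted_accessions):
    """Return (header, rows) for entries whose accession is in wanted_accessions.

    summary_lines: any iterable of text lines (for example an open file).
    wanted_accessions: a set of accession strings such as "GCA_007954505.1".
    """
    header = None
    rows = []
    for line in summary_lines:
        line = line.rstrip("\n")
        if not line:
            continue
        if line.startswith("#"):
            # Assumption: the column header is the comment line that
            # mentions "assembly_accession"; other comments are skipped.
            if "assembly_accession" in line:
                header = line.lstrip("# ").split("\t")
            continue
        fields = line.split("\t")
        # Assumption: the accession is the first tab-separated field.
        if fields[0] in wanted_accessions:
            rows.append(fields)
    return header, rows


# In-memory stand-in for a cached summary file. The real file has many
# more columns; the second data row is a made-up accession.
sample = (
    "#   See README for a description of the columns\n"
    "# assembly_accession\torganism_name\tinfraspecific_name\n"
    "GCA_007954505.1\tMicrobacterium sp. CBA3102\tstrain=CBA3102\n"
    "GCA_000000000.1\tExample organism\tstrain=X\n"
)
header, rows = filter_summary(io.StringIO(sample), {"GCA_007954505.1"})
```

To use this on real data, pass an open handle to the summary file found in your own cache folder and a set built from the lines of the accession list (ALL_actino_accs_in_tree.txt in the question above). The exact name of the cached file is not stated in this thread, so check the folder after the dry run.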