motu-tool / motus_v3_genomes

Small tool to download all genomes used for mOTUs 3

Download mOTUs 3 genomes

This tool lets you download any or all of the 700,000 genomes used to build the mOTUs 3 database. You can download a specific genome, all genomes associated with a specific mOTU, or the complete database.

Installation

To run the script, clone this repository and change into its folder:

git clone https://github.com/motu-tool/motus_v3_genomes
cd motus_v3_genomes

Downloading genomes

Data are downloaded into a folder named motus_v3_genomes, created in the same folder as the script.

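Inside motus_v3_genomes, each mOTU gets its own sub-folder containing the genomes as .fa files plus a 1.list_files.txt file; an example tree is shown further below.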

Note that all files are first downloaded to motus_v3_genomes/temp_dir and moved to the final destination once the download and unzip are complete.
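The stage-then-move pattern described in this note can be sketched as follows. This is only an illustration of the idea, not the script's actual code: the function name `stage_then_move` and the `write_file` callback are hypothetical and stand in for whatever download and unzip step the script performs.

```python
import shutil
from pathlib import Path


def stage_then_move(write_file, final_path, temp_dir):
    """Write a file into temp_dir first, then move it to final_path.

    write_file is a hypothetical callback that receives the temporary path
    and writes the finished (downloaded and unzipped) file there.
    """
    final_path = Path(final_path)
    temp_dir = Path(temp_dir)
    temp_dir.mkdir(parents=True, exist_ok=True)
    temp_path = temp_dir / final_path.name

    # If this raises, nothing is ever created at final_path.
    write_file(temp_path)

    # Only a completely written file reaches the final destination.
    final_path.parent.mkdir(parents=True, exist_ok=True)
    shutil.move(str(temp_path), str(final_path))
    return final_path
```

The benefit of this design is that an interrupted download leaves partial files only in temp_dir, so anything found in a mOTU folder can be assumed complete.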

The script automatically verifies the MD5 checksum of each downloaded file.
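If you want to re-check a file yourself, an MD5 comparison can be done with Python's standard `hashlib` module. The sketch below is a minimal example and not taken from the script; the helper names `md5_of_file` and `verify_md5` are hypothetical, and you would need to supply the expected checksum yourself.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops once read() returns empty bytes.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_md5(path, expected_md5):
    """Return True if the file's MD5 equals expected_md5 (case-insensitive)."""
    return md5_of_file(path) == expected_md5.strip().lower()
```

Reading in chunks keeps memory use constant, which matters for large genome files.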

To download one genome (for example the MAG LIAN20-1_SAMN11649416_METAG_000035):

python motus_genomes_download -m LIAN20-1_SAMN11649416_METAG_000035

To download all genomes from one mOTU (for example the ref mOTU ref_mOTU_v3_00006):

python motus_genomes_download -m ref_mOTU_v3_00006

If you run the two commands above to download a genome and a mOTU, you will end up with the following structure:

.
|-- README.md
|-- motus_genomes_download
`-- motus_v3_genomes
    |-- ext_mOTU_v3_19213
    |   |-- 1.list_files.txt
    |   |-- LIAN20-1_SAMN11649406_METAG_001584.fa
    |   `-- LIAN20-1_SAMN11649416_METAG_000035.fa
    |-- ref_mOTU_v3_00006
    |   |-- 1.list_files.txt
    |   |-- 1218103.SAMD00041821.fa
    |   |-- 1340435.SAMN02440767.fa
    |   `-- 253.SAMN05421692.fa
    `-- temp_dir
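Given a layout like the tree above, the downloaded genomes can be inventoried with a few lines of Python. This is a sketch that assumes only what the tree shows (one sub-folder per mOTU, genomes stored as .fa files, and a temp_dir folder to skip); the function name `genomes_per_motu` is hypothetical.

```python
from pathlib import Path


def genomes_per_motu(root="motus_v3_genomes"):
    """Map each mOTU folder name to the sorted list of its genome IDs."""
    result = {}
    for motu_dir in sorted(Path(root).iterdir()):
        # Skip stray files and the staging folder used during downloads.
        if not motu_dir.is_dir() or motu_dir.name == "temp_dir":
            continue
        # The genome ID is the file name without its .fa extension.
        result[motu_dir.name] = sorted(p.stem for p in motu_dir.glob("*.fa"))
    return result


if __name__ == "__main__" and Path("motus_v3_genomes").is_dir():
    for motu, genomes in genomes_per_motu().items():
        print(motu, len(genomes))
```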

To download all genomes:

python motus_genomes_download -m all
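If you need several specific mOTUs or genomes rather than the whole database, one option is to call the script once per identifier. The wrapper below is a hypothetical convenience sketch, not part of this repository; it assumes it is run from the repository folder and uses only the `-m` option shown above.

```python
import subprocess
import sys


def build_command(identifier, script="motus_genomes_download"):
    """Build the command line for one identifier, using the current Python."""
    return [sys.executable, script, "-m", identifier]


def download_many(identifiers, script="motus_genomes_download", dry_run=False):
    """Run the download script once per identifier; return the commands used."""
    commands = [build_command(identifier, script) for identifier in identifiers]
    if not dry_run:
        for command in commands:
            # check=True stops at the first failing download.
            subprocess.run(command, check=True)
    return commands
```

For example, `download_many(["ref_mOTU_v3_00006", "ext_mOTU_v3_19213"], dry_run=True)` returns the two command lines without running them.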

Finally, you can list all mOTUs and the number of associated genomes (available for download) with:

python motus_genomes_download -l

And you can list all genomes with:

python motus_genomes_download -g