hbckleikamp / GTDB2DIAMOND

Set of simple auxiliary python scripts to help create GTDB databases for annotation with DIAMOND
MIT License

GTDB2DIAMOND


This collection of scripts is designed to facilitate DIAMOND annotation with GTDB representative protein sequences. It consists of 4 separate small scripts that can be run in Spyder or reused for other purposes. Because DIAMOND's dependencies and requirements differ between operating systems, automated DIAMOND installation is not included; install it by following: https://github.com/bbuchfink/diamond/wiki
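Since DIAMOND has to be installed separately, it can help to confirm that the executable is reachable before starting the pipeline. The snippet below is a minimal sketch and not part of the repository: the function name `find_diamond` is hypothetical, and it assumes the executable is called `diamond` and sits on the system PATH.

```python
import shutil


def find_diamond(executable="diamond"):
    """Return the full path of the DIAMOND executable, or None if it is not on PATH.

    The default name "diamond" is an assumption; pass another name or path if
    your installation differs.
    """
    return shutil.which(executable)


diamond_path = find_diamond()
if diamond_path is None:
    print("DIAMOND not found on PATH; install it first (see the DIAMOND wiki).")
else:
    print(f"Using DIAMOND at {diamond_path}")
```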

Running the full pipeline consists of the following steps:
-1. GTDB_protein_download.py: download recent protein fasta files and taxonomy metadata
-2. GTDB_protein_rename.py: add the organism accession to the headers of the GTDB protein files
-3. GTDB_protein_merge.py: merge the renamed GTDB files into a single database
-4. Construction of a DIAMOND database from the output of step 3 (diamond makedb, see: https://github.com/bbuchfink/diamond/wiki)
-5. Annotation of proteins against the DIAMOND database constructed in step 4 (diamond blastp, see: https://github.com/bbuchfink/diamond/wiki)
-6. GTDB_LCA.py: annotate the taxonomy of query sequences based on the lowest common ancestor, using a top bitscore cutoff
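Steps 4 to 6 can be sketched in Python as follows. This is an illustration under stated assumptions, not the code in GTDB_LCA.py: all function names are hypothetical, the hits are assumed to be already reduced to (query, genome accession, bitscore) tuples, the taxonomy is assumed to map each accession to a semicolon-separated GTDB lineage string, and the cutoff is assumed to be a fraction of each query's top bitscore (`top_fraction`). The DIAMOND argument lists use only the `makedb` and `blastp` subcommands with the `--in`, `--db`, `--query`, `--out` and `--outfmt` options.

```python
from collections import defaultdict


def makedb_command(merged_fasta, db_prefix):
    """Step 4: argument list that builds a DIAMOND database from the merged fasta."""
    return ["diamond", "makedb", "--in", merged_fasta, "--db", db_prefix]


def blastp_command(db_prefix, query_fasta, out_tsv):
    """Step 5: argument list for a blastp search with tabular output."""
    return ["diamond", "blastp", "--db", db_prefix, "--query", query_fasta,
            "--out", out_tsv, "--outfmt", "6", "qseqid", "sseqid", "bitscore"]


def lowest_common_ancestor(lineages):
    """Return the ranks shared by all lineages, from the domain downwards."""
    shared = []
    for ranks in zip(*lineages):
        if len(set(ranks)) != 1:
            break  # lineages diverge at this rank
        shared.append(ranks[0])
    return shared


def lca_per_query(hits, taxonomy, top_fraction=0.9):
    """Step 6 (sketch): assign each query the LCA of its best-scoring hits.

    hits:         iterable of (query, accession, bitscore) tuples (assumed format)
    taxonomy:     dict mapping accession -> "d__...;p__...;..." (assumed format)
    top_fraction: keep hits scoring at least this fraction of the query's top
                  bitscore (assumed interpretation of the cutoff)
    """
    by_query = defaultdict(list)
    for query, accession, bitscore in hits:
        by_query[query].append((accession, float(bitscore)))

    result = {}
    for query, query_hits in by_query.items():
        cutoff = top_fraction * max(score for _, score in query_hits)
        lineages = [taxonomy[accession].split(";")
                    for accession, score in query_hits if score >= cutoff]
        result[query] = ";".join(lowest_common_ancestor(lineages))
    return result
```

The argument lists can be passed to `subprocess.run(..., check=True)` once DIAMOND is installed. In the LCA step, a query whose retained hits come from genomes that share a phylum but differ in class is assigned at the phylum level, while a query with a single retained hit keeps that hit's full lineage.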

Licensing

The pipeline is licensed under the standard MIT license.
If you use this pipeline in your research, please cite the following papers:

-Buchfink, B., Reuter, K., Drost, H.-G. "Sensitive protein alignments at tree-of-life scale using DIAMOND." Nature Methods 18, 366–368 (2021). https://doi.org/10.1038/s41592-021-01101-x
-Parks, D.H., et al. "A complete domain-to-species taxonomy for Bacteria and Archaea." Nature Biotechnology (2020). https://doi.org/10.1038/s41587-020-0501-8
-Kleikamp, H.B.C., et al. "Comparative metaproteomics demonstrates different views on the complex granular sludge microbiome." bioRxiv (2022).

Contact:

-Hugo Kleikamp (Developer): hugo.kleikamp@uantwerpen.be
-Martin Pabst: M.Pabst@tudelft.nl

Recommended links to other repositories:

https://github.com/bbuchfink/diamond