peterk87 / nf-villumina

Generic viral Illumina sequence analysis pipeline
MIT License

Add process for BLAST against user specified DB and BLAST report generation #9

Open peterk87 opened 4 years ago

peterk87 commented 4 years ago

BLAST+ version >= 2.8.1 "supports the new BLAST database version (BLASTDBv5). This is a taxonomically aware version of the BLAST database. See notes at https://ftp.ncbi.nlm.nih.gov/blast/db/v5/blastdbv5.pdf"

Given BLAST results with taxonomic information, contigs could be sorted by the taxid of their top hit and summarized in a tabular report (and possibly a web-BLAST-style visualization).
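A rough sketch of what the "sort contigs by top taxid" step might look like, just to make the idea concrete. It assumes tab-separated BLAST output with the columns `qseqid sseqid pident length evalue bitscore staxids` (a column layout picked for illustration, not something the pipeline produces today), and it assumes "top hit" means the hit with the highest bitscore. The function names are hypothetical.

```python
import csv
from collections import defaultdict

# Assumed column layout (illustrative), e.g. from
# blastn -outfmt "6 qseqid sseqid pident length evalue bitscore staxids"
COLUMNS = ["qseqid", "sseqid", "pident", "length", "evalue", "bitscore", "staxids"]


def top_hit_per_contig(lines):
    """Return {contig: best-hit row}, keeping the hit with the highest bitscore."""
    best = {}
    for fields in csv.reader(lines, delimiter="\t"):
        # Skip blank lines and comment lines
        if not fields or fields[0].startswith("#"):
            continue
        row = dict(zip(COLUMNS, fields))
        row["bitscore"] = float(row["bitscore"])
        current = best.get(row["qseqid"])
        if current is None or row["bitscore"] > current["bitscore"]:
            best[row["qseqid"]] = row
    return best


def contigs_by_taxid(best):
    """Group contigs by the taxid of their top hit, largest groups first."""
    groups = defaultdict(list)
    for contig, row in best.items():
        # staxids can hold several ';'-separated taxids; this sketch takes the first
        taxid = row["staxids"].split(";")[0]
        groups[taxid].append(contig)
    # Sort by descending group size, then by taxid string for a stable order
    return sorted(groups.items(), key=lambda kv: (-len(kv[1]), kv[0]))
```

Passing an open BLAST results file (or any iterable of lines) to `top_hit_per_contig` and the result to `contigs_by_taxid` gives a list of `(taxid, [contigs])` pairs that could be written out as the tabular report.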

TODO

MatFish commented 4 years ago

This would be really helpful for analyzing metagenomic samples, where a lot of contigs are generated. Since we are looking at viruses with this pipeline, perhaps the BLAST search could be run against a database of just viral genomes. Maybe something similar to this:

https://msphere.asm.org/content/msph/3/2/e00069-18.full.pdf

peterk87 commented 4 years ago

That looks like a great resource!

The CD-HIT-EST clustered DB would be particularly useful for our purposes. It seems to be kept fairly up to date (current release v16.0, May 29, 2019) and is not too large (1.6 GB uncompressed FASTA).