hallamlab / MetaPathways

A modular pipeline for constructing Pathway/Genome Databases from environmental sequence information
http://hallam.microbiology.ubc.ca/MetaPathways

Double check the rRNA BLAST, rewriting over results due to SSU and LSU DB names #16

Closed nielshanson closed 11 years ago

nielshanson commented 11 years ago

There could be a couple of problems with the rRNA BLAST. When we added the LSU database to the BLAST step, both the SSU and LSU database names contained the string "silva", so both runs write to the same .blastout file, and this overwrites half of the results.

Note how they both go to the same output file -out input1.rRNA.silva.blastout:

8. Scan for rRNA sequences in reference database - GREENGENES_gg16S-2012-11-06......... Success!

Issuing Command : /Users/nielsh/Desktop/test_metapathways/MetaPathways/executables/blastn -outfmt 6 -num_threads 16  -query output//input1/preprocessed//input1.fasta -out output//input1/blast_results//input1.rRNA.silva.blastout -db /Users/nielsh/Desktop/test_metapathways/MetaPathways//blastDB//SSURef_111_NR_tax_silva-2012-11-06 -max_target_seqs 5

                                                   SSURef_111_NR_tax_silva-2012-11-06......... Success!

Issuing Command : /Users/nielsh/Desktop/test_metapathways/MetaPathways/executables/blastn -outfmt 6 -num_threads 16  -query output//input1/preprocessed//input1.fasta -out output//input1/blast_results//input1.rRNA.silva.blastout -db /Users/nielsh/Desktop/test_metapathways/MetaPathways//blastDB//LSURef_111_tax_silva -max_target_seqs 5

                                                   LSURef_111_tax_silva......... Success!

The way we name the output file will have to be changed to distinguish LSU (23S/28S) from SSU (16S/18S) ribosomal RNA (rRNA) results. This will just take some intelligent use of regular expressions.
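One possible way to do this is sketched below. It is not the actual MetaPathways fix: the function name `rrna_blastout_name` and the resulting file-name layout are assumptions made for illustration. It pulls the subunit prefix (SSU or LSU) and the source keyword out of the database name with regular expressions, so the two SILVA databases from the log above no longer map to the same output file.

```python
import re

# Hypothetical helper, not taken from the MetaPathways source.
# Matches a leading SSU/LSU prefix such as "SSURef_..." or "LSURef_...".
_SUBUNIT = re.compile(r"^(SSU|LSU)", re.IGNORECASE)
# Matches the reference source keyword anywhere in the database name.
_SOURCE = re.compile(r"(silva|greengenes)", re.IGNORECASE)


def rrna_blastout_name(sample, db_name):
    """Build a per-database rRNA BLAST output file name (assumed layout)."""
    source = _SOURCE.search(db_name)
    subunit = _SUBUNIT.search(db_name)

    parts = [sample, "rRNA"]
    if source:
        parts.append(source.group(1).lower())
    if subunit:
        parts.append(subunit.group(1).upper())
    if not source and not subunit:
        # Fall back to the full database name so names never collide silently.
        parts.append(db_name)
    parts.append("blastout")
    return ".".join(parts)


ssu = rrna_blastout_name("input1", "SSURef_111_NR_tax_silva-2012-11-06")
lsu = rrna_blastout_name("input1", "LSURef_111_tax_silva")
print(ssu)  # input1.rRNA.silva.SSU.blastout
print(lsu)  # input1.rRNA.silva.LSU.blastout
```

With this naming, the SSU and LSU runs write to separate files. A simpler alternative would be to embed the full database name in the output file name, at the cost of longer names.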

nielshanson commented 11 years ago

This has been resolved and will be updated in the next commit.