nhoffman / bioy

Tools for NGS sequence analysis and bacterial classification
GNU General Public License v3.0

Tool that builds multi-fasta/seqinfo/taxtable from query sequence and its blast results #21

Closed tyleraland closed 9 years ago

tyleraland commented 9 years ago

This tool's development is primarily motivated by the CAP requirement to validate our primer sequences. It would also enable a future tool for building de novo trees from selected sequences.

Inputs: a single bait sequence (named, inside a fasta file).

Outputs:

What it does:

Questions:

Related software:

Future work (once this tool is built):

nhoffman commented 9 years ago

If you want to provide tax_ids at a uniform rank in the seq_info file, you will need to use taxtastic and include a taxonomy database as one of the inputs.

tyleraland commented 9 years ago

Re-envisioning the path forward.

Rather than re-implement a tool to perform/wrap BLAST, this proposed tool will use the output of (bioy) BLAST to download sequences. In particular, sseqids uniquely specify sequences to pull down from NCBI.

(Revised) Inputs: a file of unique identifiers (sseqids / gis), each corresponding to a sequence to download from NCBI.
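A minimal sketch of how such an identifier file might be processed, offered as an illustration rather than the planned implementation. It assumes one identifier per line, either a bare gi or a BLAST-style sseqid such as `gi|12345|gb|AB000001.1|`, and it only builds a request URL for the NCBI E-utilities `efetch` endpoint without contacting the network. The function names `parse_gi`, `unique_ids`, and `efetch_url` are hypothetical and not part of bioy.

```python
from urllib.parse import urlencode

# Public NCBI E-utilities endpoint for fetching records.
EFETCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def parse_gi(sseqid):
    """Return the gi from a 'gi|12345|gb|X.1|' style sseqid.

    If the string has no 'gi' field, it is returned unchanged
    (assumed to already be a bare identifier).
    """
    sseqid = sseqid.strip()
    fields = sseqid.split("|")
    if "gi" in fields:
        idx = fields.index("gi")
        if idx + 1 < len(fields):
            return fields[idx + 1]
    return sseqid


def unique_ids(lines):
    """Collect identifiers in first-seen order, skipping blanks and duplicates."""
    seen = set()
    ids = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        ident = parse_gi(line)
        if ident not in seen:
            seen.add(ident)
            ids.append(ident)
    return ids


def efetch_url(ids, db="nucleotide"):
    """Build an efetch URL requesting the given ids as FASTA text."""
    params = {
        "db": db,
        "id": ",".join(ids),
        "rettype": "fasta",
        "retmode": "text",
    }
    return EFETCH_URL + "?" + urlencode(params)
```

For example, `efetch_url(unique_ids(open("sseqids.txt")))` (with a hypothetical `sseqids.txt`) would give a single URL that could be passed to any HTTP client; large identifier lists would likely need to be split into batches.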

It's conceivable that we might want to extend this feature to arbitrary (local/URI) sequence databases (that is, provide an sseqid and a database and retrieve the matching sequence), but it's not trivial to reconstruct a sequence from, say, a BLAST database. What we could do is just search a fasta file for a gi and spit the sequence out. Perhaps we can add this functionality in the future, but in the meantime this can be done with seqmagick.
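The "search a fasta file for a gi" idea could look roughly like the following. This is a sketch only, assuming headers carry NCBI-style `gi|<number>|` fields; `find_by_gi` is a hypothetical name, not an existing bioy or seqmagick function.

```python
def find_by_gi(fasta_lines, gi):
    """Return (header, sequence) for the first record whose header
    contains 'gi|<gi>|', or None if no record matches.

    fasta_lines is any iterable of lines, e.g. an open file handle.
    """
    # The trailing '|' keeps gi 12 from matching gi 123.
    target = "gi|{}|".format(gi)
    header = None
    seq = []
    for line in fasta_lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            if header is not None:
                # Reached the record after the match: done.
                return header, "".join(seq)
            header = line[1:] if target in line else None
            seq = []
        elif header is not None:
            # Sequence lines of the matching record only.
            seq.append(line.strip())
    if header is not None:
        return header, "".join(seq)
    return None
```

Because it streams line by line and stops at the first match, this never holds more than one record in memory, though it is a linear scan per lookup.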

For the immediate future we should target NCBI.

tyleraland commented 9 years ago

Needs satisfied by ...