nhoffman / bioy

Tools for NGS sequence analysis and bacterial classification
GNU General Public License v3.0

021 ncbi fetch #23

Closed: tyleraland closed 9 years ago

tyleraland commented 9 years ago

blast: support for remote (NCBI) BLAST

ncbi_fetch (new): given a list of seqids (GI or accession numbers), fetch the corresponding sequences from NCBI

tyleraland commented 9 years ago

I made "email" a required argument. Biopython writes warning messages to stderr if it isn't explicitly provided, and I'm not sure how to catch them.
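
One possible way to catch such messages, assuming they are issued through Python's standard `warnings` module (not verified here against the Biopython version in use). The function `fetch_without_email` is a hypothetical stand-in for the real Entrez call, and the message text is illustrative only:

```python
import warnings


def fetch_without_email():
    # Hypothetical stand-in for a library call that warns when no
    # email address has been configured; this is not a Biopython function.
    warnings.warn("Email address is not specified.", UserWarning)
    return "result"


def call_capturing_warnings(func):
    """Call func() and return (result, list of warning message strings)."""
    with warnings.catch_warnings(record=True) as caught:
        # Record every warning, including ones that were already shown once.
        warnings.simplefilter("always")
        result = func()
    return result, [str(w.message) for w in caught]


result, messages = call_capturing_warnings(fetch_without_email)
```

If this assumption holds, "email" could stay optional and the captured messages could be logged or dropped rather than printed to stderr.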

tyleraland commented 9 years ago

Here is the header and an example line from seqinfo (comma-separated values manually split into separate lines). I'm open to suggestions.

seqid,gi,gb,taxid,fullname

gi|310601|gb|L19300.1|, 310601, L19300.1, 1280, "Staphylococcus aureus DNA sequence encoding three ORFs, complete cds; prophage phi-11 sequence homology, 5' flank"
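
A minimal sketch of reading this layout with Python's `csv` module. It assumes the spaces after the commas in the example come from the manual formatting, so `skipinitialspace=True` is used to tolerate them. The helper name `read_seqinfo` is hypothetical, not part of bioy. Because fullname itself contains commas, it has to stay quoted:

```python
import csv
import io

# Header and example row copied from the comment above.
SEQINFO = '''seqid,gi,gb,taxid,fullname
gi|310601|gb|L19300.1|, 310601, L19300.1, 1280, "Staphylococcus aureus DNA sequence encoding three ORFs, complete cds; prophage phi-11 sequence homology, 5' flank"
'''


def read_seqinfo(handle):
    """Return the rows of a seqinfo file as a list of dicts keyed by column name."""
    # skipinitialspace drops the blank after each comma, so the quoted
    # fullname field is still recognized as a single quoted value.
    return list(csv.DictReader(handle, skipinitialspace=True))


rows = read_seqinfo(io.StringIO(SEQINFO))
```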

nhoffman commented 9 years ago

The ">" is markup in the fasta file, and should not be part of the sequence name. Put another way, the first column of the seq_info file must be identical to the value of seq.id, where 'seq' is the corresponding sequence object.
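
A small sketch of that rule in plain Python. It mirrors how Biopython's FASTA parser derives seq.id (the first whitespace-delimited word of the header, without the ">"). The helper `fasta_ids`, the shortened description, and the placeholder sequence are illustrative assumptions:

```python
def fasta_ids(lines):
    """Yield the sequence ID from each FASTA header line."""
    for line in lines:
        if line.startswith(">"):
            # Drop the ">" markup, then keep only the first word;
            # the rest of the header is the free-text description.
            words = line[1:].split(None, 1)
            if words:
                yield words[0]


# Illustrative record: the header reuses the seqid from the example above,
# and the sequence itself is a made-up placeholder.
fasta = [
    ">gi|310601|gb|L19300.1| Staphylococcus aureus DNA sequence",
    "ACGTACGT",
]
seqinfo_ids = ["gi|310601|gb|L19300.1|"]  # first column of seq_info

ids = list(fasta_ids(fasta))
mismatched = [i for i in ids if i not in seqinfo_ids]
```

An empty `mismatched` list means every sequence name in the FASTA file has an identical entry in the first column of seq_info.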