taltman closed this issue 7 years ago.
I found the same problem, taltman. I ran MetaPathways 2 using the RefSeq protein database. The closest I got to eukaryotic taxonomic classifications were cellular organisms (131567), Euryarchaeota (28890), Marine Group II euryarchaeote SCGC AB-629-J06 (1131268) and unclassified Euryarchaeota (33867), and none of these is actually eukaryotic (the last three are archaeal).
I am working with sea stars, so I can partially work around this by downloading the sea star genomes from Echinobase and adding them to the rRNA databases. Any hit to one of these genomes indicates that the sequence is likely sea star DNA rather than DNA from contaminating marine microorganisms.
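The flagging step described above can be sketched as follows. This is a minimal illustration, not part of MetaPathways: it assumes the hits are available as BLAST tabular output (`-outfmt 6`, where column 1 is the query ID and column 2 the subject ID), and MetaPathways' own intermediate files may be laid out differently. The function names and the `host_scaffold` IDs are hypothetical.

```python
import csv
import io


def fasta_ids(fasta_text):
    """Collect sequence IDs (first word after '>') from host-genome FASTA text."""
    ids = set()
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            words = line[1:].split()
            if words:  # skip bare '>' lines with no ID
                ids.add(words[0])
    return ids


def host_hit_queries(blast_tab, host_ids):
    """Return query IDs with at least one hit to a host-genome sequence.

    blast_tab: text in BLAST tabular format (-outfmt 6).
    host_ids: IDs of the added host (e.g. sea star) sequences.
    Any hit counts; no e-value or identity threshold is applied here.
    """
    flagged = set()
    for row in csv.reader(io.StringIO(blast_tab), delimiter="\t"):
        if len(row) >= 2 and row[1] in host_ids:
            flagged.add(row[0])
    return flagged
```

In practice a threshold on the e-value (column 11) or bit score (column 12) would probably be worth adding before treating a contig as host DNA.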
(The database was downloaded from ftp://ftp.ncbi.nlm.nih.gov/refseq/release/complete/complete.nonredundant_protein.N.protein.faa.gz, where N=1…309; the files were unzipped and concatenated into a single master file used by MetaPathways.)
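The unzip-and-concatenate step can be sketched in Python as below. This is only an illustration: it assumes the parts have already been downloaded locally (for example with a tool such as `wget`), the function names are hypothetical, and the part count (309 in the thread) is specific to that RefSeq release and changes between releases.

```python
import gzip
import shutil

# URL pattern quoted in the thread; {n} runs from 1 to the last part number
# of the release being used.
REFSEQ_URL = ("ftp://ftp.ncbi.nlm.nih.gov/refseq/release/complete/"
              "complete.nonredundant_protein.{n}.protein.faa.gz")


def refseq_part_urls(last_part):
    """List the download URLs for parts 1..last_part."""
    return [REFSEQ_URL.format(n=n) for n in range(1, last_part + 1)]


def concat_gzipped_fastas(parts, out_path):
    """Decompress each gzipped FASTA in order and append it to one master file.

    Assumes each part ends with a newline so records do not run together.
    Streaming with copyfileobj avoids holding a whole part in memory.
    """
    with open(out_path, "wb") as out:
        for part in parts:
            with gzip.open(part, "rb") as src:
                shutil.copyfileobj(src, out)
```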
Yes, right now MP supports only prokaryotic data.
Has this been implemented?
No, this requires several new modules (such as alternative ORF prediction) and some tools that do not currently exist (such as a method to reliably split eukaryotic and prokaryotic sequences at the nucleotide level). As it stands, MP is a prokaryotic pipeline. Perhaps a phase 2 project?
On Jan 10, 2017 3:53 PM, "Tomer Altman" wrote:
Has this been implemented?
View it on GitHub: https://github.com/hallamlab/metapathways2/issues/70#issuecomment-271735811
Using Kraken, I identified several contigs in my sample that are of eukaryotic origin, yet the protein taxonomic annotation identified no eukaryotic genes. I suspect that Prodigal is being run only with the standard bacterial translation table (table 11), so it might be missing eukaryotic genes. Is this the case?
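To make the translation-table question concrete, here is a small sketch (not Prodigal's code) that builds the NCBI standard code (table 1, used by most eukaryotic nuclear genomes) and table 11 (Prodigal's default). The two tables share the same amino-acid assignments, including the stop codons TAA, TAG and TGA, and differ only in which codons NCBI lists as permitted starts. The start-codon sets below are taken from NCBI's table definitions, not from Prodigal's internal start handling, and the helper names are hypothetical.

```python
from itertools import product

BASES = "TCAG"
# Amino-acid string shared by NCBI tables 1 and 11; codons are enumerated
# in TCAG order (TTT, TTC, TTA, TTG, TCT, ...), '*' marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

# Codons NCBI lists as possible translation starts for each table.
START_CODONS = {
    1: {"TTG", "CTG", "ATG"},
    11: {"TTG", "CTG", "ATT", "ATC", "ATA", "ATG", "GTG"},
}


def translate(seq):
    """Translate a DNA string codon by codon; a trailing partial codon is dropped.

    Raises KeyError on ambiguous bases such as N.
    """
    seq = seq.upper()
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, len(seq) - 2, 3))


def extra_starts(table_a, table_b):
    """Start codons allowed in table_a but not in table_b."""
    return START_CODONS[table_a] - START_CODONS[table_b]
```

Under these definitions, `extra_starts(11, 1)` gives `{"ATT", "ATC", "ATA", "GTG"}`, so switching the table by itself would mainly change which starts are accepted rather than make a prokaryotic gene finder handle eukaryotic gene structure.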