Open sheikki opened 8 years ago
Hi, sheikki,
The column definitions are in the header of the output file:
ref_start ref_end ref_seqid ref_desc
Is "@650A9:00200:00424" a sequence of length 34? If so, this assignment might be your best bet, but the sequence is too short for the assignment to be reliable.
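For readers who want to load these result lines into a script, here is a minimal Python sketch. It assumes the fields are whitespace-separated and that each hit line has exactly twelve fields, as in the example quoted in this thread. Only the last four names (`ref_start`, `ref_end`, `ref_seqid`, `ref_desc`) come from the header quoted above; the names of the first five fields follow the asker's own reading and are unconfirmed, and the three fields in between are left unnamed because the thread does not define them. The function name `parse_knn_line` is hypothetical.

```python
def parse_knn_line(line):
    """Split one pairwise-knn hit line into a dict (sketch, see assumptions)."""
    # Split into at most 12 fields so the free-text description,
    # which itself contains spaces, stays in one piece.
    parts = line.strip().split(None, 11)
    if len(parts) != 12:
        raise ValueError("expected 12 fields, got %d" % len(parts))
    return {
        # The first five names follow the asker's interpretation (unconfirmed).
        "query_id": parts[0],
        "k": int(parts[1]),
        "strand": parts[2],
        "score": float(parts[3]),
        "ident": float(parts[4]),
        # Names for these three columns are not given in the thread.
        "unnamed": [int(p) for p in parts[5:8]],
        # The last four names come from the header quoted in the reply.
        "ref_start": int(parts[8]),
        "ref_end": int(parts[9]),
        "ref_seqid": parts[10],
        "ref_desc": parts[11],
    }


example = (
    "@650A9:00200:00424 1 + 155 1.000 0 34 34 0 83 S004055894 "
    "Listeria monocytogenes; CA5 Lineage=Root;rootrank;Bacteria;domain;"
    "Firmicutes;phylum;Bacilli;class;Bacillales;order;Listeriaceae;family;"
    "Listeria;genus"
)
hit = parse_knn_line(example)
print(hit["ref_start"], hit["ref_end"], hit["ref_seqid"])
```

Limiting the split to twelve fields is a deliberate choice: the description contains spaces, so an unbounded split would break it apart.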
Benli Chai
RDP Staff
On Wed, May 11, 2016 at 5:48 AM, sheikki notifications@github.com wrote:
I'm classifying representative sequences of quality-controlled and clustered 16S reads with the command:
java -jar AlignmentTools.jar pairwise-knn query.fq db.fa
The db file is an unaligned prokaryotic subset of RDP 11.4 clustered at 99% (with some sequence length thresholds).
Is this a sensible way to assign taxonomy to my representative sequences?
In the output, I see lines like:
@650A9:00200:00424 1 + 155 1.000 0 34 34 0 83 S004055894 Listeria monocytogenes; CA5 Lineage=Root;rootrank;Bacteria;domain;Firmicutes;phylum;Bacilli;class;Bacillales;order;Listeriaceae;family;Listeria;genus
As far as I can tell it's QID KNEIGHBOURS STRAND SCORE %ID QSTART QEND QEND QSTART SSTART SID. Is this the correct interpretation? Why is it that the QSTART and QEND values are displayed twice?
Thank you for the reply. Oddly, in my alignment file, the ref_start value is always zero. A few examples:
@650A9:00007:00316 1 - 265 0.940 0 72 72 0 427 S001099040 Bacillus subtilis; XN-80-5 Lineage=Root;rootrank;Bacteria;domain;Firmicutes;phylum;Bacilli;class;Bacillales;order;Bacillaceae 1;family;Bacillus;genus
>-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------TGAGCAACATCTTGCACGGTACTGACT-ACCAGAAAGCCACGGCTAACTACGTGCCAGCAGCCGCGGTAATAC----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>CACGTGGGTAACCTGCCTGTAAGACTGGGATAACTCCGGGAAACCGGGGCTAATACCGGATGGTTGTTTGAACCGCATGGTTCAGACATAAAAGGTGGCTTCGGCTACCACTTACAGATGGACCCGCGGCGCATTAGCTAGTTGGTGAGGTAACGGCTCACCAAGGCAACGATGCGTAGCCGACCTGAGAGGGTGATCGGCCACACTGGGACTGAGACACGGCCCAGACTCCTACGGGAGGCAGCAGTAGGGAATCTTCCGCAATGGACGAAAGTCTGACGGAGCAACGCCGCGTGAGTGATGAAGGTTTTCGGATCGTAAAGCTCTGTTGTTAGGTAAGAACAAGTGCCGTTCAAATA-GGGCGGCACCTTG-ACGGTAC---CTAACCAGAAAGCCACGGCTAACTACGTGCCAGCAGCCGCGGTAATACGTAGGTGGCAAGCGTTGTCCGGAATTATTGGGCGTAAAGGGCTCGCAGGCGGTTTCTTAAGTCTGATGTGAAAGCCCCCGGCTCAACCGGGGAGGGTCATTGGAAACTGGGGAACTTGAGTGCAGAAGAGGAGAGTGGAATTCCACGTGTAGCGGTGAAATGCGTAGAGATGTGGAGGAACACCAGTGGCGAAGGCGACTCTCTGGTCTGTAACTGACGCTGAGGAGCGAAAGCGTGGGGAGCGAACAGGATTAGATACCCTGGTAGTCCACGCCGTAAACGATGAGTGCTAAGTGTTAGGGGGTTTCCGCCCCTTAGTGCTGCAGCTAACGCATTAAGCACTCCGCCTGGGGAGTACGGTCGCAAGACTGAAACTCAAAGGAATTGACGGGGGCCCGCACAAGCGGTGGAGCATGTGTTTAATTCGAAGCAACGCGAAGAACCTTACCAGGTCTTGACATCCTCTGACAATCCTAAGAGATAGGACGTCCCCTTCGGGGCAAGGTGACAGGTGGTGGCATTAGGAAGACAAGTCGTTCAATAAGCGGCACTTGACGGTACTACCAGAAAGGCCACGCTAACTACGTGCAGCAGCCGCGGTAATACGTAGGTGGCAAGCGTGTCGGAATATTGGGCGTAAAGGGCTCGCAGGCGGTTTCTTAAGTCTGATGTGAAAGCCCCCGGCTCAACCGGGGAGGGTCATTGGAAACTGGGGAACTTGAGTGCAGAAGAGGAGAGTGGAATTTCACGTGTAGCGGTGAAATGCGTAGAGATGTGGAGGAACACCAGTGGCGAAGGCGACTCTCTGGTCTGTAACTGACGCTGAGGAGCGAAAGCGTGGGGAGCGAACAGGATTAGATACCCTGGTAGTCCACGCCGTAAACGATGAGTGCTAAGTGTTAGGGGGTTTCCGCCCCTTAGTGCTGCAGCTAACGCATTAAGCACTCCGCCTGGGGAGTACGGTCGCAAGACTGAAACTCAAAGGAATTGACGGGGGCCCGCACAAGCGGTGGAGCATGTGGTTTAATTCGAAGCAACGCGAAGAACCTTACCAGGTCTTGACATCCTCTGACAATCCTAGAGATAGGACGTCCCCTTCGGGGGCAGAGTGACAGGTGGTGCATGGTTGTCGTCAGCTCGTGTCGTGAGATGTTGGGTTAAGTCCCGCAACGAGCGCAACCCTTGATCTTAGTTGCCAGCATTCAGTTGGGCACTCTAAGGTGACTGCCGGTGACAAACCGGAGGAAGGTGGGGATGACGTCAAATCATCATGCCCCTTATGACCTGGGCTACACACGTGCTACAATGGGCAGAACAAAGGGCAGCGAAACCGCGAGGTTAAGCCAATCCCACAAATCTGTTCTCAGTTCGGATCGCAGTCTGCAACTCGACTGCGTGAAGCTGGAATCGTTAGTAATCGCGGATCAGCATGCCGCGGTGAATACGTTCCCGGGCCTTGTACACACCG
@650A9:00009:00308 1 - 449 1.000 0 102 102 0 515 S003301453 Bacillus cereus; B16 Lineage=Root;rootrank;Bacteria;domain;Firmicutes;phylum;Bacilli;class;Bacillales;order;Bacillaceae 1;family;Bacillus;genus
>---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ACTCTGGTTGTTAGGG-AGAACAAGTAGCTAG-T-AATAGCTGGCACCTTGACGGTACCTAA-CAGAAAGCCACGGCTAACTACGTGCCAGCAGCCGCGGTAATAC-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>TATTTGGGCGGGGGGGGGCCTATCATGCAGTCGAGCGAATGGATTAAGAGCTTGCTCTTATGAAGTTATCGGCGGACGGGTGAGTAACACGTGGGTAACCTGCCCATAAGACTGGGATAACTCCGGGAAACCGGGGCTCTAATACCGGATAACATTTTGAACCGCATGGTTCGAAATTGAAAGGCGGCTTCGGCTGTCACTTATGGATGGACCCGCGTCGCATTAACTAGTTGGTGAGGTAACGGCTCACCAAGGCAACGATGCGTAGGCGACCTGAGAGGGTGATCGGCCACACTGGGACTGAGACACGGCCCAGACTCCTACGGGAGGCAGCAGTAGGGAATCTTCCGCAATGGACGAAAGTCTGACGGAGCAACGCCGCGTGAGTGATGAAGGCTTTCGGGTCGTAAAACTCT-GTTGTTAGGGAAGAACAAGT-GCTAGTTGAATAGCTGGCACCTTGACGGTACCTAACCAGAAAGCCACGGCTAACTACGTGCCAGCAGCCGCGGTAATACGTAGGTGGCAAGCGTTATCCGGAATTATTGGGCGTAAAGCGCGCGCAGGTGGTTTCTTAAGTCTGATGTGGAAAGCCCACGGCTCAACCGTGGAGGGTCATTGGAAACTGGGAGACTTGAGTGCAGAGGAAAGTGGAATTCCATGTGTAGCGGTGAAATGCGTAGAGATATGGAGGAACACCAGTGGCGAAGGCGACTTTCTGGTCTGTAACTGACACTGAGGCGCGAAAGCGTGGGGAGCAAACAGGATTAGATACCCTGGTAGTCCACGCCGTAAACGATGAGTGCTAAGTGTTAGAGGGTTTCCGCCCTTTAGTGCTGAAGTTAACGCATTAAGCACTCCGCCTGGGGAGTACGGCCGCAAGGCTGAAACTCAAAGGAATTGACGGGGGCCCGCACAAGCGGTGGAGCATGTGGTTAATTCGAAGCAACGCGAAGAACCCTACCAGGTCTTGACATCCTCTGAAACCCTAGAGATAGGGCTTCTCCTTCGGGAGCAGAGTGACAGGTGGTGCATGGTTGTCGTCAGCTCGTGTCGTGAGATGTTGGTTAAGTCCGCAACGAGCGCAACCCTTGATCTTAGTTGCCATCATTAAGTTGGGCACTCTAAGGTGACTGCCGGTGACAAACCGGAGGAAGGTGGGGATGACGTCAAATCATCATGCCCCTTATGACCTGGGCTACACACGTGCTACAATGGACGGTACAAAGAGCTGCAAGACCGCGAGGTGGAGCTATTCTCATAAAACCGTTCTCAGTTCGGATTGTAGGCTGCAACTCGCCTACATGAAGCTGGAATCGCTAGTAATCGCGGATCAGGTTACCGCGGTGAATACGTTCCCGGGCCTTGTACACACCTCCCGTCACACCACGAGAGTTTGTAACACCCGAAGTCGGTGGGGTAACCTTTTGGGAGCCAGCCGGCCTAAAGGGGGAGAAAG
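Since the goal in this thread is taxonomy assignment, the `Lineage=` part of the description in the hit lines above can be turned into a rank-to-name mapping. This is a minimal Python sketch. It assumes the lineage is a semicolon-separated list of alternating taxon names and ranks, as in the examples shown here. The function name `parse_lineage` is hypothetical.

```python
def parse_lineage(ref_desc):
    """Return {rank: taxon_name} from a description containing 'Lineage=...'."""
    marker = "Lineage="
    pos = ref_desc.find(marker)
    if pos == -1:
        return {}  # no lineage string in this description
    tokens = ref_desc[pos + len(marker):].strip().split(";")
    # Tokens alternate as name, rank, name, rank, ...
    # zip() drops any unpaired trailing token.
    return {rank: name for name, rank in zip(tokens[0::2], tokens[1::2])}


desc = (
    "Bacillus subtilis; XN-80-5 Lineage=Root;rootrank;Bacteria;domain;"
    "Firmicutes;phylum;Bacilli;class;Bacillales;order;Bacillaceae 1;family;"
    "Bacillus;genus"
)
lineage = parse_lineage(desc)
print(lineage["genus"], lineage["family"])
```

Splitting on `;` rather than on whitespace keeps taxon names that contain spaces, such as `Bacillaceae 1`, intact.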