tyleraland closed this 8 years ago
Cynomorium and Stigeoclonium are both genus-level classifications. The taxonomy of the rest looks like this:
|---------+---------------------------------+---------+------+---------+--------+--------+--------+--------+-------+---------|
| tax_id | tax_name | rank | root | kingdom | phylum | class | order | family | genus | species |
|---------+---------------------------------+---------+------+---------+--------+--------+--------+--------+-------+---------|
| 337895 | Actinomadura hallensis | species | 1 | | 201174 | 1760 | 85012 | 2012 | 1988 | 337895 |
| 121292 | Arthrobacter sulfonivorans | species | 1 | | 201174 | 1760 | 85006 | 1268 | 1663 | 121292 |
| 43666 | Arthrobacter sulfureus | species | 1 | | 201174 | 1760 | 85006 | 1268 | 1663 | 43666 |
| 1396 | Bacillus cereus | species | 1 | | 1239 | 91061 | 1385 | 186817 | 1386 | 1396 |
| 1402 | Bacillus licheniformis | species | 1 | | 1239 | 91061 | 1385 | 186817 | 1386 | 1402 |
| 1408 | Bacillus pumilus | species | 1 | | 1239 | 91061 | 1385 | 186817 | 1386 | 1408 |
| 1423 | Bacillus subtilis | species | 1 | | 1239 | 91061 | 1385 | 186817 | 1386 | 1423 |
| 1107859 | Candidatus Halomonas phosphatis | species | 1 | | 1224 | 1236 | 135619 | 28256 | 2745 | 1107859 |
| 36740 | Dermabacter hominis | species | 1 | | 201174 | 1760 | 85006 | 85020 | 36739 | 36740 |
| 1352 | Enterococcus faecium | species | 1 | | 1239 | 91061 | 186826 | 81852 | 1350 | 1352 |
| 571 | Klebsiella oxytoca | species | 1 | | 1224 | 1236 | 91347 | 543 | 570 | 571 |
| 907 | Megasphaera elsdenii | species | 1 | | 1239 | 909932 | 909929 | 31977 | 906 | 907 |
| 39771 | Methylomonas aurantiaca | species | 1 | | 1224 | 1236 | 135618 | 403 | 416 | 39771 |
| 61654 | Methylopila capsulata | species | 1 | | 1224 | 28211 | 356 | 31993 | 61653 | 61654 |
| 61656 | Methylorhabdus multivorans | species | 1 | | 1224 | 28211 | 356 | 45401 | 61655 | 61656 |
| 470074 | Mycobacterium angelicum | species | 1 | | 201174 | 1760 | 85007 | 1762 | 1763 | 470074 |
| 382 | Sinorhizobium meliloti | species | 1 | | 1224 | 28211 | 356 | 82115 | 28105 | 382 |
| 216778 | Stenotrophomonas rhizophila | species | 1 | | 1224 | 1236 | 135614 | 32033 | 40323 | 216778 |
| 1311 | Streptococcus agalactiae | species | 1 | | 1239 | 91061 | 186826 | 1300 | 1301 | 1311 |
|---------+---------------------------------+---------+------+---------+--------+--------+--------+--------+-------+---------|
The phylum column contains 4 different tax_ids, which means that with the default --max-group-size of 3 the last classification is bumped up to "root". The other two genus-level assignments are awkward artifacts forced by the algorithm. Adjusting --max-group-size should give a better result.
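To make the grouping rule described above concrete, here is a simplified, hypothetical model of it in Python: walk from the most specific rank toward the root and stop at the first rank where the hits fall into at most max_group_size distinct tax_ids; if no rank qualifies, fall back to root (tax_id 1, as in the root column of the table). This is only an illustration of the behavior described in the comment, not bioy's actual classifier code, and the function name `condense` and the toy lineages are invented for the example.

```python
from typing import Mapping, Sequence, Set, Tuple

ROOT_TAX_ID = "1"  # tax_id of "root" in the taxonomy table above


def condense(
    lineages: Sequence[Mapping[str, str]],
    ranks: Sequence[str],
    max_group_size: int = 3,
) -> Tuple[str, Set[str]]:
    """Hypothetical model of rank condensing (not bioy's implementation).

    `ranks` is ordered from least to most specific, e.g.
    ["phylum", "class", "order", "family", "genus", "species"].
    """
    for rank in reversed(ranks):  # start at the most specific rank
        # Blank cells (like the empty kingdom column) are not counted.
        ids = {row[rank] for row in lineages if row.get(rank)}
        if ids and len(ids) <= max_group_size:
            return rank, ids
    # No rank groups the hits into few enough tax_ids: report root.
    return "root", {ROOT_TAX_ID}


if __name__ == "__main__":
    # Made-up lineages with four distinct phyla, so a limit of 3 is exceeded.
    hits = [
        {"phylum": "P1", "class": "C1", "genus": "G1"},
        {"phylum": "P1", "class": "C1", "genus": "G2"},
        {"phylum": "P2", "class": "C2", "genus": "G3"},
        {"phylum": "P3", "class": "C3", "genus": "G4"},
        {"phylum": "P4", "class": "C4", "genus": "G5"},
    ]
    ranks = ["phylum", "class", "genus"]
    print(condense(hits, ranks))                    # default max_group_size=3
    print(condense(hits, ranks, max_group_size=4))  # a looser limit
```

Under this model, four distinct phyla with a limit of 3 yield root, matching the outcome described; raising the limit lets the walk stop at a more specific rank.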
The overall low-specificity assignment is to be expected given the short read lengths, low query coverage (qcovs), and low percent identity (pident) of the BLAST hits:
|----------------------------------------------+------------+--------+--------+------+------+-------|
| qseqid | sseqid | pident | qstart | qend | qlen | qcovs |
|----------------------------------------------+------------+--------+--------+------+------+-------|
| M03029:113:000000000-ALUWU:1:2102:18344:1629 | S000000777 | 94.55 | 1 | 110 | 250 | 44.00 |
| M03029:113:000000000-ALUWU:1:2102:18344:1629 | S000009620 | 94.55 | 1 | 110 | 250 | 44.00 |
| M03029:113:000000000-ALUWU:1:2102:18344:1629 | S000628083 | 94.55 | 1 | 110 | 250 | 44.00 |
| M03029:113:000000000-ALUWU:1:2102:18344:1629 | S000628084 | 94.55 | 1 | 110 | 250 | 44.00 |
| M03029:113:000000000-ALUWU:1:2102:18344:1629 | S002165164 | 92.73 | 1 | 110 | 250 | 44.00 |
| M03029:113:000000000-ALUWU:1:2102:18344:1629 | S000495026 | 92.73 | 1 | 110 | 250 | 44.00 |
| M03029:113:000000000-ALUWU:1:2102:18344:1629 | S001294523 | 92.73 | 1 | 110 | 250 | 44.00 |
| M03029:113:000000000-ALUWU:1:2102:18344:1629 | S000628080 | 92.73 | 1 | 110 | 250 | 44.00 |
| M03029:113:000000000-ALUWU:1:2102:18344:1629 | S004226555 | 91.82 | 1 | 110 | 250 | 44.00 |
| M03029:113:000000000-ALUWU:1:2102:18344:1629 | S000902853 | 91.82 | 1 | 110 | 250 | 44.00 |
| M03029:113:000000000-ALUWU:1:2102:18344:1629 | S004226556 | 91.82 | 1 | 110 | 250 | 44.00 |
| M03029:113:000000000-ALUWU:1:2102:18344:1629 | S000004557 | 91.82 | 1 | 110 | 250 | 44.00 |
| M03029:113:000000000-ALUWU:1:2102:18344:1629 | S001242189 | 91.82 | 1 | 110 | 250 | 44.00 |
| M03029:113:000000000-ALUWU:1:2102:18344:1629 | S000996434 | 91.82 | 1 | 110 | 250 | 44.00 |
| M03029:113:000000000-ALUWU:1:2102:18344:1629 | S004066145 | 91.82 | 1 | 110 | 250 | 44.00 |
| M03029:113:000000000-ALUWU:1:2102:18344:1629 | S004066147 | 91.82 | 1 | 110 | 250 | 44.00 |
| M03029:113:000000000-ALUWU:1:2102:18344:1629 | S004066146 | 91.82 | 1 | 110 | 250 | 44.00 |
| M03029:113:000000000-ALUWU:1:2102:18344:1629 | S004066149 | 91.82 | 1 | 110 | 250 | 44.00 |
| M03029:113:000000000-ALUWU:1:2102:18344:1629 | S004066148 | 91.82 | 1 | 110 | 250 | 44.00 |
| M03029:113:000000000-ALUWU:1:2102:18344:1629 | S002965784 | 90.57 | 5 | 110 | 250 | 42.40 |
| M03029:113:000000000-ALUWU:1:2102:18344:1629 | S002963222 | 90.62 | 15 | 110 | 250 | 38.40 |
| M03029:113:000000000-ALUWU:1:2102:18344:1629 | S001795518 | 97.96 | 2 | 50 | 250 | 19.60 |
| M03029:113:000000000-ALUWU:1:2102:18344:1629 | S003611774 | 97.96 | 2 | 50 | 250 | 19.60 |
| M03029:113:000000000-ALUWU:1:2102:18344:1629 | S003289772 | 97.96 | 2 | 50 | 250 | 19.60 |
| M03029:113:000000000-ALUWU:1:2102:18344:1629 | S000776263 | 97.96 | 2 | 50 | 250 | 19.60 |
| M03029:113:000000000-ALUWU:1:2102:18344:1629 | S000012740 | 97.96 | 2 | 50 | 250 | 19.60 |
| M03029:113:000000000-ALUWU:1:2102:18344:1629 | S000967118 | 97.96 | 2 | 50 | 250 | 19.60 |
| M03029:113:000000000-ALUWU:1:2102:18344:1629 | S000399427 | 95.92 | 2 | 50 | 250 | 19.60 |
| M03029:113:000000000-ALUWU:1:2102:18344:1629 | S000541623 | 95.92 | 2 | 50 | 250 | 19.60 |
| M03029:113:000000000-ALUWU:1:2102:18344:1629 | S000541624 | 95.92 | 2 | 50 | 250 | 19.60 |
| M03029:113:000000000-ALUWU:1:2102:18344:1629 | S000540588 | 95.92 | 2 | 50 | 250 | 19.60 |
| M03029:113:000000000-ALUWU:1:2102:18344:1629 | S000540589 | 95.92 | 2 | 50 | 250 | 19.60 |
| M03029:113:000000000-ALUWU:1:2102:18344:1629 | S000436251 | 95.92 | 2 | 50 | 250 | 19.60 |
| M03029:113:000000000-ALUWU:1:2102:18344:1629 | S002232091 | 95.92 | 2 | 50 | 250 | 19.60 |
| M03029:113:000000000-ALUWU:1:2102:18344:1629 | S000351163 | 95.92 | 2 | 50 | 250 | 19.60 |
| M03029:113:000000000-ALUWU:1:2102:18344:1629 | S004056772 | 95.92 | 2 | 50 | 250 | 19.60 |
| M03029:113:000000000-ALUWU:1:2102:18344:1629 | S004056755 | 95.92 | 2 | 50 | 250 | 19.60 |
| M03029:113:000000000-ALUWU:1:2102:18344:1629 | S002916791 | 95.92 | 2 | 50 | 250 | 19.60 |
| M03029:113:000000000-ALUWU:1:2102:18344:1629 | S003286531 | 95.92 | 2 | 50 | 250 | 19.60 |
| M03029:113:000000000-ALUWU:1:2102:18344:1629 | S004066246 | 95.92 | 2 | 50 | 250 | 19.60 |
| M03029:113:000000000-ALUWU:1:2102:18344:1629 | S004066244 | 95.92 | 2 | 50 | 250 | 19.60 |
| M03029:113:000000000-ALUWU:1:2102:18344:1629 | S004066245 | 95.92 | 2 | 50 | 250 | 19.60 |
| M03029:113:000000000-ALUWU:1:2102:18344:1629 | S000623308 | 95.92 | 2 | 50 | 250 | 19.60 |
| M03029:113:000000000-ALUWU:1:2102:18344:1629 | S000540590 | 95.92 | 2 | 50 | 250 | 19.60 |
| M03029:113:000000000-ALUWU:1:2102:18344:1629 | S004223170 | 94.12 | 2 | 50 | 250 | 19.60 |
| M03029:113:000000000-ALUWU:1:2102:18344:1629 | S000434759 | 97.67 | 2 | 44 | 250 | 17.20 |
| M03029:113:000000000-ALUWU:1:2102:18344:1629 | S001155653 | 97.67 | 2 | 44 | 250 | 17.20 |
| M03029:113:000000000-ALUWU:1:2102:18344:1629 | S001353096 | 93.88 | 2 | 50 | 250 | 19.60 |
| M03029:113:000000000-ALUWU:1:2102:18344:1629 | S003312934 | 97.67 | 2 | 44 | 250 | 17.20 |
| M03029:113:000000000-ALUWU:1:2102:18344:1629 | S002034309 | 97.67 | 2 | 44 | 250 | 17.20 |
| M03029:113:000000000-ALUWU:1:2102:18344:1629 | S000444934 | 97.67 | 2 | 44 | 250 | 17.20 |
| M03029:113:000000000-ALUWU:1:2102:18344:1629 | S001020487 | 95.65 | 2 | 47 | 250 | 18.40 |
| M03029:113:000000000-ALUWU:1:2102:18344:1629 | S001020484 | 95.65 | 2 | 47 | 250 | 18.40 |
| M03029:113:000000000-ALUWU:1:2102:18344:1629 | S004230775 | 97.67 | 2 | 44 | 250 | 17.20 |
| M03029:113:000000000-ALUWU:1:2102:18344:1629 | S000641690 | 95.65 | 2 | 47 | 250 | 18.40 |
| M03029:113:000000000-ALUWU:1:2102:18344:1629 | S001353095 | 93.88 | 2 | 50 | 250 | 19.60 |
| M03029:113:000000000-ALUWU:1:2102:18344:1629 | S001353094 | 93.88 | 2 | 50 | 250 | 19.60 |
| M03029:113:000000000-ALUWU:1:2102:18344:1629 | S003721966 | 91.84 | 2 | 50 | 250 | 19.60 |
| M03029:113:000000000-ALUWU:1:2102:18344:1629 | S001589960 | 97.67 | 2 | 44 | 250 | 17.20 |
| M03029:113:000000000-ALUWU:1:2102:18344:1629 | S001199566 | 97.67 | 2 | 44 | 250 | 17.20 |
| M03029:113:000000000-ALUWU:1:2102:18344:1629 | S000926647 | 97.67 | 2 | 44 | 250 | 17.20 |
| M03029:113:000000000-ALUWU:1:2102:18344:1629 | S000641717 | 97.56 | 10 | 50 | 250 | 16.40 |
| M03029:113:000000000-ALUWU:1:2102:18344:1629 | S000925775 | 97.44 | 2 | 40 | 250 | 15.60 |
| M03029:113:000000000-ALUWU:1:2102:18344:1629 | S000633544 | 97.44 | 2 | 40 | 250 | 15.60 |
| M03029:113:000000000-ALUWU:1:2102:18344:1629 | S002352232 | 97.37 | 2 | 39 | 250 | 15.20 |
| M03029:113:000000000-ALUWU:1:2102:18344:1629 | S001020535 | 97.37 | 2 | 39 | 250 | 15.20 |
| M03029:113:000000000-ALUWU:1:2102:18344:1629 | S000009616 | 97.37 | 2 | 39 | 250 | 15.20 |
| M03029:113:000000000-ALUWU:1:2102:18344:1629 | S003611785 | 97.37 | 2 | 39 | 250 | 15.20 |
| M03029:113:000000000-ALUWU:1:2102:18344:1629 | S001093959 | 94.87 | 2 | 40 | 250 | 15.60 |
| M03029:113:000000000-ALUWU:1:2102:18344:1629 | S000386911 | 97.22 | 2 | 37 | 250 | 14.40 |
| M03029:113:000000000-ALUWU:1:2102:18344:1629 | S000022501 | 97.22 | 2 | 37 | 250 | 14.40 |
| M03029:113:000000000-ALUWU:1:2102:18344:1629 | S001875319 | 94.74 | 2 | 39 | 250 | 15.20 |
|----------------------------------------------+------------+--------+--------+------+------+-------|
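As a quick check on the coverage numbers: for these hits the qcovs column equals the aligned query span, (qend - qstart + 1), as a percentage of qlen. The Python sketch that follows recomputes it for a few rows copied from the table. `span_coverage` is a hypothetical helper name, and the formula assumes a single alignment per subject; as I understand BLAST's definition, qcovs is computed per subject across all of its HSPs, so queries with several HSPs to the same subject would need the union of their spans instead.

```python
def span_coverage(qstart: int, qend: int, qlen: int) -> float:
    """Percent of the query covered by a single alignment.

    BLAST reports qstart/qend as 1-based, inclusive positions, hence the +1.
    """
    return round((qend - qstart + 1) * 100 / qlen, 2)


# (qstart, qend, qlen, reported qcovs) copied from rows of the table above.
ROWS = [
    (1, 110, 250, 44.00),
    (5, 110, 250, 42.40),
    (15, 110, 250, 38.40),
    (2, 50, 250, 19.60),
    (2, 47, 250, 18.40),
    (2, 44, 250, 17.20),
    (10, 50, 250, 16.40),
    (2, 37, 250, 14.40),
]

if __name__ == "__main__":
    for qstart, qend, qlen, reported in ROWS:
        computed = span_coverage(qstart, qend, qlen)
        print(f"{qstart:>3}-{qend:<3} of {qlen}: computed {computed:5.2f}, reported {reported:5.2f}")
```

Put in bases: even the longest alignments cover only 110 of the 250 query bases, and the hits at or above 97% identity align at most 49 bases (positions 2 to 50).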
I have a sequence that returns many BLAST hits with pident between 97 and 98. Some of the references in the details output are used in the final assignment, while the rest are condensed to "root". I'm not sure whether this is the expected behavior, but it looks odd, so I've included the details needed to reproduce it.
database: RDP 11_4
hits (one qseqid): /mnt/disk2/molmicro/working/tland9/2016-02-02_capture/1629-blast-hits
seq_info: /molmicro/common/rdp/11_4/rdp/11_4.0/tax_filter/filtered/seq_info.csv
taxonomy: /molmicro/common/rdp/11_4/rdp/11_4.0/tax_filter/filtered/taxonomy.csv
Classifier command:
bioy classifier 1629-blast-hits /molmicro/common/rdp/11_4/rdp/11_4.0/tax_filter/filtered/seq_info.csv <(csvcut -c tax_id,tax_name,rank,root,kingdom,phylum,class,order,family,genus,species /molmicro/common/rdp/11_4/rdp/11_4.0/tax_filter/filtered/taxonomy.csv) --has-header --specimen 168_35 --out classify_out --details-out details_out
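The `<(csvcut -c ...)` process substitution in the command simply passes the classifier a copy of the taxonomy with only the listed columns, in that order. For anyone without csvkit, a rough equivalent using Python's standard csv module is sketched here. `cut_columns` and the script name in the usage comment are hypothetical, and this is an approximation rather than a drop-in replacement; the column list is copied from the command, and a missing column raises an error instead of being silently filled.

```python
import csv
import sys
from typing import Iterable, List, Sequence

# Columns, in order, copied from the csvcut -c argument in the command above.
TAXONOMY_COLUMNS = [
    "tax_id", "tax_name", "rank", "root", "kingdom", "phylum",
    "class", "order", "family", "genus", "species",
]


def cut_columns(lines: Iterable[str], columns: Sequence[str] = TAXONOMY_COLUMNS) -> List[List[str]]:
    """Keep only `columns` (in the given order) from CSV text with a header row."""
    reader = csv.DictReader(lines)
    header = reader.fieldnames or []
    missing = [c for c in columns if c not in header]
    if missing:
        raise ValueError(f"columns not found in header: {missing}")
    rows = [list(columns)]
    rows.extend([row[c] for c in columns] for row in reader)
    return rows


if __name__ == "__main__" and len(sys.argv) > 1:
    # usage (hypothetical script name): python cut_taxonomy.py taxonomy.csv > taxonomy_cut.csv
    with open(sys.argv[1], newline="") as fh:
        csv.writer(sys.stdout).writerows(cut_columns(fh))
```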
Classification output:
Details output (notice all of the condensed_id == 1):
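A quick way to confirm the condensed_id observation is to tally that column. The sketch assumes details_out is a CSV file whose header row includes a `condensed_id` column; that layout, the helper name `condensed_id_counts`, and the script name in the usage comment are assumptions, so check the real header before relying on it.

```python
import csv
import sys
from collections import Counter
from typing import Iterable


def condensed_id_counts(lines: Iterable[str]) -> Counter:
    """Count how many rows carry each condensed_id value.

    Assumes CSV input with a header row containing 'condensed_id'
    (an assumption about the details file layout).
    """
    return Counter(row["condensed_id"] for row in csv.DictReader(lines))


if __name__ == "__main__" and len(sys.argv) > 1:
    # usage (hypothetical script name): python count_condensed.py details_out
    with open(sys.argv[1], newline="") as fh:
        for tax_id, n in condensed_id_counts(fh).most_common():
            print(tax_id, n)
```

If every reference was condensed to root, this prints a single line for tax_id 1.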