rdpstaff / classifier

RDP extensible sequence classifier for fungal lsu, bacterial and archaeal 16s
GNU General Public License v2.0

LOOT (by taxon) analysis: calculation error on % of misclassified Seqs? #21

Status: Open. Opened by dougwyu 6 years ago

dougwyu commented 6 years ago

I ran two LOOT analyses. In the leave-one-taxon-out run (the first command below, with -h), the pct misclassified column of the last table (misclassified sequences group by taxon) is always 1, i.e. 100%. That does not look correct: it disagrees with the other tables in the same output and with the leave-one-sequence-out analysis (the second command below).

java -Xmx46g -jar classifier.jar loot -h -q MIDORI_UNIQUE_1.1_COI_RDP_.05_seqs.fasta -s MIDORI_UNIQUE_1.1_COI_RDP.fasta -t RDP_taxonomy_file.txt -o midori_leaveonetaxonout_test_0.05.txt

misclassified sequences group by taxon
Tested Seqs (non-singleton)  misclassified  pct misclassified
26881                        26881          1
26881                        26881          1
16                           16             1
11                           11             1
11                           11             1
10                           10             1
0                            0              0

...

java -Xmx46g -jar classifier.jar loot -q MIDORI_UNIQUE_1.1_COI_RDP_.05_seqs.fasta -s MIDORI_UNIQUE_1.1_COI_RDP.fasta -t RDP_taxonomy_file.txt -o midori_leaveoneseqout_test_0.05.txt

misclassified sequences group by taxon
Tested Seqs (non-singleton)  misclassified  pct misclassified
26881                        2363           0.087905956
26881                        2363           0.087905956
16                           3              0.1875
11                           0              0
11                           0              0
10                           0              0

...
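One way to narrow down where the 100% comes from is to recompute the pct misclassified column as misclassified / tested for every row of the two excerpts above. The Python sketch below does that. The row values are copied from the tables; rows are compared by position because the excerpt does not show taxon names; the helper names (pct_misclassified, check_table) are hypothetical; and treating 0 tested sequences as 0.0 (matching the "0 0 0" row) is an assumption about how the classifier handles that case, not something confirmed by its source.

```python
# Rows copied from the two "misclassified sequences group by taxon" excerpts:
# (tested seqs (non-singleton), misclassified, reported pct misclassified).
# Taxon names are not in the excerpt, so rows are only identified by position.
LOOT_BY_TAXON = [
    (26881, 26881, 1.0),
    (26881, 26881, 1.0),
    (16, 16, 1.0),
    (11, 11, 1.0),
    (11, 11, 1.0),
    (10, 10, 1.0),
    (0, 0, 0.0),
]

LOOT_BY_SEQ = [
    (26881, 2363, 0.087905956),
    (26881, 2363, 0.087905956),
    (16, 3, 0.1875),
    (11, 0, 0.0),
    (11, 0, 0.0),
    (10, 0, 0.0),
]


def pct_misclassified(tested: int, misclassified: int) -> float:
    """Hypothetical helper: fraction of tested sequences that were misclassified.

    Assumption: a taxon with 0 tested sequences is reported as 0.0,
    which matches the "0 0 0" row in the leave-one-taxon-out table.
    """
    return misclassified / tested if tested else 0.0


def check_table(rows, tol=1e-8):
    """Hypothetical helper: return rows whose reported value != misclassified / tested.

    The tolerance absorbs the rounding of the reported values (e.g. 0.087905956).
    """
    return [
        (tested, miscl, reported)
        for tested, miscl, reported in rows
        if abs(pct_misclassified(tested, miscl) - reported) > tol
    ]


if __name__ == "__main__":
    for name, rows in (("leave one taxon out", LOOT_BY_TAXON),
                       ("leave one seq out", LOOT_BY_SEQ)):
        mismatches = check_table(rows)
        # Also report how many rows count every tested sequence as misclassified.
        all_wrong = sum(1 for tested, miscl, _ in rows if tested and miscl == tested)
        print(f"{name}: {len(mismatches)} mismatched row(s), "
              f"{all_wrong} row(s) with misclassified == tested")
```

In both runs the reported column agrees with misclassified / tested, so the 1 values in the leave-one-taxon-out table do not appear to be a division error in that column; they follow from the misclassified count being equal to the tested count for every non-empty taxon.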