rdpstaff / classifier

RDP extensible sequence classifier for fungal lsu, bacterial and archaeal 16s
GNU General Public License v2.0

Training the RDP classifier -c option #23

Open ddavis3739 opened 5 years ago

ddavis3739 commented 5 years ago

RDPstaff,

I am trying to retrain the RDP classifier and have an issue with the -c option. I have already prepared my sequence and taxonomy files (excerpts at the end of this message) and trained the classifier on them.

Training produced four files (listed below), but none of them is the properties file.

bergeyTrainingTree.xml
logWordPrior.txt
genus_wordConditionalProbList.txt
wordConditionalProbIndexArr.txt

Do I need to include the -c file to get the properties file? If so, I cannot find any information on how to generate it, so I was hoping you could help. According to the README, "It should at least three columns: name, rank and mean for the lowest rank taxon to be trained". What does "mean" refer to in the context of this file? Furthermore, how should I go about generating the whole file?

SEQ FILE

AB353770|AB353770.1.1740_U Root;Eukaryota;Alveolata;Dinoflagellata;Dinophyceae;Peridiniales;Kryptoperidiniaceae;Unruhdinium
ATGCTTGTCTCAAAGATTAAGCCATGCATGTCTCAGTATAAGCTTTTACATGGCGAAACTGCGAATGGCTCATTAAAACAGTTACAGTTTATTTGAAG (cont.)
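
To make the header layout explicit, here is a minimal Python sketch that splits the header shown above into a sequence ID and a lineage. It assumes the first whitespace-delimited token is the ID and the remainder is a semicolon-delimited lineage starting at Root; `parse_header` is a hypothetical helper name, not part of the RDP tools.

```python
def parse_header(header: str) -> tuple[str, list[str]]:
    """Split a sequence header into (sequence ID, list of lineage taxa).

    Assumption: the ID is the first whitespace-delimited token and the
    lineage is everything after it, separated by semicolons.
    """
    # Tolerate an optional leading '>' in case the header comes from a FASTA file.
    header = header.lstrip(">").strip()
    seq_id, lineage = header.split(None, 1)
    taxa = [t.strip() for t in lineage.split(";") if t.strip()]
    return seq_id, taxa


header = (
    "AB353770|AB353770.1.1740_U "
    "Root;Eukaryota;Alveolata;Dinoflagellata;Dinophyceae;"
    "Peridiniales;Kryptoperidiniaceae;Unruhdinium"
)
seq_id, taxa = parse_header(header)
print(seq_id)
print(len(taxa), "levels, lowest taxon:", taxa[-1])
```

Listing the lineage levels this way makes it easy to compare them against the entries in the taxonomy file.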

TAX FILE

0*Root*-1*0*rootrank
1*Eukaryota*0*1*domain
2*Alveolata*1*2*supergroup
3*Dinoflagellata*2*3*division
4*Dinophyceae*3*4*class
5*Peridiniales*4*5*order
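
For anyone checking a taxonomy file like the one above, the following is a rough Python sketch of a consistency check. It assumes one entry per line and reads the five `*`-separated fields as taxid, name, parent taxid, depth, and rank; the rule that each depth equals its parent's depth plus one is inferred from the sample entries, not taken from the RDP documentation. `Taxon`, `parse_tax_line`, and `check_taxonomy` are hypothetical names.

```python
from typing import NamedTuple


class Taxon(NamedTuple):
    taxid: int
    name: str
    parent_id: int
    depth: int
    rank: str


def parse_tax_line(line: str) -> Taxon:
    """Parse one 'taxid*name*parentid*depth*rank' entry (assumed field order)."""
    taxid, name, parent_id, depth, rank = line.strip().split("*")
    return Taxon(int(taxid), name, int(parent_id), int(depth), rank)


def check_taxonomy(taxa: list[Taxon]) -> list[str]:
    """Return a list of human-readable problems; empty if none were found."""
    by_id = {t.taxid: t for t in taxa}
    problems = []
    for t in taxa:
        if t.parent_id == -1:
            # Assumption: a parent of -1 marks the root, which sits at depth 0.
            if t.depth != 0:
                problems.append(f"{t.name}: root entry has depth {t.depth}")
            continue
        parent = by_id.get(t.parent_id)
        if parent is None:
            problems.append(f"{t.name}: parent {t.parent_id} not found")
        elif t.depth != parent.depth + 1:
            problems.append(f"{t.name}: depth {t.depth} does not follow parent")
    return problems


lines = """0*Root*-1*0*rootrank
1*Eukaryota*0*1*domain
2*Alveolata*1*2*supergroup
3*Dinoflagellata*2*3*division
4*Dinophyceae*3*4*class
5*Peridiniales*4*5*order""".splitlines()

taxa = [parse_tax_line(line) for line in lines]
problems = check_taxonomy(taxa)
print("problems:", problems)
```

Note that this excerpt stops at the order level, while the sequence header continues down to family and genus, so a full file would need entries for those ranks as well.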

Thanks for the help

-Andrew Davis