rdpstaff / classifier

RDP extensible sequence classifier for fungal lsu, bacterial and archaeal 16s
GNU General Public License v2.0

How to format the taxonomy file to retrain classifier #18

Open yingeddi2008 opened 7 years ago

yingeddi2008 commented 7 years ago

Hi rdp staff,

I am trying to retrain the RDP classifier using the NCBI 16S database. However, when I looked at the example taxonomy file and the FASTA file, I was not sure how I should generate the taxonomy file.

0*Root*-1*0*rootrank
1*Bacteria*0*1*domain
2*"Actinobacteria"*1*2*phylum
3*Actinobacteria*2*3*class
4*Acidimicrobidae*3*4*subclass
5*Acidimicrobiales*4*5*order
6*"Acidimicrobineae"*5*6*suborder
7*Acidimicrobiaceae*6*7*family
8*Acidimicrobium*7*8*genus
9*Ferrimicrobium*7*8*genus
10*Ferrithrix*7*8*genus
11*Ilumatobacter*7*8*genus
12*Iamiaceae*6*7*family
3102*Aquihabitans*12*8*genus
13*Iamia*12*8*genus

Could you please explain how each line is constructed? Allow me to take a line as an example,

6*"Acidimicrobineae"*5*6*suborder

I guess that the first number is the taxonomy id for Acidimicrobineae, which is 6, and that its parent taxon is 5, Acidimicrobiales. I assume the suborder at the end of the line indicates that Acidimicrobineae is at the taxonomic rank of suborder, right? Then what does the 6 before suborder mean? When I look at 12*Iamiaceae*6*7*family, I can say Iamiaceae is a family-level taxon, but does it have two parents, 6 (Acidimicrobineae) and 7 (Acidimicrobiaceae)? I am not sure what the rule for constructing the taxonomy file is. Could you please explain how this is done?

Thanks in advance,

Eddi

rdpstaffmsu commented 7 years ago

Hi, Eddi,

I have example files and the following instructions on the procedure for preparing training files for RDP Classifier:

Prepare for training RDP Classifier

Files needed and format specifications and requirements:

  1. Compile a sequence file (e.g. rawSeq.fasta)

    Format: FASTA with a unique identifier for each sequence.

    Each sequence carries a unique identifier, a string of characters that does not include any whitespace characters.
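The identifier requirement can be checked before training. The following is only a minimal sketch and is not part of the RDP tools: `check_fasta_ids` is a hypothetical helper, and it assumes that the identifier is the text between `>` and the first whitespace character of each header line.

```python
import io


def check_fasta_ids(handle):
    """Collect sequence identifiers from a FASTA stream and report problems.

    Assumption: the identifier is the text between '>' and the first
    whitespace character on each header line.
    """
    ids = []
    problems = []
    seen = set()
    for line_no, line in enumerate(handle, start=1):
        if not line.startswith(">"):
            continue  # sequence line, nothing to check here
        header = line[1:].rstrip("\r\n")
        tokens = header.split()
        if not tokens or header[0].isspace():
            problems.append((line_no, "missing identifier"))
            continue
        seq_id = tokens[0]
        if seq_id in seen:
            problems.append((line_no, f"duplicate identifier {seq_id!r}"))
        seen.add(seq_id)
        ids.append(seq_id)
    return ids, problems


# Made-up records for illustration only.
example = io.StringIO(">SeqID-0001\nACGT\n>SeqID-0002\nGGCC\n>SeqID-0001\nTTAA\n")
ids, problems = check_fasta_ids(example)
print(problems)
```

For a real file, pass an open file handle (for example `open("rawSeq.fasta")`) instead of the `StringIO` object.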

  2. Compile a taxonomy file (e.g. rawTaxonomy.txt)

    Format: tab-delimited text file (.txt)

    Header: the first column is the sequence identifier; the following columns hold the taxonomic rank names, one per column, ordered from root (highest) to leaf (lowest), such as Domain/Kingdom, Phylum, Class, Order, Family, Genus, for each taxon level you want to represent.

    Data rows: one row per training sequence, with the following info:

           Column 1: sequence identifier (this should be identical to that in the sequence file)

           Column 2-N: taxon names corresponding to the rank names in the header.

           Fill in a '-' character for any rank column not applicable to the lineage of this sequence.
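As an illustration of the layout described above, here is a small sketch that reads such a table and checks the column count. It is not one of the RDP scripts: `read_lineage_table` is a hypothetical name, and dropping the `-` placeholders while reading is my own choice for the example.

```python
import csv
import io


def read_lineage_table(handle):
    """Read a tab-delimited lineage table.

    Returns (rank_names, lineages), where lineages maps each sequence
    identifier to a list of (rank, taxon name) pairs with '-' entries dropped.
    """
    # QUOTE_NONE keeps taxon names such as "Actinobacteria" exactly as written.
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    header = next(reader)
    ranks = header[1:]
    lineages = {}
    for line_no, fields in enumerate(reader, start=2):
        if not fields:
            continue  # skip blank lines
        if len(fields) != len(header):
            raise ValueError(
                f"line {line_no}: expected {len(header)} columns, found {len(fields)}"
            )
        seq_id = fields[0]
        if seq_id in lineages:
            raise ValueError(f"line {line_no}: duplicate sequence identifier {seq_id!r}")
        lineages[seq_id] = [
            (rank, name) for rank, name in zip(ranks, fields[1:]) if name != "-"
        ]
    return ranks, lineages


# One row taken from the fungal example in this thread, with tabs restored.
table = io.StringIO(
    "Seq_ID\tKingdom\tPhylum\tClass\tOrder\tFamily\tGenus\tSpecies\n"
    "SH189856.07FU_JQ409283_reps\tFungi\tAscomycota\tPezizomycetes\tPezizales\t"
    "Pezizaceae\t-\tPezizaceae_unidentified_sp_1\n"
)
ranks, lineages = read_lineage_table(table)
print(lineages["SH189856.07FU_JQ409283_reps"])
```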

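Warning: make sure that the taxon names are unique between different lineages. The following ‘convergent’ evolution is not allowed, because the genus Clostridium appears under two different families (Clostridiaceae and Eubacteriaceae):

| SeqID | rootRank | Domain | Phylum | Class | Order | Family | Genus |
|---|---|---|---|---|---|---|---|
| SeqID-0001 | root | Bacteria | Firmicutes | Clostridia | Clostridiales | Clostridiaceae | Clostridium |
| SeqID-0002 | root | Bacteria | Firmicutes | Clostridia | Clostridiales | Eubacteriaceae | Clostridium |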
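A check for this kind of conflict could look roughly like the sketch below. `find_convergent_taxa` is a hypothetical helper, not something shipped with the classifier. It keys each taxon on (rank, name), so the same name reused at two different ranks (as with Sordariales_unidentified in the fungal example in this thread) is not reported; whether the real scripts treat that case the same way is an assumption on my part.

```python
from collections import defaultdict


def find_convergent_taxa(ranks, lineages):
    """Return {(rank, name): [parent names]} for taxa with more than one parent.

    ranks    -- rank names from the header, root to leaf
    lineages -- iterable of taxon-name lists, one per sequence, '-' for gaps
    """
    parents = defaultdict(set)
    for names in lineages:
        parent = ""  # empty string stands for "no parent seen yet"
        for rank, name in zip(ranks, names):
            if name == "-":
                continue  # rank not applicable to this lineage
            parents[(rank, name)].add(parent)
            parent = name
    return {key: sorted(found) for key, found in parents.items() if len(found) > 1}


ranks = ["rootRank", "Domain", "Phylum", "Class", "Order", "Family", "Genus"]
rows = [
    ["root", "Bacteria", "Firmicutes", "Clostridia", "Clostridiales", "Clostridiaceae", "Clostridium"],
    ["root", "Bacteria", "Firmicutes", "Clostridia", "Clostridiales", "Eubacteriaceae", "Clostridium"],
]
conflicts = find_convergent_taxa(ranks, rows)
print(conflicts)
```

With the two rows from the warning, the only entry reported is the genus Clostridium with its two parent families.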

  3. Run command: lineage2taxTrain.py rawTaxonomy.txt > ready4train_taxonomy.txt

  4. Run command: addFullLineage.py ready4train_taxonomy.txt rawSeq.fasta > ready4train_seqs.fasta

  5. Use the taxonomy file (e.g. ready4train_taxonomy.txt) and sequence file (e.g. ready4train_seqs.fasta) to train RDP Classifier.
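The lineage2taxTrain.py script itself is not reproduced in this thread. As a rough illustration of the kind of conversion involved, the sketch below turns lineage rows into lines of the form `taxid*name*parent taxid*depth*rank`, which is the layout of the example training taxonomy quoted in this issue. Judging from that example, the fourth field is the depth of the taxon below Root, which is why a family can carry 7 there while its parent id is 6. The function name, the id assignment in first-seen order, and the handling of `-` are assumptions made for this example, not the actual script.

```python
def build_training_taxonomy(ranks, lineages):
    """Build 'taxid*name*parent_taxid*depth*rank' lines from lineage rows.

    ranks    -- rank names from the header, root to leaf (no root column assumed)
    lineages -- iterable of taxon-name lists, one per sequence, '-' for gaps
    """
    lines = ["0*Root*-1*0*rootrank"]
    taxids = {}  # (parent taxid, name, rank) -> taxid
    next_id = 1
    for names in lineages:
        parent_id, depth = 0, 0
        for rank, name in zip(ranks, names):
            if name == "-":
                continue  # skipped ranks do not add a level
            depth += 1
            key = (parent_id, name, rank)
            if key not in taxids:
                taxids[key] = next_id
                lines.append(f"{next_id}*{name}*{parent_id}*{depth}*{rank}")
                next_id += 1
            parent_id = taxids[key]
    return lines


ranks = ["Kingdom", "Phylum", "Class", "Order", "Family", "Genus", "Species"]
rows = [
    ["Fungi", "Basidiomycota", "Microbotryomycetes", "Sporidiobolales",
     "Sporidiobolales_Incertae_sedis", "Rhodotorula", "Rhodotorula_diffluens"],
    ["Fungi", "Basidiomycota", "Microbotryomycetes", "Sporidiobolales",
     "Sporidiobolales_Incertae_sedis", "Rhodotorula", "Rhodotorula_sp_1"],
    ["Fungi", "-", "-", "-", "-", "-", "Fungi_unidentified_sp_1"],
]
lines = build_training_taxonomy(ranks, rows)
print("\n".join(lines))
```

Each line therefore has exactly one parent (the third field); the number before the rank name is only the level in the tree.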

Let me know if you have questions.

Benli Chai RDP Staff


-- RDP Staff Ribosomal Database Project Center for Microbial Ecology Michigan State University 567 Wilson Rd. Room 2225 A East Lansing, MI 48824 (517) 353-3842

Seq_ID Kingdom Phylum Class Order Family Genus Species
SH213958.07FU_AF444533_refs Fungi Basidiomycota Microbotryomycetes Sporidiobolales Sporidiobolales_Incertae_sedis Rhodotorula Rhodotorula_diffluens
SH213959.07FU_KJ706646_reps Fungi Basidiomycota Microbotryomycetes Sporidiobolales Sporidiobolales_Incertae_sedis Rhodotorula Rhodotorula_sp_1
SH191122.07FU_JN206370_reps Fungi Zygomycota Incertae_sedis Mucorales Mucorales_Incertae_sedis Syzygites Syzygites_megalocarpus
SH177358.07FU_Z81447_reps Fungi Ascomycota Leotiomycetes Helotiales Sclerotiniaceae Valdensinia Valdensinia_heterodoxa
SH177366.07FU_Z80894_reps Fungi Ascomycota Leotiomycetes Helotiales Rutstroemiaceae Rutstroemia Rutstroemia_bolaris
SH177367.07FU_AY546074_reps Fungi Ascomycota Leotiomycetes Rhytismatales Rhytismataceae Lophodermium Lophodermium_conigenum
SH177368.07FU_AB693917_reps Fungi Ascomycota Leotiomycetes Helotiales Sclerotiniaceae Monilinia Monilinia_sp_1
SH177370.07FU_AB026166_reps Fungi Ascomycota Leotiomycetes Helotiales Sclerotiniaceae Ciborinia Ciborinia_allii
SH177371.07FU_Z73794_reps Fungi Ascomycota Leotiomycetes Helotiales Sclerotiniaceae Monilinia Monilinia_urnula
SH177372.07FU_AY645900_reps Fungi Ascomycota Leotiomycetes Helotiales Hemiphacidiaceae Sarcotrochila Sarcotrochila_macrospora
SH213382.07FU_JN979417_refs Fungi Ascomycota Sordariomycetes Xylariales Xylariaceae Hypoxylon Hypoxylon_fendleri
SH213386.07FU_KM052716_refs Fungi Ascomycota Sordariomycetes Xylariales Xylariaceae Hypoxylon Hypoxylon_sp_1
SH194557.07FU_DQ008233_reps Fungi Ascomycota Leotiomycetes Helotiales Dermateaceae Mollisia Mollisia_sp_1
SH189856.07FU_JQ409283_reps Fungi Ascomycota Pezizomycetes Pezizales Pezizaceae - Pezizaceae_unidentified_sp_1
SH189859.07FU_JX434665_reps Fungi Ascomycota Pezizomycetes Pezizales Pezizaceae - Pezizaceae_unidentified_sp_2
SH189860.07FU_HE687084_reps Fungi Ascomycota Pezizomycetes Pezizales Pezizaceae - Pezizaceae_unidentified_sp_3
SH189861.07FU_GQ985429_reps Fungi Ascomycota Pezizomycetes Pezizales Pezizaceae - Pezizaceae_unidentified_sp_4
SH189862.07FU_AY969513_reps Fungi Ascomycota Pezizomycetes Pezizales Pezizaceae - Pezizaceae_unidentified_sp_5
SH189857.07FU_JN102365_reps Fungi Ascomycota Pezizomycetes Pezizales Pezizaceae Peziza Peziza_sp_1
SH189858.07FU_EU554730_reps Fungi Ascomycota Pezizomycetes Pezizales Pezizaceae - Pezizaceae_unidentified_sp_6
SH189863.07FU_KJ591045_reps Fungi Ascomycota Pezizomycetes Pezizales Pezizaceae - Pezizaceae_unidentified_sp_7
SH174118.07FU_JQ081850_reps Fungi Ascomycota Sordariomycetes Sordariales Sordariales_unidentified Sordariales_unidentified Sordariales_unidentified_sp_1
SH189872.07FU_EU014071_reps Fungi Basidiomycota Pucciniomycetes Pucciniales Uropyxidaceae Tranzschelia Tranzschelia_discolor
SH206047.07FU_AY559338_reps Fungi Ascomycota Dothideomycetes Capnodiales Capnodiales_unidentified Capnodiales_unidentified Capnodiales_unidentified_sp_1
SH206048.07FU_KF309965_reps Fungi Ascomycota Dothideomycetes Capnodiales Capnodiales_Incertae_sedis Monticola Monticola_elongata
SH206049.07FU_AY843042_reps Fungi Ascomycota Dothideomycetes Dothideomycetes_unidentified Dothideomycetes_unidentified Dothideomycetes_unidentified Dothideomycetes_unidentified_sp_1
SH206053.07FU_JN942642_reps Fungi Ascomycota Saccharomycetes Saccharomycetales Saccharomycetales_Incertae_sedis Candida Candida_glabrata
SH194562.07FU_AB498974_refs Fungi Ascomycota Leotiomycetes Erysiphales Erysiphaceae Neoerysiphe Neoerysiphe_nevoi
SH206057.07FU_JN709043_reps Fungi Ascomycota Dothideomycetes Capnodiales Teratosphaeriaceae Teratosphaeria Teratosphaeria_sp_1
SH206058.07FU_GU721292_reps Fungi - - - - - Fungi_unidentified_sp_1
SH194564.07FU_GU356546_refs Fungi Ascomycota Leotiomycetes Erysiphales Erysiphaceae Neoerysiphe Neoerysiphe_kerribeeensis
SH194565.07FU_AB329681_refs Fungi Ascomycota Leotiomycetes Erysiphales Erysiphaceae Neoerysiphe Neoerysiphe_galii
SH194563.07FU_AB498962_refs Fungi Ascomycota Leotiomycetes Erysiphales Erysiphaceae Neoerysiphe Neoerysiphe_hiratae
SH194567.07FU_AB329684_refs Fungi Ascomycota Leotiomycetes Erysiphales Erysiphaceae Striatoidium Striatoidium_baccharidis
SH174173.07FU_FJ362291_reps Fungi Basidiomycota Agaricomycetes Boletales Boletaceae Boletus Boletus_bicolor
SH174125.07FU_FN555109_reps Fungi Ascomycota Sordariomycetes Magnaporthales Magnaporthaceae - Magnaporthaceae_unidentified_sp_1
SH174126.07FU_FJ541434_reps Fungi Ascomycota - - - - Ascomycota_unidentified_sp_1
SH174127.07FU_JF414846_reps Fungi Ascomycota Sordariomycetes Magnaporthales Magnaporthaceae Gaeumannomyces Gaeumannomyces_incrustans
SH174128.07FU_KJ855489_reps Fungi Ascomycota Sordariomycetes Magnaporthales Magnaporthaceae Gaeumannomyces Gaeumannomyces_sp_1
SH174133.07FU_DQ528792_reps Fungi Ascomycota Sordariomycetes Magnaporthales Magnaporthaceae Nakataea Nakataea_oryzae
SH174134.07FU_FJ430720_reps Fungi Ascomycota Sordariomycetes - - - Sordariomycetes_unidentified_sp_1
SH174135.07FU_KJ855505_reps Fungi Ascomycota Sordariomycetes Magnaporthales Magnaporthaceae - Magnaporthaceae_unidentified_sp_2
SH174147.07FU_AB274433_reps Fungi Ascomycota Sordariomycetes Magnaporthales Pyriculariaceae Proxipyricularia Proxipyricularia_zingiberis
SH174148.07FU_AB512785_reps Fungi Ascomycota Sordariomycetes Magnaporthales Magnaporthaceae Pyricularia Pyricularia_sp_1
SH174137.07FU_AJ132542_reps Fungi Ascomycota Eurotiomycetes Chaetothyriales Herpotrichiellaceae Phialophora Phialophora_sp_1
SH174138.07FU_KJ855487_reps Fungi Ascomycota Sordariomycetes Magnaporthales Magnaporthaceae - Magnaporthaceae_unidentified_sp_3
SH174140.07FU_AB818016_reps Fungi Ascomycota Sordariomycetes Magnaporthales Magnaporthaceae Pyricularia Pyricularia_sp_2
SH174141.07FU_EU636699_reps Fungi Ascomycota Sordariomycetes Magnaporthales Magnaporthaceae Harpophora Harpophora_oryzae
SH174142.07FU_JX134600_reps Fungi Ascomycota Sordariomycetes Magnaporthales Magnaporthaceae Magnaporthiopsis Magnaporthiopsis_poae
SH174143.07FU_KJ855497_reps Fungi Ascomycota Sordariomycetes Magnaporthales Magnaporthaceae - Magnaporthaceae_unidentified_sp_4
SH174145.07FU_EU144817_reps Fungi Ascomycota Sordariomycetes Magnaporthales Magnaporthaceae - Magnaporthaceae_unidentified_sp_5
SH174146.07FU_KC354577_reps Fungi Ascomycota Sordariomycetes - - - Sordariomycetes_unidentified_sp_2
SH199540.07FU_KC414241_reps Fungi Basidiomycota Agaricomycetes Gloeophyllales Gloeophyllaceae Veluticeps Veluticeps_ambigua
SH199543.07FU_UDB016415_refs Fungi Basidiomycota Agaricomycetes Polyporales Fomitopsidaceae Postia Postia_undosa
SH177464.07FU_GU055939_reps Fungi Ascomycota Eurotiomycetes Chaetothyriales - - Chaetothyriales_unidentified_sp_1
SH206064.07FU_HQ022506_reps Fungi Ascomycota Sordariomycetes Hypocreales Bionectriaceae Clonostachys Clonostachys_sp_1
SH206065.07FU_AY425633_reps Fungi Ascomycota Lecanoromycetes Lecanorales Psoraceae Psora Psora_decipiens
SH206066.07FU_KF823600_reps Fungi Ascomycota Sordariomycetes - - - Sordariomycetes_unidentified_sp_3
SH199552.07FU_KF274644_refs Fungi Basidiomycota Agaricomycetes Polyporales Fomitopsidaceae Fomitella Fomitella_supina
SH174175.07FU_JF449882_reps Fungi Ascomycota Leotiomycetes Helotiales - - Helotiales_unidentified_sp_1
SH206068.07FU_GU054276_reps Fungi Ascomycota - - - - Ascomycota_unidentified_sp_2
SH174177.07FU_JX192683_reps Fungi Ascomycota Sordariomycetes Hypocreales Cordycipitaceae - Cordycipitaceae_unidentified_sp_1
SH199558.07FU_HE963782_reps Fungi Basidiomycota Agaricomycetes Trechisporales Hydnodontaceae Brevicellicium Brevicellicium_olivascens
SH199559.07FU_HE963789_reps Fungi Basidiomycota Agaricomycetes Trechisporales Hydnodontaceae Brevicellicium Brevicellicium_olivascens
SH206069.07FU_JN811088_reps Fungi - - - - - Fungi_unidentified_sp_2
SH174183.07FU_GQ927301_reps Fungi Ascomycota Lecanoromycetes Peltigerales Pannariaceae Psoroma Psoroma_fruticulosum
SH174184.07FU_GQ927299_reps Fungi Ascomycota Lecanoromycetes Peltigerales Pannariaceae Psoroma Psoroma_buchananii
SH174185.07FU_GQ927305_reps Fungi Ascomycota Lecanoromycetes Peltigerales Pannariaceae Psoroma Psoroma_hypnorum_var._paleaceum
SH223384.07FU_JN206297_reps Fungi Zygomycota Incertae_sedis Mucorales Phycomycetaceae Spinellus Spinellus_fusiger
SH206073.07FU_JX310406_reps Fungi Basidiomycota Agaricomycetes Gomphales Gomphaceae Ramaria Ramaria_rubribrunnescens
SH174196.07FU_UDB015353_refs Fungi Basidiomycota Agaricomycetes Agaricales Inocybaceae Inocybe Inocybe_subnudipes
SH174240.07FU_EF434113_reps Fungi Basidiomycota Agaricomycetes Auriculariales - - Auriculariales_unidentified_sp_1
SH174205.07FU_AM882801_refs Fungi Basidiomycota Agaricomycetes Agaricales Inocybaceae Inocybe Inocybe_leptocystis
SH206074.07FU_EU669323_reps Fungi Basidiomycota Agaricomycetes Gomphales Gomphaceae Ramaria Ramaria_maculatipes
SH174194.07FU_UDB004943_reps Fungi Basidiomycota Agaricomycetes Agaricales Inocybaceae - Inocybaceae_unidentified_sp_1
SH174195.07FU_HE687059_reps Fungi Basidiomycota Agaricomycetes Agaricales - - Agaricales_unidentified_sp_1
SH174198.07FU_HF565068_reps Fungi Basidiomycota Agaricomycetes Agaricales - - Agaricales_unidentified_sp_2
SH174199.07FU_KJ432291_reps Fungi Basidiomycota Agaricomycetes Agaricales Inocybaceae Inocybe Inocybe_lanatopurpurea
SH174200.07FU_JQ975963_reps Fungi Basidiomycota Agaricomycetes Agaricales Inocybaceae - Inocybaceae_unidentified_sp_2
SH174202.07FU_JF908177_reps Fungi Basidiomycota Agaricomycetes Agaricales Inocybaceae Inocybe Inocybe_sp_1
SH174203.07FU_JF908158_reps Fungi Basidiomycota Agaricomycetes Agaricales Inocybaceae Inocybe Inocybe_leptocystis
SH174204.07FU_FR852254_reps Fungi Basidiomycota Agaricomycetes Agaricales Inocybaceae Inocybe Inocybe_sp_2
SH174242.07FU_KF359560_reps Fungi Ascomycota Sordariomycetes Hypocreales Nectriaceae Fusidium Fusidium_sp_1
SH174229.07FU_UDB015045_reps Fungi Ascomycota Lecanoromycetes Ostropales Odontotremataceae Geltingia Geltingia_associata
SH199562.07FU_JF300723_refs Fungi Basidiomycota Agaricomycetes Trechisporales Hydnodontaceae Trechispora Trechispora_sp_1
SH199561.07FU_AY969490_reps Fungi Basidiomycota Agaricomycetes Trechisporales Hydnodontaceae Trechispora Trechispora_sp_2
SH206078.07FU_KJ021221_reps Fungi Ascomycota Lecanoromycetes Teloschistales Teloschistaceae Eilifdahlia Eilifdahlia_dahlii
SH174235.07FU_JX448358_reps Fungi Ascomycota Dothideomycetes Pleosporales - - Pleosporales_unidentified
SH199563.07FU_KF718212_reps Fungi Basidiomycota Agaricomycetes Trechisporales Hydnodontaceae Trechispora Trechispora_sp_3
SH199564.07FU_HM030587_reps Fungi Basidiomycota Agaricomycetes Trechisporales Hydnodontaceae Trechispora Trechispora_sp_4
SH199565.07FU_JF519114_refs Fungi Basidiomycota Agaricomycetes Trechisporales Hydnodontaceae Trechispora Trechispora_sp_5
SH206080.07FU_KC478560_reps Fungi Basidiomycota Agaricomycetes - - - Agaricomycetes_unidentified_sp_1
SH177453.07FU_JN020964_reps Fungi Basidiomycota Agaricomycetes Agaricales Strophariaceae Agrocybe Agrocybe_erebia
SH174249.07FU_AF011289_refs Fungi Ascomycota Leotiomycetes Erysiphales Erysiphaceae Cystotheca Cystotheca_lanestris
SH174251.07FU_AB743781_refs Fungi Ascomycota Leotiomycetes Erysiphales Erysiphaceae Setoidium Setoidium_castanopsidis
SH177458.07FU_AF444599_refs Fungi Basidiomycota Agaricostilbomycetes Agaricostilbales Chionosphaeraceae Chionosphaera Chionosphaera_apobasidialis
SH212541.07FU_KC884399_reps Fungi - - - - - Fungi_unidentified_sp_3
SH177459.07FU_UDB015324_reps Fungi Basidiomycota Agaricomycetes Agaricales Strophariaceae Pholiota Pholiota_tuberculosa
SH177460.07FU_FJ596817_reps Fungi Basidiomycota Agaricomycetes Agaricales Strophariaceae Pholiota Pholiota_sp_1
SH177463.07FU_EU139156_reps Fungi Ascomycota Eurotiomycetes Chaetothyriales Herpotrichiellaceae Capronia Capronia_sp_1

0*Root*-1*0*rootrank
1*Fungi*0*1*Kingdom
2*Basidiomycota*1*2*Phylum
3*Microbotryomycetes*2*3*Class
4*Sporidiobolales*3*4*Order
5*Sporidiobolales_Incertae_sedis*4*5*Family
6*Rhodotorula*5*6*Genus
7*Rhodotorula_diffluens*6*7*Species
8*Rhodotorula_sp_1*6*7*Species
9*Zygomycota*1*2*Phylum
10*Incertae_sedis*9*3*Class
11*Mucorales*10*4*Order
12*Mucorales_Incertae_sedis*11*5*Family
13*Syzygites*12*6*Genus
14*Syzygites_megalocarpus*13*7*Species
15*Ascomycota*1*2*Phylum
16*Leotiomycetes*15*3*Class
17*Helotiales*16*4*Order
18*Sclerotiniaceae*17*5*Family
19*Valdensinia*18*6*Genus
20*Valdensinia_heterodoxa*19*7*Species
21*Rutstroemiaceae*17*5*Family
22*Rutstroemia*21*6*Genus
23*Rutstroemia_bolaris*22*7*Species
24*Rhytismatales*16*4*Order
25*Rhytismataceae*24*5*Family
26*Lophodermium*25*6*Genus
27*Lophodermium_conigenum*26*7*Species
28*Monilinia*18*6*Genus
29*Monilinia_sp_1*28*7*Species
30*Ciborinia*18*6*Genus
31*Ciborinia_allii*30*7*Species
32*Monilinia_urnula*28*7*Species
33*Hemiphacidiaceae*17*5*Family
34*Sarcotrochila*33*6*Genus
35*Sarcotrochila_macrospora*34*7*Species
36*Sordariomycetes*15*3*Class
37*Xylariales*36*4*Order
38*Xylariaceae*37*5*Family
39*Hypoxylon*38*6*Genus
40*Hypoxylon_fendleri*39*7*Species
41*Hypoxylon_sp_1*39*7*Species
42*Dermateaceae*17*5*Family
43*Mollisia*42*6*Genus
44*Mollisia_sp_1*43*7*Species
45*Pezizomycetes*15*3*Class
46*Pezizales*45*4*Order
47*Pezizaceae*46*5*Family
48*Pezizaceae_unidentified_sp_1*47*6*Species
49*Pezizaceae_unidentified_sp_2*47*6*Species
50*Pezizaceae_unidentified_sp_3*47*6*Species
51*Pezizaceae_unidentified_sp_4*47*6*Species
52*Pezizaceae_unidentified_sp_5*47*6*Species
53*Peziza*47*6*Genus
54*Peziza_sp_1*53*7*Species
55*Pezizaceae_unidentified_sp_6*47*6*Species
56*Pezizaceae_unidentified_sp_7*47*6*Species
57*Sordariales*36*4*Order
58*Sordariales_unidentified*57*5*Family
59*Sordariales_unidentified*58*6*Genus
60*Sordariales_unidentified_sp_1*59*7*Species
61*Pucciniomycetes*2*3*Class
62*Pucciniales*61*4*Order
63*Uropyxidaceae*62*5*Family
64*Tranzschelia*63*6*Genus
65*Tranzschelia_discolor*64*7*Species
66*Dothideomycetes*15*3*Class
67*Capnodiales*66*4*Order
68*Capnodiales_unidentified*67*5*Family
69*Capnodiales_unidentified*68*6*Genus
70*Capnodiales_unidentified_sp_1*69*7*Species
71*Capnodiales_Incertae_sedis*67*5*Family
72*Monticola*71*6*Genus
73*Monticola_elongata*72*7*Species
74*Dothideomycetes_unidentified*66*4*Order
75*Dothideomycetes_unidentified*74*5*Family
76*Dothideomycetes_unidentified*75*6*Genus
77*Dothideomycetes_unidentified_sp_1*76*7*Species
78*Saccharomycetes*15*3*Class
79*Saccharomycetales*78*4*Order
80*Saccharomycetales_Incertae_sedis*79*5*Family
81*Candida*80*6*Genus
82*Candida_glabrata*81*7*Species
83*Erysiphales*16*4*Order
84*Erysiphaceae*83*5*Family
85*Neoerysiphe*84*6*Genus
86*Neoerysiphe_nevoi*85*7*Species
87*Teratosphaeriaceae*67*5*Family
88*Teratosphaeria*87*6*Genus
89*Teratosphaeria_sp_1*88*7*Species
90*Fungi_unidentified_sp_1*1*2*Species
91*Neoerysiphe_kerribeeensis*85*7*Species
92*Neoerysiphe_galii*85*7*Species
93*Neoerysiphe_hiratae*85*7*Species
94*Striatoidium*84*6*Genus
95*Striatoidium_baccharidis*94*7*Species
96*Agaricomycetes*2*3*Class
97*Boletales*96*4*Order
98*Boletaceae*97*5*Family
99*Boletus*98*6*Genus
100*Boletus_bicolor*99*7*Species
101*Magnaporthales*36*4*Order
102*Magnaporthaceae*101*5*Family
103*Magnaporthaceae_unidentified_sp_1*102*6*Species
104*Ascomycota_unidentified_sp_1*15*3*Species
105*Gaeumannomyces*102*6*Genus
106*Gaeumannomyces_incrustans*105*7*Species
107*Gaeumannomyces_sp_1*105*7*Species
108*Nakataea*102*6*Genus
109*Nakataea_oryzae*108*7*Species
110*Sordariomycetes_unidentified_sp_1*36*4*Species
111*Magnaporthaceae_unidentified_sp_2*102*6*Species
112*Pyriculariaceae*101*5*Family
113*Proxipyricularia*112*6*Genus
114*Proxipyricularia_zingiberis*113*7*Species
115*Pyricularia*102*6*Genus
116*Pyricularia_sp_1*115*7*Species
117*Eurotiomycetes*15*3*Class
118*Chaetothyriales*117*4*Order
119*Herpotrichiellaceae*118*5*Family
120*Phialophora*119*6*Genus
121*Phialophora_sp_1*120*7*Species
122*Magnaporthaceae_unidentified_sp_3*102*6*Species
123*Pyricularia_sp_2*115*7*Species
124*Harpophora*102*6*Genus
125*Harpophora_oryzae*124*7*Species
126*Magnaporthiopsis*102*6*Genus
127*Magnaporthiopsis_poae*126*7*Species
128*Magnaporthaceae_unidentified_sp_4*102*6*Species
129*Magnaporthaceae_unidentified_sp_5*102*6*Species
130*Sordariomycetes_unidentified_sp_2*36*4*Species
131*Gloeophyllales*96*4*Order
132*Gloeophyllaceae*131*5*Family
133*Veluticeps*132*6*Genus
134*Veluticeps_ambigua*133*7*Species
135*Polyporales*96*4*Order
136*Fomitopsidaceae*135*5*Family
137*Postia*136*6*Genus
138*Postia_undosa*137*7*Species
139*Chaetothyriales_unidentified_sp_1*118*5*Species
140*Hypocreales*36*4*Order
141*Bionectriaceae*140*5*Family
142*Clonostachys*141*6*Genus
143*Clonostachys_sp_1*142*7*Species
144*Lecanoromycetes*15*3*Class
145*Lecanorales*144*4*Order
146*Psoraceae*145*5*Family
147*Psora*146*6*Genus
148*Psora_decipiens*147*7*Species
149*Sordariomycetes_unidentified_sp_3*36*4*Species
150*Fomitella*136*6*Genus
151*Fomitella_supina*150*7*Species
152*Helotiales_unidentified_sp_1*17*5*Species
153*Ascomycota_unidentified_sp_2*15*3*Species
154*Cordycipitaceae*140*5*Family
155*Cordycipitaceae_unidentified_sp_1*154*6*Species
156*Trechisporales*96*4*Order
157*Hydnodontaceae*156*5*Family
158*Brevicellicium*157*6*Genus
159*Brevicellicium_olivascens*158*7*Species
160*Fungi_unidentified_sp_2*1*2*Species
161*Peltigerales*144*4*Order
162*Pannariaceae*161*5*Family
163*Psoroma*162*6*Genus
164*Psoroma_fruticulosum*163*7*Species
165*Psoroma_buchananii*163*7*Species
166*Psoroma_hypnorum_var._paleaceum*163*7*Species
167*Phycomycetaceae*11*5*Family
168*Spinellus*167*6*Genus
169*Spinellus_fusiger*168*7*Species
170*Gomphales*96*4*Order
171*Gomphaceae*170*5*Family
172*Ramaria*171*6*Genus
173*Ramaria_rubribrunnescens*172*7*Species
174*Agaricales*96*4*Order
175*Inocybaceae*174*5*Family
176*Inocybe*175*6*Genus
177*Inocybe_subnudipes*176*7*Species
178*Auriculariales*96*4*Order
179*Auriculariales_unidentified_sp_1*178*5*Species
180*Inocybe_leptocystis*176*7*Species
181*Ramaria_maculatipes*172*7*Species
182*Inocybaceae_unidentified_sp_1*175*6*Species
183*Agaricales_unidentified_sp_1*174*5*Species
184*Agaricales_unidentified_sp_2*174*5*Species
185*Inocybe_lanatopurpurea*176*7*Species
186*Inocybaceae_unidentified_sp_2*175*6*Species
187*Inocybe_sp_1*176*7*Species
188*Inocybe_sp_2*176*7*Species
189*Nectriaceae*140*5*Family
190*Fusidium*189*6*Genus
191*Fusidium_sp_1*190*7*Species
192*Ostropales*144*4*Order
193*Odontotremataceae*192*5*Family
194*Geltingia*193*6*Genus
195*Geltingia_associata*194*7*Species
196*Trechispora*157*6*Genus
197*Trechispora_sp_1*196*7*Species
198*Trechispora_sp_2*196*7*Species
199*Teloschistales*144*4*Order
200*Teloschistaceae*199*5*Family
201*Eilifdahlia*200*6*Genus
202*Eilifdahlia_dahlii*201*7*Species
203*Pleosporales*66*4*Order
204*Pleosporales_unidentified*203*5*Species
205*Trechispora_sp_3*196*7*Species
206*Trechispora_sp_4*196*7*Species
207*Trechispora_sp_5*196*7*Species
208*Agaricomycetes_unidentified_sp_1*96*4*Species
209*Strophariaceae*174*5*Family
210*Agrocybe*209*6*Genus
211*Agrocybe_erebia*210*7*Species
212*Cystotheca*84*6*Genus
213*Cystotheca_lanestris*212*7*Species
214*Setoidium*84*6*Genus
215*Setoidium_castanopsidis*214*7*Species
216*Agaricostilbomycetes*2*3*Class
217*Agaricostilbales*216*4*Order
218*Chionosphaeraceae*217*5*Family
219*Chionosphaera*218*6*Genus
220*Chionosphaera_apobasidialis*219*7*Species
221*Fungi_unidentified_sp_3*1*2*Species
222*Pholiota*209*6*Genus
223*Pholiota_tuberculosa*222*7*Species
224*Pholiota_sp_1*222*7*Species
225*Capronia*119*6*Genus
226*Capronia_sp_1*225*7*Species

yingeddi2008 commented 7 years ago

Hi Benli,

Thanks for your prompt response. I am now looking for the two scripts you mentioned in the reply, lineage2taxTrain.py and addFullLineage.py. Could you please show me where they are included? Are they in the RDP zipped folder?

Thanks,

Eddi

chaibenl commented 7 years ago

Sorry I thought I attached them. Here they are.

Benli


yingeddi2008 commented 7 years ago

I still don't see them...

rdpstaffmsu commented 7 years ago

Could you send me an email address other than the one from GitHub?

Benli


yingeddi2008 commented 7 years ago

You can send the scripts to hlin2@luc.edu. Thanks.

yingeddi2008 commented 7 years ago

Hi Benli,

I haven't heard from you about the scripts for a while. I'd really appreciate it if you could follow up on this issue.

Thanks a lot!

Eddi

chaibenl commented 7 years ago

Hi, Eddi,

I sent them to your account hlin2@luc.edu 4 days ago. Here I attach them to the email again.

Benli


Psoroma Psoroma_buchananii SH174185.07FU_GQ927305_reps Fungi Ascomycota Lecanoromycetes Peltigerales Pannariaceae Psoroma Psoroma_hypnorum_var._paleaceum SH223384.07FU_JN206297_reps Fungi Zygomycota Incertae_sedis Mucorales Phycomycetaceae Spinellus Spinellus_fusiger SH206073.07FU_JX310406_reps Fungi Basidiomycota Agaricomycetes Gomphales Gomphaceae Ramaria Ramaria_rubribrunnescens SH174196.07FU_UDB015353_refs Fungi Basidiomycota Agaricomycetes Agaricales Inocybaceae Inocybe Inocybe_subnudipes SH174240.07FU_EF434113_reps Fungi Basidiomycota Agaricomycetes Auriculariales - - Auriculariales_unidentified_sp_1 SH174205.07FU_AM882801_refs Fungi Basidiomycota Agaricomycetes Agaricales Inocybaceae Inocybe Inocybe_leptocystis SH206074.07FU_EU669323_reps Fungi Basidiomycota Agaricomycetes Gomphales Gomphaceae Ramaria Ramaria_maculatipes SH174194.07FU_UDB004943_reps Fungi Basidiomycota Agaricomycetes Agaricales Inocybaceae - Inocybaceae_unidentified_sp_1 SH174195.07FU_HE687059_reps Fungi Basidiomycota Agaricomycetes Agaricales - - Agaricales_unidentified_sp_1 SH174198.07FU_HF565068_reps Fungi Basidiomycota Agaricomycetes Agaricales - - Agaricales_unidentified_sp_2 SH174199.07FU_KJ432291_reps Fungi Basidiomycota Agaricomycetes Agaricales Inocybaceae Inocybe Inocybe_lanatopurpurea SH174200.07FU_JQ975963_reps Fungi Basidiomycota Agaricomycetes Agaricales Inocybaceae - Inocybaceae_unidentified_sp_2 SH174202.07FU_JF908177_reps Fungi Basidiomycota Agaricomycetes Agaricales Inocybaceae Inocybe Inocybe_sp_1 SH174203.07FU_JF908158_reps Fungi Basidiomycota Agaricomycetes Agaricales Inocybaceae Inocybe Inocybe_leptocystis SH174204.07FU_FR852254_reps Fungi Basidiomycota Agaricomycetes Agaricales Inocybaceae Inocybe Inocybe_sp_2 SH174242.07FU_KF359560_reps Fungi Ascomycota Sordariomycetes Hypocreales Nectriaceae Fusidium Fusidium_sp_1 SH174229.07FU_UDB015045_reps Fungi Ascomycota Lecanoromycetes Ostropales Odontotremataceae Geltingia Geltingia_associata SH199562.07FU_JF300723_refs Fungi 
Basidiomycota Agaricomycetes Trechisporales Hydnodontaceae Trechispora Trechispora_sp_1 SH199561.07FU_AY969490_reps Fungi Basidiomycota Agaricomycetes Trechisporales Hydnodontaceae Trechispora Trechispora_sp_2 SH206078.07FU_KJ021221_reps Fungi Ascomycota Lecanoromycetes Teloschistales Teloschistaceae Eilifdahlia Eilifdahlia_dahlii SH174235.07FU_JX448358_reps Fungi Ascomycota Dothideomycetes Pleosporales - - Pleosporales_unidentified SH199563.07FU_KF718212_reps Fungi Basidiomycota Agaricomycetes Trechisporales Hydnodontaceae Trechispora Trechispora_sp_3 SH199564.07FU_HM030587_reps Fungi Basidiomycota Agaricomycetes Trechisporales Hydnodontaceae Trechispora Trechispora_sp_4 SH199565.07FU_JF519114_refs Fungi Basidiomycota Agaricomycetes Trechisporales Hydnodontaceae Trechispora Trechispora_sp_5 SH206080.07FU_KC478560_reps Fungi Basidiomycota Agaricomycetes - - - Agaricomycetes_unidentified_sp_1 SH177453.07FU_JN020964_reps Fungi Basidiomycota Agaricomycetes Agaricales Strophariaceae Agrocybe Agrocybe_erebia SH174249.07FU_AF011289_refs Fungi Ascomycota Leotiomycetes Erysiphales Erysiphaceae Cystotheca Cystotheca_lanestris SH174251.07FU_AB743781_refs Fungi Ascomycota Leotiomycetes Erysiphales Erysiphaceae Setoidium Setoidium_castanopsidis SH177458.07FU_AF444599_refs Fungi Basidiomycota Agaricostilbomycetes Agaricostilbales Chionosphaeraceae Chionosphaera Chionosphaera_apobasidialis SH212541.07FU_KC884399_reps Fungi - - - - - Fungi_unidentified_sp_3 SH177459.07FU_UDB015324_reps Fungi Basidiomycota Agaricomycetes Agaricales Strophariaceae Pholiota Pholiota_tuberculosa SH177460.07FU_FJ596817_reps Fungi Basidiomycota Agaricomycetes Agaricales Strophariaceae Pholiota Pholiota_sp_1 SH177463.07FU_EU139156_reps Fungi Ascomycota Eurotiomycetes Chaetothyriales Herpotrichiellaceae Capronia Capronia_sp_1

0*Root*-1*0*rootrank
1*Fungi*0*1*Kingdom
2*Basidiomycota*1*2*Phylum
3*Microbotryomycetes*2*3*Class
4*Sporidiobolales*3*4*Order
5*Sporidiobolales_Incertae_sedis*4*5*Family
6*Rhodotorula*5*6*Genus
7*Rhodotorula_diffluens*6*7*Species
8*Rhodotorula_sp_1*6*7*Species
9*Zygomycota*1*2*Phylum
10*Incertae_sedis*9*3*Class
11*Mucorales*10*4*Order
12*Mucorales_Incertae_sedis*11*5*Family
13*Syzygites*12*6*Genus
14*Syzygites_megalocarpus*13*7*Species
15*Ascomycota*1*2*Phylum
16*Leotiomycetes*15*3*Class
17*Helotiales*16*4*Order
18*Sclerotiniaceae*17*5*Family
19*Valdensinia*18*6*Genus
20*Valdensinia_heterodoxa*19*7*Species
21*Rutstroemiaceae*17*5*Family
22*Rutstroemia*21*6*Genus
23*Rutstroemia_bolaris*22*7*Species
24*Rhytismatales*16*4*Order
25*Rhytismataceae*24*5*Family
26*Lophodermium*25*6*Genus
27*Lophodermium_conigenum*26*7*Species
28*Monilinia*18*6*Genus
29*Monilinia_sp_1*28*7*Species
30*Ciborinia*18*6*Genus
31*Ciborinia_allii*30*7*Species
32*Monilinia_urnula*28*7*Species
33*Hemiphacidiaceae*17*5*Family
34*Sarcotrochila*33*6*Genus
35*Sarcotrochila_macrospora*34*7*Species
36*Sordariomycetes*15*3*Class
37*Xylariales*36*4*Order
38*Xylariaceae*37*5*Family
39*Hypoxylon*38*6*Genus
40*Hypoxylon_fendleri*39*7*Species
41*Hypoxylon_sp_1*39*7*Species
42*Dermateaceae*17*5*Family
43*Mollisia*42*6*Genus
44*Mollisia_sp_1*43*7*Species
45*Pezizomycetes*15*3*Class
46*Pezizales*45*4*Order
47*Pezizaceae*46*5*Family
48*Pezizaceae_unidentified_sp_1*47*6*Species
49*Pezizaceae_unidentified_sp_2*47*6*Species
50*Pezizaceae_unidentified_sp_3*47*6*Species
51*Pezizaceae_unidentified_sp_4*47*6*Species
52*Pezizaceae_unidentified_sp_5*47*6*Species
53*Peziza*47*6*Genus
54*Peziza_sp_1*53*7*Species
55*Pezizaceae_unidentified_sp_6*47*6*Species
56*Pezizaceae_unidentified_sp_7*47*6*Species
57*Sordariales*36*4*Order
58*Sordariales_unidentified*57*5*Family
59*Sordariales_unidentified*58*6*Genus
60*Sordariales_unidentified_sp_1*59*7*Species
61*Pucciniomycetes*2*3*Class
62*Pucciniales*61*4*Order
63*Uropyxidaceae*62*5*Family
64*Tranzschelia*63*6*Genus
65*Tranzschelia_discolor*64*7*Species
66*Dothideomycetes*15*3*Class
67*Capnodiales*66*4*Order
68*Capnodiales_unidentified*67*5*Family
69*Capnodiales_unidentified*68*6*Genus
70*Capnodiales_unidentified_sp_1*69*7*Species
71*Capnodiales_Incertae_sedis*67*5*Family
72*Monticola*71*6*Genus
73*Monticola_elongata*72*7*Species
74*Dothideomycetes_unidentified*66*4*Order
75*Dothideomycetes_unidentified*74*5*Family
76*Dothideomycetes_unidentified*75*6*Genus
77*Dothideomycetes_unidentified_sp_1*76*7*Species
78*Saccharomycetes*15*3*Class
79*Saccharomycetales*78*4*Order
80*Saccharomycetales_Incertae_sedis*79*5*Family
81*Candida*80*6*Genus
82*Candida_glabrata*81*7*Species
83*Erysiphales*16*4*Order
84*Erysiphaceae*83*5*Family
85*Neoerysiphe*84*6*Genus
86*Neoerysiphe_nevoi*85*7*Species
87*Teratosphaeriaceae*67*5*Family
88*Teratosphaeria*87*6*Genus
89*Teratosphaeria_sp_1*88*7*Species
90*Fungi_unidentified_sp_1*1*2*Species
91*Neoerysiphe_kerribeeensis*85*7*Species
92*Neoerysiphe_galii*85*7*Species
93*Neoerysiphe_hiratae*85*7*Species
94*Striatoidium*84*6*Genus
95*Striatoidium_baccharidis*94*7*Species
96*Agaricomycetes*2*3*Class
97*Boletales*96*4*Order
98*Boletaceae*97*5*Family
99*Boletus*98*6*Genus
100*Boletus_bicolor*99*7*Species
101*Magnaporthales*36*4*Order
102*Magnaporthaceae*101*5*Family
103*Magnaporthaceae_unidentified_sp_1*102*6*Species
104*Ascomycota_unidentified_sp_1*15*3*Species
105*Gaeumannomyces*102*6*Genus
106*Gaeumannomyces_incrustans*105*7*Species
107*Gaeumannomyces_sp_1*105*7*Species
108*Nakataea*102*6*Genus
109*Nakataea_oryzae*108*7*Species
110*Sordariomycetes_unidentified_sp_1*36*4*Species
111*Magnaporthaceae_unidentified_sp_2*102*6*Species
112*Pyriculariaceae*101*5*Family
113*Proxipyricularia*112*6*Genus
114*Proxipyricularia_zingiberis*113*7*Species
115*Pyricularia*102*6*Genus
116*Pyricularia_sp_1*115*7*Species
117*Eurotiomycetes*15*3*Class
118*Chaetothyriales*117*4*Order
119*Herpotrichiellaceae*118*5*Family
120*Phialophora*119*6*Genus
121*Phialophora_sp_1*120*7*Species
122*Magnaporthaceae_unidentified_sp_3*102*6*Species
123*Pyricularia_sp_2*115*7*Species
124*Harpophora*102*6*Genus
125*Harpophora_oryzae*124*7*Species
126*Magnaporthiopsis*102*6*Genus
127*Magnaporthiopsis_poae*126*7*Species
128*Magnaporthaceae_unidentified_sp_4*102*6*Species
129*Magnaporthaceae_unidentified_sp_5*102*6*Species
130*Sordariomycetes_unidentified_sp_2*36*4*Species
131*Gloeophyllales*96*4*Order
132*Gloeophyllaceae*131*5*Family
133*Veluticeps*132*6*Genus
134*Veluticeps_ambigua*133*7*Species
135*Polyporales*96*4*Order
136*Fomitopsidaceae*135*5*Family
137*Postia*136*6*Genus
138*Postia_undosa*137*7*Species
139*Chaetothyriales_unidentified_sp_1*118*5*Species
140*Hypocreales*36*4*Order
141*Bionectriaceae*140*5*Family
142*Clonostachys*141*6*Genus
143*Clonostachys_sp_1*142*7*Species
144*Lecanoromycetes*15*3*Class
145*Lecanorales*144*4*Order
146*Psoraceae*145*5*Family
147*Psora*146*6*Genus
148*Psora_decipiens*147*7*Species
149*Sordariomycetes_unidentified_sp_3*36*4*Species
150*Fomitella*136*6*Genus
151*Fomitella_supina*150*7*Species
152*Helotiales_unidentified_sp_1*17*5*Species
153*Ascomycota_unidentified_sp_2*15*3*Species
154*Cordycipitaceae*140*5*Family
155*Cordycipitaceae_unidentified_sp_1*154*6*Species
156*Trechisporales*96*4*Order
157*Hydnodontaceae*156*5*Family
158*Brevicellicium*157*6*Genus
159*Brevicellicium_olivascens*158*7*Species
160*Fungi_unidentified_sp_2*1*2*Species
161*Peltigerales*144*4*Order
162*Pannariaceae*161*5*Family
163*Psoroma*162*6*Genus
164*Psoroma_fruticulosum*163*7*Species
165*Psoroma_buchananii*163*7*Species
166*Psoroma_hypnorum_var._paleaceum*163*7*Species
167*Phycomycetaceae*11*5*Family
168*Spinellus*167*6*Genus
169*Spinellus_fusiger*168*7*Species
170*Gomphales*96*4*Order
171*Gomphaceae*170*5*Family
172*Ramaria*171*6*Genus
173*Ramaria_rubribrunnescens*172*7*Species
174*Agaricales*96*4*Order
175*Inocybaceae*174*5*Family
176*Inocybe*175*6*Genus
177*Inocybe_subnudipes*176*7*Species
178*Auriculariales*96*4*Order
179*Auriculariales_unidentified_sp_1*178*5*Species
180*Inocybe_leptocystis*176*7*Species
181*Ramaria_maculatipes*172*7*Species
182*Inocybaceae_unidentified_sp_1*175*6*Species
183*Agaricales_unidentified_sp_1*174*5*Species
184*Agaricales_unidentified_sp_2*174*5*Species
185*Inocybe_lanatopurpurea*176*7*Species
186*Inocybaceae_unidentified_sp_2*175*6*Species
187*Inocybe_sp_1*176*7*Species
188*Inocybe_sp_2*176*7*Species
189*Nectriaceae*140*5*Family
190*Fusidium*189*6*Genus
191*Fusidium_sp_1*190*7*Species
192*Ostropales*144*4*Order
193*Odontotremataceae*192*5*Family
194*Geltingia*193*6*Genus
195*Geltingia_associata*194*7*Species
196*Trechispora*157*6*Genus
197*Trechispora_sp_1*196*7*Species
198*Trechispora_sp_2*196*7*Species
199*Teloschistales*144*4*Order
200*Teloschistaceae*199*5*Family
201*Eilifdahlia*200*6*Genus
202*Eilifdahlia_dahlii*201*7*Species
203*Pleosporales*66*4*Order
204*Pleosporales_unidentified*203*5*Species
205*Trechispora_sp_3*196*7*Species
206*Trechispora_sp_4*196*7*Species
207*Trechispora_sp_5*196*7*Species
208*Agaricomycetes_unidentified_sp_1*96*4*Species
209*Strophariaceae*174*5*Family
210*Agrocybe*209*6*Genus
211*Agrocybe_erebia*210*7*Species
212*Cystotheca*84*6*Genus
213*Cystotheca_lanestris*212*7*Species
214*Setoidium*84*6*Genus
215*Setoidium_castanopsidis*214*7*Species
216*Agaricostilbomycetes*2*3*Class
217*Agaricostilbales*216*4*Order
218*Chionosphaeraceae*217*5*Family
219*Chionosphaera*218*6*Genus
220*Chionosphaera_apobasidialis*219*7*Species
221*Fungi_unidentified_sp_3*1*2*Species
222*Pholiota*209*6*Genus
223*Pholiota_tuberculosa*222*7*Species
224*Pholiota_sp_1*222*7*Species
225*Capronia*119*6*Genus
226*Capronia_sp_1*225*7*Species
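
How the second listing follows from the first can be sketched in a few lines of Python. This is not the RDP `lineage2taxTrain.py` script, only a minimal illustration under stated assumptions: each output line has the form `taxid*name*parent taxid*depth*rank` (as in the example lines in the question at the top of this thread), `-` marks an unassigned rank, and taxids are handed out in order of first appearance. The function name `lineage_to_taxonomy` and the `MISSING` set are made up for this example.

```python
from typing import Dict, Iterable, List, Tuple

MISSING = {"-", ""}  # assumed placeholders for an unassigned rank


def lineage_to_taxonomy(header: List[str], rows: Iterable[List[str]]) -> List[str]:
    """Turn lineage rows into 'taxid*name*parentid*depth*rank' lines.

    header: rank names for the lineage columns, e.g. ["Kingdom", ..., "Species"]
    rows:   taxon names per sequence, aligned with header (sequence ID removed)
    """
    lines = ["0*Root*-1*0*rootrank"]
    ids: Dict[Tuple[int, str, str], int] = {}  # (parent id, name, rank) -> taxid
    depth_of = {0: 0}                          # taxid -> depth in the tree
    for row in rows:
        parent = 0
        for rank, name in zip(header, row):
            if name in MISSING:
                continue  # an unassigned rank creates no node
            key = (parent, name, rank)
            if key not in ids:
                taxid = len(ids) + 1
                ids[key] = taxid
                depth_of[taxid] = depth_of[parent] + 1
                lines.append(f"{taxid}*{name}*{parent}*{depth_of[taxid]}*{rank}")
            parent = ids[key]
    return lines


header = ["Kingdom", "Phylum", "Class", "Order", "Family", "Genus", "Species"]
rows = [
    ["Fungi", "Basidiomycota", "Microbotryomycetes", "Sporidiobolales",
     "Sporidiobolales_Incertae_sedis", "Rhodotorula", "Rhodotorula_diffluens"],
    ["Fungi", "Basidiomycota", "Microbotryomycetes", "Sporidiobolales",
     "Sporidiobolales_Incertae_sedis", "Rhodotorula", "Rhodotorula_sp_1"],
    ["Fungi", "Ascomycota", "-", "-", "-", "-", "Ascomycota_unidentified_sp_1"],
]
for line in lineage_to_taxonomy(header, rows):
    print(line)
```

Because a `-` column creates no node, a species placed directly under a phylum ends up at depth 3, which appears consistent with entries such as Ascomycota_unidentified_sp_1 in the listing above.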

yingeddi2008 commented 7 years ago

Thanks Benli, I received them.

yingeddi2008 commented 7 years ago

Hi Benli,

I am trying the scripts you provided in the email to re-train RDP classifier using NCBI 16s database, but I encountered some error messages when I use the files generated by your scripts to train.

I have generated the fasta file with lineage added to the sequence ID, and you can download from https://www.dropbox.com/s/86uqecg3iflrom5/16SMicrobial.ready4train.fasta?dl=0

I also have the taxonomy file in RDP compatible format, and you can download from https://www.dropbox.com/s/rnmw2izjdsdc39f/16SMicrobial.ready4train.taxonomy?dl=0.

When I tried to train with the following command (I am using version 2.12):

java -Xmx1g -jar /Users/huaiyinglin/Downloads/rdp_classifier_2.12/dist/classifier.jar train -o 16S_ncbi -s 16SMicrobial.ready4train.fasta -t 16SMicrobial.ready4train.taxonomy

error messages like the following appear:

edu.msu.cme.rdp.classifier.train.NameRankDupException: Error: duplicate taxon name and rank in the taxonomy file. ponticoccus genus 2
    at edu.msu.cme.rdp.classifier.train.TreeFactory.creatTaxidMap(TreeFactory.java:126)
    at edu.msu.cme.rdp.classifier.train.TreeFactory.<init>(TreeFactory.java:61)
    at edu.msu.cme.rdp.classifier.train.ClassifierTraineeMaker.<init>(ClassifierTraineeMaker.java:63)
    at edu.msu.cme.rdp.classifier.train.ClassifierTraineeMaker.main(ClassifierTraineeMaker.java:170)
    at edu.msu.cme.rdp.classifier.cli.ClassifierMain.main(ClassifierMain.java:77)

I looked up the genus name "ponticoccus" from the error message and did find three entries for Ponticoccus at genus level, but for three different species. Since I want to train at species level, I made sure when building the taxonomy file that it contains no duplicated taxonomy information, so each sequence should be taxonomically unique at species level.

It seems to me that the RDP classifier can only be trained at genus level, even though I provided species-level information. Could you please help me figure out how I can train at species level?

Thanks a lot in advance!

Eddi

rdpstaffmsu commented 7 years ago

I don't have your tab-delimited (raw) taxonomy file, so I can't point you to the exact rows, but I see that the genus name "Ponticoccus" appears under two different families:

125*Propionibacteriaceae*124*4*Family
177*Rhodobacteraceae*176*4*Family

Remember, 'convergent' evolution is not allowed here: the same taxon name at the same rank cannot sit under two different parents!

Benli

-- RDP Staff Ribosomal Database Project Center for Microbial Ecology Michigan State University 567 Wilson Rd. Room 2225 A East Lansing, MI 48824 (517) 353-3842

yingeddi2008 commented 7 years ago

Thanks Benli, I see where the problem is. I will remove any convergent evolution and try again.
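
A check for this kind of conflict can be run on the raw lineage table before training. The sketch below is only an illustration, not part of the RDP scripts: the function name `find_conflicting_parents` is made up, `-` is assumed to mark an unassigned rank, and names are compared in lower case on the assumption (suggested by the lower-case "ponticoccus genus" in the error message) that the trainer ignores case. The species names in the usage example are placeholders.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

MISSING = {"-", ""}  # assumed placeholders for an unassigned rank


def find_conflicting_parents(
    header: List[str], rows: Iterable[List[str]]
) -> Dict[Tuple[str, str], Set[str]]:
    """Return {(name, rank): parents} for taxa that appear under more than one parent."""
    parents: Dict[Tuple[str, str], Set[str]] = defaultdict(set)
    for row in rows:
        parent = "root"
        for rank, name in zip(header, row):
            if name in MISSING:
                continue
            name = name.lower()  # assumption: comparison is case-insensitive
            parents[(name, rank.lower())].add(parent)
            parent = name
    return {key: seen for key, seen in parents.items() if len(seen) > 1}


# Placeholder rows: only the family and genus names come from this thread.
conflicts = find_conflicting_parents(
    ["Family", "Genus", "Species"],
    [
        ["Propionibacteriaceae", "Ponticoccus", "Ponticoccus_sp_A"],
        ["Rhodobacteraceae", "Ponticoccus", "Ponticoccus_sp_B"],
    ],
)
for (name, rank), seen in conflicts.items():
    print(f"{name} ({rank}) appears under: {', '.join(sorted(seen))}")
```

Each reported name then has to be resolved by hand in the raw file, for example by renaming one of the two taxa or correcting its lineage.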

AnnyYoung commented 7 years ago

Hi Eddi,

I ran into a problem with rdp_classifier-2.4.jar. I already checked the format of my .fasta and taxonomy.txt files as described in "How to format the taxonomy file to retrain classifier #18", but I still get the same error, so I would like to try lineage2taxTrain.py and addFullLineage.py. Could you send me these two scripts, please?

The error looks like this:

Exception in thread "main" java.lang.IllegalArgumentException: Illegal taxonomy format at 3260*32597*genus
    at edu.msu.cme.rdp.classifier.train.TreeFactory.creatTaxidMap(TreeFactory.java:79)
    at edu.msu.cme.rdp.classifier.train.TreeFactory.<init>(TreeFactory.java:58)
    at edu.msu.cme.rdp.classifier.train.ClassifierTraineeMaker.<init>(ClassifierTraineeMaker.java:40)
    at edu.msu.cme.rdp.classifier.train.ClassifierTraineeMaker.main(ClassifierTraineeMaker.java:131)

Thanks a lot!

anny
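
One way to narrow down an "Illegal taxonomy format" error is to scan the taxonomy file for lines that do not fit the five-field layout used in the examples in this thread. The sketch below is a hypothetical pre-check, not the trainer's own validation: it assumes every line should read `taxid*name*parent taxid*depth*rank`, that the root uses parent `-1`, and that every other parent taxid must be defined somewhere in the file. The function name `check_taxonomy_lines` is made up.

```python
from typing import Iterable, List, Tuple


def check_taxonomy_lines(lines: Iterable[str]) -> List[str]:
    """Return human-readable problems found in 'taxid*name*parentid*depth*rank' lines."""
    problems: List[str] = []
    parsed: List[Tuple[int, int, int]] = []  # (line number, taxid, parent id)
    for lineno, raw in enumerate(lines, start=1):
        line = raw.strip()
        if not line:
            continue
        fields = line.split("*")
        if len(fields) != 5:
            problems.append(
                f"line {lineno}: expected 5 '*'-separated fields, found {len(fields)}: {line}"
            )
            continue
        taxid, name, parent, depth, rank = fields
        try:
            taxid_i, parent_i = int(taxid), int(parent)
            int(depth)
        except ValueError:
            problems.append(f"line {lineno}: taxid, parent id and depth must be integers: {line}")
            continue
        if not name or not rank:
            problems.append(f"line {lineno}: empty name or rank: {line}")
            continue
        parsed.append((lineno, taxid_i, parent_i))

    # Second pass: every parent except the root's (-1) must be a known taxid.
    known = {taxid for _, taxid, _ in parsed}
    for lineno, _, parent in parsed:
        if parent != -1 and parent not in known:
            problems.append(f"line {lineno}: parent id {parent} is not defined in the file")
    return problems


sample = [
    "0*Root*-1*0*rootrank",
    "1*Bacteria*0*1*domain",
    "3260*32597*genus",      # the line quoted in the error above
    "5*Orphan*99*2*phylum",  # made-up line whose parent does not exist
]
for problem in check_taxonomy_lines(sample):
    print(problem)
```

The line from the error message splits into only three fields, so a name or number is probably missing or fused with its neighbour in the file that was passed to the trainer.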

TurbulentCupcake commented 7 years ago

Hi RDP Staff,

Can you pass on the script used to create the taxonomy file mentioned earlier in the thread? I would greatly appreciate it.

Thanks, Adithya

chaibenl commented 7 years ago

Hi, Adithya,

Here they are.

Benli Chai RDP Staff

TurbulentCupcake commented 7 years ago

Hi,

Thanks for the response, but I am unable to see them. Can you forward them to muraliadithya315@gmail.com?

Thanks, Adithya

jbholm commented 6 years ago

Hi, I'm looking for the following scripts for re-training the classifier with a new lineage. Are they publicly available somewhere, or must they be emailed? If so, my email is jholm@som.umaryland.edu

Thanks!

lineage2taxTrain.py addFullLineage.py

xysswang commented 6 years ago

Hi RDP Team,

I would like to create my own training data. Could you also send me the scripts, lineage2taxTrain.py and addFullLineage.py ? I will really appreciate that. My email address is tingting.zheng@hku.hk

Thanks !

AmeLaporte commented 6 years ago

Hello RDP Team, I also am interested in the scripts to generate my training set. Is it possible to receive them at my email address: amelie.lpe@sfr.fr

Thanks! Amélie

mbenucci commented 6 years ago

Hi all, like many other users, I came across this issue recently while testing the classifier and training it with our own reference sequence files. I think I managed to find the Python scripts mentioned above, which format the files correctly for retraining the RDP classifier. I cloned them and checked that they do what I thought they were supposed to do, and they seem to be working fine for me.

https://github.com/GLBRC-TeamMicrobiome/python_scripts.git

I hope this helps others as well. Marco

chaibenl commented 6 years ago

Those taxonomy and sequence files were just examples. You need to create your own taxonomy file for the sequences you chose as the training set.

Benli

On Tue, May 22, 2018 at 3:18 PM, jbholm notifications@github.com wrote:

I'm getting a new error when using the provided example data and scripts:

addFullLineage.py ready4train_taxonomy.txt rawSeq.fasta
SH213958.07FU_AF444533_refs not in taxonomy file

It doesn't seem that the ready4train_taxonomy file contains seqIDs. But this was provided by you and worked for me a few months ago. What am I missing?

jbholm commented 6 years ago

Yes, I was having the issue with my own training set, so attempted the scripts on the example data to ensure it wasn’t my data causing the issue.

I determined the problem. In the example provided it said to use:

addFullLineage.py ready4train_taxonomy.txt rawSeq.fasta > ready4train_seqs.fa

But one has to use the raw taxonomy, not the ready-for-training taxonomy:

addFullLineage.py RawTaxonomy.txt rawSeq.fasta > ready4train_seqs.fa

However, now I am having trouble training the classifier: it says that the root for some taxa is not found.

Thanks for responding quickly, ~Johanna
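
Why the raw taxonomy is the right input becomes clearer with a sketch of the step itself: the lineage has to be looked up by sequence ID, and only the raw table carries sequence IDs. The code below is not `addFullLineage.py`, just a minimal illustration. It assumes the training FASTA header should be the sequence ID, a tab, then the lineage as a semicolon-separated string starting at `Root`, and that `-` marks an unassigned rank; check both against the instruction notes. The function names and the `ACGT` sequence are placeholders.

```python
from typing import Dict, Iterable, Iterator, List

MISSING = {"-", ""}  # assumed placeholders for an unassigned rank


def build_lineage_map(rows: Iterable[List[str]]) -> Dict[str, str]:
    """Map sequence ID -> 'Root;Kingdom;...;Species' from raw taxonomy rows.

    Each row is [sequence ID, rank 1, rank 2, ...]; the header row must be skipped.
    """
    return {
        row[0]: ";".join(["Root"] + [name for name in row[1:] if name not in MISSING])
        for row in rows
    }


def add_lineage(fasta_lines: Iterable[str], lineage: Dict[str, str]) -> Iterator[str]:
    """Yield FASTA lines with the lineage appended to each header."""
    for line in fasta_lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            fields = line[1:].split()
            seq_id = fields[0] if fields else ""
            if seq_id not in lineage:
                raise KeyError(f"{seq_id} not in taxonomy file")
            yield f">{seq_id}\t{lineage[seq_id]}"
        elif line:
            yield line


raw_rows = [
    ["SH223384.07FU_JN206297_reps", "Fungi", "Zygomycota", "Incertae_sedis",
     "Mucorales", "Phycomycetaceae", "Spinellus", "Spinellus_fusiger"],
]
lineage = build_lineage_map(raw_rows)
fasta = [">SH223384.07FU_JN206297_reps\n", "ACGT\n"]  # placeholder sequence
for out_line in add_lineage(fasta, lineage):
    print(out_line)
```

The trained taxonomy file is keyed by taxid rather than sequence ID, so a lookup like this cannot find any sequence in it, which would explain the "not in taxonomy file" message.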

anaya1 commented 6 years ago

Dear Benli, Would it be possible to provide me with your scripts lineage2taxTrain.py and addFullLineage.py? I would highly appreciate your help.

Anna Alessi (as573@york.ac.uk)

anaya1 commented 6 years ago

Dear Benli, Thank you for providing the scripts. I have successfully created a ready4train_taxonomy.txt file for my database. However, when I try to add the lineage to my rawSeq.fasta, the output file says "AB001438 not in taxonomy file". What do you think is causing it? I have checked the taxonomy and sequence files and they both contain AB001438. In fact, it is the first entry in both files. Many thanks, Anna

anaya1 commented 6 years ago

Hi Benli, I know why I had the previous issue. You must use:

python addFullLineage.py rawtax.txt rawSeq.fasta > ready4train_seq.fasta

Now, however, I have another problem while trying to train my database:

Exception in thread "main" java.lang.IllegalArgumentException: Sequence AY230195 has different lowest rank: Genus from the previous lowest rank: Species
    at edu.msu.cme.rdp.classifier.train.TreeFactory.addSequencewithLineage(TreeFactory.java:278)
    at edu.msu.cme.rdp.classifier.train.TreeFactory.parseSequenceFile(TreeFactory.java:152)
    at edu.msu.cme.rdp.classifier.train.ClassifierTraineeMaker.<init>(ClassifierTraineeMaker.java:65)
    at edu.msu.cme.rdp.classifier.train.ClassifierTraineeMaker.main(ClassifierTraineeMaker.java:170)
    at edu.msu.cme.rdp.classifier.cli.ClassifierMain.main(ClassifierMain.java:77)

I was wondering if you could comment on it and help me solve this problem. Thanks, Anna

chaibenl commented 6 years ago

Hi, Anna,

The scripts do not check the consistency of your "rawtax.txt" file. You need to do that yourself, following the note file. For example, are all the sequences labeled down to the same terminal rank (species or genus)? Do any taxa share the same name, e.g. 'sp'?

Benli
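
The first of these questions can be checked mechanically on the raw table. The following is a rough sketch rather than anything from the RDP scripts: it assumes `-` marks an unassigned rank, takes the most common deepest rank as the expected one, and lists the sequences that stop at a different rank. The function names are made up, and the lineage values in the usage example are placeholders; only the two accession numbers come from this thread.

```python
from collections import Counter
from typing import Iterable, List, Optional, Sequence, Tuple

MISSING = {"-", ""}  # assumed placeholders for an unassigned rank


def lowest_rank(header: Sequence[str], names: Sequence[str]) -> Optional[str]:
    """Return the deepest rank that has a value in this row, or None if all are missing."""
    for rank, name in reversed(list(zip(header, names))):
        if name not in MISSING:
            return rank
    return None


def odd_lowest_ranks(
    header: Sequence[str], rows: Iterable[Tuple[str, Sequence[str]]]
) -> Tuple[Optional[str], List[Tuple[str, Optional[str]]]]:
    """Return (most common lowest rank, [(sequence ID, its lowest rank), ...] that differ)."""
    found = [(seq_id, lowest_rank(header, names)) for seq_id, names in rows]
    if not found:
        return None, []
    expected = Counter(rank for _, rank in found).most_common(1)[0][0]
    return expected, [(seq_id, rank) for seq_id, rank in found if rank != expected]


header = ["Family", "Genus", "Species"]
rows = [
    ("AB001438", ["Family_a", "Genus_a", "Species_a"]),  # placeholder lineage
    ("AY230195", ["Family_a", "Genus_b", "-"]),          # stops at Genus
    ("SEQ_3", ["Family_b", "Genus_c", "Species_c"]),     # placeholder ID and lineage
]
expected, odd = odd_lowest_ranks(header, rows)
print("expected lowest rank:", expected)
for seq_id, rank in odd:
    print(f"{seq_id} stops at {rank}")
```

Sequences flagged this way either need a value at the expected rank (for example a placeholder species name that is unique to that genus) or have to be dropped from the training set.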

rwst commented 6 years ago

@yingeddi2008

I am not sure I am getting what's the rule of constructing the taxonomy file here. Could you please explain how this is done?

I'm pretty sure that the second number is the level, i.e. depth of the tree node, not another id.
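
Read this way, each line has five fields: taxid, name, parent taxid, depth, rank. A small parser makes that reading concrete. The field names below are this interpretation, not official RDP terminology, and `Taxon` and `parse_taxon_line` are names invented for the example; the two input lines are the ones quoted in the original question.

```python
from typing import NamedTuple


class Taxon(NamedTuple):
    taxid: int      # identifier of this taxon
    name: str       # taxon name, kept exactly as written (quotes included)
    parent_id: int  # taxid of the parent taxon (-1 for the root)
    depth: int      # level of the node in the tree (0 for the root)
    rank: str       # rank label, e.g. "family"


def parse_taxon_line(line: str) -> Taxon:
    """Split one 'taxid*name*parentid*depth*rank' line into named fields."""
    taxid, name, parent_id, depth, rank = line.strip().split("*")
    return Taxon(int(taxid), name, int(parent_id), int(depth), rank)


suborder = parse_taxon_line('6*"Acidimicrobineae"*5*6*suborder')
family = parse_taxon_line("12*Iamiaceae*6*7*family")

# Iamiaceae has a single parent, taxid 6; the 7 is its depth, one below the parent.
print(family.parent_id == suborder.taxid)
print(family.depth == suborder.depth + 1)
```

Under this reading, Iamiaceae has exactly one parent (taxid 6, Acidimicrobineae), and the 7 only records that it sits one level deeper in the tree.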

rwst commented 6 years ago

@rdpstaffmsu Is the copynumber file required? If so, can you please add information on what exactly its content should be?

andrewmaltezthomas commented 5 years ago

Dear @rdpstaffmsu

Could you send the python scripts:

lineage2taxTrain.py

addFullLineage.py

To my email address:

andrewmaltezthomas@gmail.com

Thanks

050114dragon commented 5 years ago

Dear @rdpstaffmsu

If you could send me lineage2taxTrain.py and addFullLineage.py, I would be very grateful. My email is 050114dragon@163.com.

Thanks

wangchao-malab commented 4 years ago

Dear Benli, Would it be possible to provide me with your scripts lineage2taxTrain.py and addFullLineage.py? I would highly appreciate your help. My email is: siraowang@foxmail.com

ghost commented 4 years ago

Dear @rdpstaffmsu

Can you please send me a copy of lineage2taxTrain.py and addFullLineage.py? My email is carterhoffman@gmail.com

Thanks

wangchao-malab commented 4 years ago

Hello,

Please find attached scripts.

Good luck!

Wang chao

siraowang@foxmail.com

ghost commented 4 years ago

Hello,

Thanks for your response. The university email associated with my github account automatically rejects any attachments with code in them. Could you please resend the scripts to my gmail account carterhoffman@gmail.com?

Thanks for your help, Carter

LIU3379 commented 3 years ago

Hi RDP Team,

I would like to create my own training data. Could you also send me the scripts, lineage2taxTrain.py and addFullLineage.py ? I will really appreciate that. My email address is 17863953379@163.com

Thanks !

ctb commented 3 years ago

note to the RDP team: you can attach the files to this GitHub issue by renaming them as .txt files and adding them to this issue on the web interface, if you like. Or if you send them to me at ctbrown@ucdavis.edu I can do that for you :)