liberjul / CONSTAXv2

MIT License

Custom Database #12

Closed pbraileyjones closed 10 months ago

pbraileyjones commented 10 months ago

Hi- I'm trying to classify my sequences using a custom LSU database but I am running into some errors that I'm not sure how to solve.

I've ostensibly formatted my database headers in the UNITE format, e.g.:

>Schizosaccharomyces_pombe_Asco|Z19136.1|k__Fungi;p__Ascomycota;c__Schizosaccharomycetes;o__Schizosaccharomycetales;f__Schizosaccharomycetaceae;g__Schizosaccharomyces;s__pombe

>Candida_glabrata_Asco|AY198398.1|k__Fungi;p__Ascomycota;c__Saccharomycetes;o__Saccharomycetales;f__Saccharomycetaceae;g__Candida;s__glabrata

This is the code I am running:

constax -i otu97repseqs_clean_AMFonly.fasta \
  -d AMFDB_UNITEFORMAT.fasta \
  -c 0.8 -b -t -n 120

And this is the output I am getting:

Reformatting database
UNITE format detected
Traceback (most recent call last):
  File "/apps/eb/constax/2.0.17/opt/constax-2.0.17-0/FormatRefDB.py", line 88, in
    temp2 = temp[4].strip().split("__")
IndexError: list index out of range

Do you have any idea why I might be getting this error?

Thanks! Phil

liberjul commented 10 months ago

Hi @pbraileyjones ,

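The formatting script splits the header at | and takes the 5th element as the taxonomy. Your headers have only three |-separated fields, so the taxonomy sits in the 3rd position and there is no 5th element to read. You should just need to add two extra fields before the taxonomy, like in this header: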

>Entoloma_flavidum|JQ281481|SH1510137.08FU|refs|k__Fungi;p__Basidiomycota;c__Agaricomycetes;o__Agaricales;f__Entolomataceae;g__Entoloma;s__Entoloma_flavidum

For example:

>Candida_glabrata_Asco|AY198398.1|XXX|reps|k__Fungi;p__Ascomycota;c__Saccharomycetes;o__Saccharomycetales;f__Saccharomycetaceae;g__Candida;s__glabrata
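If there are many headers to fix, something like the following could do it in bulk. This is only a rough sketch, assuming every header has exactly three fields (name|accession|taxonomy) or is already in the five-field form. The names `pad_header` and `pad_fasta` and the filler values `XXX` and `reps` are made up for illustration and are not part of CONSTAX.

```python
def pad_header(header: str, sh_id: str = "XXX", status: str = "reps") -> str:
    """Return a five-field UNITE-style header.

    Hypothetical helper: assumes `header` is either already five fields
    (returned unchanged) or three fields, name|accession|taxonomy.
    """
    fields = header.rstrip("\n").split("|")
    if len(fields) >= 5:
        return "|".join(fields)
    if len(fields) != 3:
        raise ValueError(
            f"expected 3 or 5 '|'-separated fields, got {len(fields)}: {header!r}"
        )
    name, accession, taxonomy = fields
    # Insert two placeholder fields so the taxonomy becomes the fifth element.
    return "|".join([name, accession, sh_id, status, taxonomy])


def pad_fasta(lines):
    """Yield FASTA lines, padding header lines and leaving sequences alone."""
    for line in lines:
        line = line.rstrip("\n")
        yield pad_header(line) if line.startswith(">") else line


old = ">Candida_glabrata_Asco|AY198398.1|k__Fungi;p__Ascomycota;s__glabrata"
new = pad_header(old)
print(new)
# Index 4 now exists, so the same split the traceback shows no longer fails.
print(new.split("|")[4].strip().split("__"))
```

To rewrite a whole database, the lines of the original FASTA file can be passed through `pad_fasta` and written to a new file, keeping the original as a backup.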

I hope that works,

Julian

pbraileyjones commented 10 months ago

Thanks! That was it!

Best, Phil