nextgenusfs / funannotate

Eukaryotic Genome Annotation Pipeline
http://funannotate.readthedocs.io
BSD 2-Clause "Simplified" License

interproscan_docker.sh failure #120

Closed gskrasnov closed 6 years ago

gskrasnov commented 6 years ago

Dear Jon

I would like to report a bug

I tried to run InterProScan, as was suggested with your funannotate update:

 sudo /home/linuxbrew/.linuxbrew/Cellar/funannotate/0.7.2/libexec/util/interproscan_docker.sh -i=901.fun_out.wo.RepMod/update_results/Linum_usitatissimum.proteins.fa -c=60

I installed docker, downloaded DBs, ran the command and got the following:

......
ERROR - Execution thrown when attempting to executeInTransaction the StepExecution.  All database activity rolled back.
java.lang.IllegalArgumentException: You have submitted a protein sequence which contains an asterix (*). This may be from an ORF prediction program. '*' is not a valid IUPAC amino acid character and amino acid sequences which go through our pipeline should not contain it. Please strip out all asterix characters from your sequence and resubmit your search.
.......

Should I manually remove the "*" symbols from Linum_usitatissimum.proteins.fa before submitting it to InterProScan?

gskrasnov commented 6 years ago

I replaced "*" with "X" and everything seems to be fine. About 3100 "*" stop codons are present in Linum_usitatissimum.proteins.fa (the funannotate update output).
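For anyone repeating this workaround, the replacement can be sketched as below. This is only an illustration of the substitution described above, not part of funannotate; the function name `mask_stops` and the demo records are made up for the example. Header lines are copied unchanged so that a "*" in a sequence ID is not touched.

```python
import io


def mask_stops(fasta_in, fasta_out, mask="X"):
    """Copy FASTA text from fasta_in to fasta_out, replacing '*' in
    sequence lines with `mask`. Returns how many '*' were replaced.

    `mask_stops` is a hypothetical helper, not a funannotate function.
    """
    replaced = 0
    for line in fasta_in:
        if line.startswith(">"):
            # Header line: copy as-is, even if it contains '*'.
            fasta_out.write(line)
        else:
            replaced += line.count("*")
            fasta_out.write(line.replace("*", mask))
    return replaced


# Small in-memory demo with made-up records.
demo_in = io.StringIO(">gene1\nMKT*LLV\nAAG*\n>gene2\nMSS\n")
demo_out = io.StringIO()
n_replaced = mask_stops(demo_in, demo_out)
print(n_replaced, "stop symbols masked")
```

For real files, the same function can be called with two handles from `open(...)`, one for reading and one for writing.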

nextgenusfs commented 6 years ago

Okay, thanks for letting me know. Are those internal stops? If so, I probably need to do some better filtering after funannotate update.
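One way to check whether the stops are internal is sketched below, assuming a plain protein FASTA in which "*" marks a stop. The helpers `read_fasta` and `classify_stops` are hypothetical names for this example and are not funannotate code. A record counts as "internal" if a "*" remains after trailing "*" characters are stripped.

```python
import io


def read_fasta(handle):
    """Yield (header, sequence) pairs from a FASTA file handle."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def classify_stops(handle):
    """Return two lists of record IDs: those with an internal '*',
    and those whose only '*' is at the end of the sequence."""
    internal, terminal_only = [], []
    for header, seq in read_fasta(handle):
        if "*" in seq.rstrip("*"):
            internal.append(header)
        elif seq.endswith("*"):
            terminal_only.append(header)
    return internal, terminal_only


# Made-up records: g1 has an internal stop, g2 only a terminal one.
demo = io.StringIO(">g1\nMK*L\nAA*\n>g2\nMSS*\n>g3\nMAA\n")
internal_ids, terminal_ids = classify_stops(demo)
print(len(internal_ids), "internal;", len(terminal_ids), "terminal only")
```

Records that land in the first list are the ones worth inspecting as gene models, since a terminal "*" can simply be trimmed.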

gskrasnov commented 6 years ago

Yes, InterProScan stopped at a "-" symbol. I cleaned the predicted protein sequences by replacing all non-alphabetic symbols with X. Everything seems to be OK and it is running now, already for 3 or 4 hours on 64 CPUs. I noticed that InterProScan runs an excessive number of analyses:

[CDD-3.14,Coils-2.2.1,Gene3D-3.5.0,Hamap-201605.11,MobiDBLite-1.0,PANTHER-11.1,Pfam-30.0,PIRSF-3.01,PRINTS-42.0,ProDom-2006.1,ProSitePatterns-20.119,ProSiteProfiles-20.119,SFLD-2,SignalP_EUK-4.1,SignalP_GRAM_NEGATIVE-4.1,SignalP_GRAM_POSITIVE-4.1,SMART-7.1,SUPERFAMILY-1.75,TIGRFAM-15.0,TMHMM-2.0c]

nextgenusfs commented 6 years ago

This should be fixed in v1.0.0; it was a bug in the phase (codon_start) of some partial gene models.

mictadlo commented 5 years ago

Replacing it with X does not seem to be a good idea: https://github.com/Gaius-Augustus/BRAKER/issues/56#issuecomment-510767073