gskrasnov closed this issue 6 years ago.
I replaced "*" with "X" and everything seems to be fine. About 3,100 "*" stop codons are present in Linum_usitatissimum.proteins.fa (the output of funannotate update).
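The replacement described here can be sketched in Python as follows. This is a minimal sketch that assumes a plain FASTA file whose header lines start with ">", and it masks every non-alphabetic character ("*", "-", and so on) rather than only "*". The function names and file paths are hypothetical. Masking keeps every sequence at its original length, but it hides an internal stop instead of explaining it.

```python
import re

# Anything outside A-Z/a-z ("*", "-", ".", digits) is treated as invalid here.
NON_ALPHA = re.compile(r"[^A-Za-z]")


def mask_sequence(seq, mask="X"):
    """Replace every non-alphabetic character in a protein sequence with `mask`."""
    return NON_ALPHA.sub(mask, seq)


def mask_fasta(lines, mask="X"):
    """Yield FASTA lines with sequence lines masked; headers pass through unchanged."""
    for line in lines:
        line = line.rstrip()  # drop the newline (and any '\r') before masking
        yield line if line.startswith(">") else mask_sequence(line, mask)


def mask_file(src_path, dst_path, mask="X"):
    """Write a masked copy of the FASTA file at src_path to dst_path (hypothetical paths)."""
    with open(src_path) as src, open(dst_path, "w") as dst:
        for line in mask_fasta(src, mask):
            dst.write(line + "\n")
```

For example, `mask_file("Linum_usitatissimum.proteins.fa", "proteins.masked.fa")` would write the masked copy next to the original.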
Okay, thanks for letting me know. Are those internal stops? If so, I probably need to do some better filtering after funannotate update.
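One way to answer that question is to check, for each protein, whether its "*" characters sit only at the C-terminus or also inside the sequence. The sketch below assumes a protein FASTA such as Linum_usitatissimum.proteins.fa with "*" marking stops. It uses only the Python standard library, and the function names are illustrative.

```python
import io


def read_fasta(handle):
    """Yield (record_id, sequence) pairs from an iterable of FASTA lines."""
    record_id, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if record_id is not None:
                yield record_id, "".join(chunks)
            # Keep only the first word of the header as the ID.
            record_id, chunks = (line[1:].split() or [""])[0], []
        else:
            chunks.append(line)
    if record_id is not None:
        yield record_id, "".join(chunks)


def stop_report(handle):
    """Count proteins with terminal-only stops vs internal stops."""
    report = {"terminal_only": 0, "with_internal": 0, "internal_stops": 0}
    for _, seq in read_fasta(handle):
        core = seq.rstrip("*")  # trailing stops are the expected terminal ones
        internal = core.count("*")
        if internal:
            report["with_internal"] += 1
            report["internal_stops"] += internal
        elif seq.endswith("*"):
            report["terminal_only"] += 1
    return report


if __name__ == "__main__":
    demo = io.StringIO(">g1\nMKF*\n>g2\nMK*F*G*\n>g3\nMKFG\n")
    print(stop_report(demo))
    # {'terminal_only': 1, 'with_internal': 1, 'internal_stops': 2}
```

For the real file, open it and pass the handle, e.g. `with open("Linum_usitatissimum.proteins.fa") as fh: print(stop_report(fh))`.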
Yes, InterProScan stopped at the "-" symbol. I cleaned the predicted protein sequences by replacing all non-alphabetic symbols with X, and everything seems to be OK. It's running now: already 3 or 4 hours on 64 CPUs. I noticed that InterProScan runs an excessive number of analyses:
[CDD-3.14,Coils-2.2.1,Gene3D-3.5.0,Hamap-201605.11,MobiDBLite-1.0,PANTHER-11.1,Pfam-30.0,PIRSF-3.01,PRINTS-42.0,ProDom-2006.1,ProSitePatterns-20.119,ProSiteProfiles-20.119,SFLD-2,SignalP_EUK-4.1,SignalP_GRAM_NEGATIVE-4.1,SignalP_GRAM_POSITIVE-4.1,SMART-7.1,SUPERFAMILY-1.75,TIGRFAM-15.0,TMHMM-2.0c]
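If InterProScan is called directly, its `-appl` option limits a run to a chosen subset of member databases, which should shorten the run compared with the full list above. The sketch below only builds the command and does not run it. It assumes the `interproscan.sh` launcher and the `-i`, `-f`, `-o` and `-appl` options of InterProScan 5, strips the version suffixes from the list printed above, and keeps Pfam and PANTHER purely as an illustrative choice. How funannotate's Docker-based wrapper passes such options is not covered here.

```python
import shlex

# The analysis list as printed by InterProScan (copied from the thread).
REPORTED = ("[CDD-3.14,Coils-2.2.1,Gene3D-3.5.0,Hamap-201605.11,MobiDBLite-1.0,"
            "PANTHER-11.1,Pfam-30.0,PIRSF-3.01,PRINTS-42.0,ProDom-2006.1,"
            "ProSitePatterns-20.119,ProSiteProfiles-20.119,SFLD-2,SignalP_EUK-4.1,"
            "SignalP_GRAM_NEGATIVE-4.1,SignalP_GRAM_POSITIVE-4.1,SMART-7.1,"
            "SUPERFAMILY-1.75,TIGRFAM-15.0,TMHMM-2.0c]")


def analysis_names(listing):
    """Turn '[CDD-3.14,Pfam-30.0,...]' into ['CDD', 'Pfam', ...] by dropping '-<version>'."""
    items = listing.strip().strip("[]").split(",")
    return [item.strip().rsplit("-", 1)[0] for item in items if item.strip()]


def iprscan_command(fasta, outfile, applications):
    """Build an interproscan.sh argument list limited to `applications`."""
    return ["interproscan.sh", "-i", fasta, "-f", "TSV", "-o", outfile,
            "-appl", ",".join(applications)]


if __name__ == "__main__":
    wanted = {"Pfam", "PANTHER"}  # illustrative choice, not a recommendation
    selected = [name for name in analysis_names(REPORTED) if name in wanted]
    print(shlex.join(iprscan_command("proteins.fa", "proteins.iprscan.tsv", selected)))
```

Returning an argument list rather than a shell string makes it straightforward to hand the command to `subprocess.run` later without quoting problems.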
This should be fixed in v1.0.0; it was a bug in the phase (codon_start) of some partial gene models.
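To see how a wrong phase can yield stop codons, consider a partial gene model whose CDS does not begin on a codon boundary. GenBank's /codon_start (1, 2 or 3) corresponds to GFF3 phase + 1 and tells the translator where the first complete codon starts; shifting it by one base translates the same DNA in a different frame. The sketch below uses the standard genetic code and a made-up 12-base CDS. It illustrates the effect only and is not funannotate's code.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code (NCBI table 1), codons enumerated in TCAG order.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {a + b + c: aa for (a, b, c), aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(cds, codon_start=1):
    """Translate a CDS from GenBank-style codon_start (1, 2 or 3; GFF3 phase + 1)."""
    if codon_start not in (1, 2, 3):
        raise ValueError("codon_start must be 1, 2 or 3")
    seq = cds.upper()[codon_start - 1:]
    # Walk complete codons only; a trailing partial codon is ignored.
    codons = (seq[i:i + 3] for i in range(0, len(seq) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


if __name__ == "__main__":
    cds = "ATGAAATTTGGG"  # made-up example sequence
    print(translate(cds, 1), translate(cds, 2))  # MKFG *NL
```

With the correct start the toy CDS gives MKFG; starting one base later reads TGA first and produces a stop immediately, which is the kind of artifact an off-by-one phase introduces.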
Replacing "*" with X does not seem to be a good idea: https://github.com/Gaius-Augustus/BRAKER/issues/56#issuecomment-510767073
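Instead of masking, one option (an assumption about the preferred workflow, not something the thread prescribes) is to strip the expected terminal stop and set aside every protein that still contains an internal "*", so those gene models can be inspected or fixed upstream. A minimal sketch over an in-memory {id: sequence} mapping; the function name is hypothetical.

```python
def split_by_internal_stops(proteins):
    """Return (clean, flagged).

    clean maps id -> protein with its trailing '*' removed;
    flagged lists ids whose protein still has an internal '*'.
    """
    clean, flagged = {}, []
    for protein_id, seq in proteins.items():
        core = seq.rstrip("*")  # remove only the terminal stop(s)
        if "*" in core:
            flagged.append(protein_id)
        else:
            clean[protein_id] = core
    return clean, flagged


if __name__ == "__main__":
    demo = {"g1": "MKF*", "g2": "MK*FG", "g3": "MKFG"}
    clean, flagged = split_by_internal_stops(demo)
    print(clean)    # {'g1': 'MKF', 'g3': 'MKFG'}
    print(flagged)  # ['g2']
```

Keeping the flagged IDs, rather than silently dropping them, leaves a list of gene models to re-check once the phase problem is fixed.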
Dear Jon
I would like to report a bug.
I tried to run InterProScan, as suggested in the funannotate update output:
I installed Docker, downloaded the databases, ran the command, and got the following:
Maybe I should manually remove the "*" symbols from Linum_usitatissimum.proteins.fa and then submit the sequences to InterProScan?