takaram / kofam_scan

CLI tool to annotate genes with KOfam
https://www.genome.jp/tools/kofamkoala/
MIT License

hmmsearch was not run successfully #19

Open hahafengxiang opened 3 years ago

hahafengxiang commented 3 years ago

Hi There,

I am running a protein annotation with kofamscan. The error message "hmmsearch was not run successfully" keeps showing up. After a successful run with another .fasta file, I realized that something was wrong with my original file. I eventually found that ONE sequence caused the problem, and the run succeeded once I removed it, but I still don't know why.

Could anyone help explain it?

Thanks~

The sequence attached below:

VIRSorter_k141_1676536_flag=0_multi=16_9953_len=13692-cat_2_4 # 6084 # 13691 # -1 # ID=10186_4;partial=01;start_type=Edge;rbs_motif=None;rbs_spacer=None;gc_cont=0.465 NGEAGTSGTSGISGINGTNGLNGTGGSSGTSGLSGVDGTSGTAGTSGTSGYSGTDGTSGT SGISGADGMPGTSGTSGISGVDGTSGTSGINGTSGSSGTTGTSGSSGTSGISGVDGTSGT SGLSGVDGTSGSSGTSGSSGTSGISGVDGTSGTAGTSGSSGTSGTSGISGIDGTSGSSGT NGTSGSSGTSGISGVDGTSGTAGTSGTSGIDGTSGTSGISGVDGTSGTSGTSGISGVDGV DGTNGTSGTSGISGVDGTSGTAGSSGTSGTTGTSGSSGTSGISGVDGTSGSSGTSGTSGI DGTSGTSGISGVDGTSGTSGTSGSSGTSGTSGISGVDGTSGTNGSSGTSGSSGTAGTSGT SGISGVDGTSGTSGTGTSGTSGTSGTVGTSGSSGSSGTSGISGANGEAGTSGTSGISGLN GTNGLNGTGGSSGTSGISGVDGTSGTAGTSGTSGYSGTDGTSGTSGISGADGMPGTSGTS GISGVDGTSGTSGTTGTSGTSGTTGTSGSSGTSGISGVDGTSGSSGTSGTSGISGVDGTS GTSGSSGTSGTSGTSGTSGTSGISGVDGTSGSSGTSGSSGTSGSSGTSGISGINGTNGSS GTSGISGVDGTSGTSGIDGTSGTSGIDGTSGTSGISGINGTSGTNGSSGSSGTSGLSGVD GTSGTSGIDGTSGTSGIDGTSGTSGISGINGTSGTNGSSGSSGTSGISGVDGTSGTSGSS GSSGTSGISGVDGTSGTSGISGIDGTSGTAGTSGTSGVDGTSGTSGISGINGTNGSSGTS GVSGVDGTSGTSGLDGTHGTSGTTGTSGSSGTSGISGANGEAGTSGTSGISGINGTNGIA GTGGSSGTSGISGVDGTSGTAGTSGTSGYSGTDGTSGTSGISGADGMPGTSGSSGTSGLS GVDGTSGTAGTSGSSGTSGTTGTSGSSGTSGISGVDGTSGTAGTSGTSGISGVDGTSGSS GTSGSSGTSGSSGTSGTSGISGVDGTSGSSGTSGSSGTSGSSGTSGISGINGTNGSSGTS GISGVDGTSGTSGIDGTSGTSGINGTSGTSGISGVDGTSGTNGSSGSSGTSGLSGVDGTS GTSGIDGTSGTSGIDGTSGTSGISGINGTSGTNGSSGSSGTSGLSGVDGTSGTAGTSGSS GTSGISGVDGTSGTSGISGVDGTSGTAGTSGTSGVNGTSGTSGISGINGTNGSSGTSGIS GVDGTSGTSGLDGTHGTSGSSGTSGTSGSSGTSGISGANGEAGTSGTSGISGVAGTNGIA GTGGSSGTSGLSGVDGTSGTAGTSGTSGYSGTDGTSGTSGISGADGMPGTSGTSGTNGSS GTSGLSGVDGTSGTSGTNGTSGSSGTNGSSGTSGTSGTSGISGVDGTSGTAGSSGTSGSS GTSGLSGVDGTSGSSGTSGSSGTSGSSGTSGTSGISGVDGTSGTSGSSGTSGSSGTSGIS GVDGTSGTSGSSGTSGIDGTSGTTGTSGISGISGTSGTNGTSGSSGTSGISGVDGTSGSS GTSGDAGTSGTSGITGTSGISGISGTSGTNGSSGSSGTSGLSGVDGTSGTSGSSGTSGTT GTSGTSGISGVDGTSGTSGSAGTSGTSGVDGTSGVSGVSGINGTNGSSGTSGISGVDGTS GTSGTVGTSGTSGTNGTSGSSGTSGISGANGEAGTSGTSGISGINGTAGRQGTGGSSGTS GVSGVDGTSGTAGTSGTSGISGTTGTSGTSGISGADGMPGTSGTSGINGTSGSSGTSGSS GTSGSSGTSGISGINGTNGTSGISGVDGTSGSSGTSGTSGSSGTSGSSGTSGISGINGTN 
GSSGTSGISGVDGTSGTSGSSGTSGSSGTSGSSGTSGSSGTSGISGVDGTSGSSGTSGIS GVDGTSGTSGTSGSSGTSGSSGTSGSSGTSGTSGISGVDGTNGTSGTSGTSGSSGTSGSS GTSGSSGSSGTSGISGVDGTSGSSGTSGSSGTSGISGVDGTSGTSGTSGSSGTSGSSGTS GSSGTSGISGVNGTSGSSGTSGISGVDGTSGTAGTSGSSGTSGSAGTSGSSGTSGISGIN GTSGTNGSSGSSGTSGVDGTSGTSGSNGTSGSSGTSGISGANGAPGTSGTSGLSGVDGTS GTAGTSGSSGTSGSSGTSGISGVDGTSGTAGSSGTSGSSGTSGSSGTSGSSGTSGISGIN GTSGSSGTSGSSGTSGTSGTSGSSGTSGTSGISGVDGTSGSSGTNGTSGTSGTKGTSGTS GSSGTSGSSGSSGTSGISGINGTSGSSGTSGISGVDGTSGTAGSSGTSGTSGTSGIDGTN GTSGSSGTSGISGINGTNGSSGTSGISGVDGIDGTSGSSGTNGTSGSSGTSGISGANGAP GTSGTSGLSGVSGISGTNGTSGTSGTSGTTGTSGISGLNGTTGTSGTSGTGFSAILNATN NRLITSDGTQTNAVAEANLTFDGEILNLAGVFKSKTGEGSSITANTLLYAADTALGNGWI IDYVVKATTGVAMRTGTILAVTDGIDVTFTETSSPDLGASTAAVTFGLTINSTDLEIAAN ISFGTWDVKVAVRVI*

Caelyn-gao commented 3 years ago

I also ran into this issue, but I haven't been able to find the problem sequence like you did. Do you know how to fix it? Thanks a lot.

Caelyn-gao commented 3 years ago

How did you find the problem sequence?

hahafengxiang commented 3 years ago

Actually, it's not an efficient way. I split the original file into small parts with SeqKit and ran kofamscan on each part, one by one. Next, I opened the file that failed and looked through it. Maybe I was just lucky: I spotted the sequence above as soon as I saw it, because it looks so unusual and different from the rest.
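
The split-and-test approach described above can be sketched in shell. This is only a minimal sketch and not part of kofamscan: `split_fasta` and `failing_chunks` are hypothetical helper names, the chunk size and file names are placeholders, and plain awk stands in for SeqKit so that the example needs no extra tools. The command to test is passed in as arguments, so it works with any program that takes the FASTA file as its last argument and exits non-zero on failure.

```shell
#!/usr/bin/env bash

# split_fasta INPUT N OUTDIR
# Write INPUT as OUTDIR/chunk_0001.faa, chunk_0002.faa, ... with at most
# N records per file. Assumes every record header starts with ">".
split_fasta() {
    local input="$1" n="$2" outdir="$3"
    mkdir -p "$outdir"
    awk -v n="$n" -v dir="$outdir" '
        /^>/ {
            # Start a new chunk file every n records.
            if (count % n == 0) {
                if (out != "") close(out)
                out = sprintf("%s/chunk_%04d.faa", dir, count / n + 1)
            }
            count++
        }
        # Lines before the first header are skipped.
        out != "" { print > out }
    ' "$input"
}

# failing_chunks DIR CMD [ARGS...]
# Run CMD ARGS... with each chunk appended as the last argument and
# print the path of every chunk for which the command exits non-zero.
failing_chunks() {
    local dir="$1" chunk
    shift
    for chunk in "$dir"/chunk_*.faa; do
        [ -e "$chunk" ] || continue
        "$@" "$chunk" > /dev/null 2>&1 || echo "$chunk"
    done
}
```

For example, `split_fasta proteins.faa 100 chunks` followed by `failing_chunks chunks exec_annotation` would list the chunks on which kofamscan fails, assuming kofamscan's config.yml already points to the profiles and KO list. Re-splitting a failing chunk with a smaller N narrows it down to a single sequence.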

Caelyn-gao commented 3 years ago

Can you show one of your normal sequences? Thanks a lot.

hahafengxiang commented 3 years ago

Some look like this:

VIRSorter_k141_593649_flag=1_multi=6_8551_len=21335-cat_2_5 # 1020 # 2204 # -1 # ID=1_5;partial=00;start_type=ATG;rbs_motif=TATAA;rbs_spacer=11bp;gc_cont=0.301 MLQELLDLKEQGLANDKAFTFEKAKGETRFVQKGHENESLAKKTNLLSRDFYRGGGICPK GKSLDEIAEYLKIPHVFRSEGIGYDAHFNTHKRAIYLEDGILLDIVSEKWVLIQPIETIG LFLDFCTENNLEIERIGTFISNKDKTLEGGNTDILQRYKIYITAKLDDSFEVSKGDRVSG KLLFTFGYLNGLGFNASLLTLREICSNGLRIPVKIGGQVVSHIGELVKKKTQILKLLQDS KQVWKKEKEDYLLFQNTEMTYLEAMMFLINNFSKIPLHKELAQKALVDWKEGKGIETLDS IFQSNQWFDEKEIVREVINMYRENQFTGSEFCSNTVWGLLNSVTEYINWKGKQIKNPLAS LIDVNGHRGKLMYKVRTKLHDDFVKIKQNVTVSI VIRSorter_k141_593649_flag=1_multi=6_8551_len=21335-cat_2_6 # 2467 # 3360 # -1 # ID=1_6;partial=00;start_type=ATG;rbs_motif=TAA;rbs_spacer=4bp;gc_cont=0.346 MLIQFDFNHSFIRMEKRADGNIWVCITDMAKSSGKLVADWKRLKTTQDFLTAFESSMGFP ITETIQGGQPEKQGTWAIQEVAIEFAGWCSLDFKMWMLRQIKKLMNEGQVSLKENDTLDS QKVLLNAMDLMAQMSTTLENREKLLQQTIRSLSILEEERLDREYYLGQINEITKEHPLFS SLLQFALTLKNEQYTFPSIGYTVCQILQMFPIRHCSEKRFANLCSDLYWLNKNKKPNEVG VYKYVGDELIYPTVILFKQENYSWDEIKEKIEIDYKFRLPASNRRAFTEIMAKKKRK VIRSorter_k141_593649_flag=1_multi=6_8551_len=21335-cat_2_7 # 3399 # 3734 # -1 # ID=1_7;partial=00;start_type=ATG;rbs_motif=TAA;rbs_spacer=5bp;gc_cont=0.214 MIDFKILNNKNLEITLNVPSFIFKRFLNQHKDLTFNQIWKKLIENTELVFIEPFEVGALT DAPIIRFNHRYYWFSDYMVRDELKELEKNNKVIFELAYDENCRIKEFSTID VIRSorter_k141_593649_flag=1_multi=6_8551_len=21335-cat_2_8 # 3727 # 3969 # -1 # ID=1_8;partial=00;start_type=ATG;rbs_motif=AAAAAA;rbs_spacer=6bp;gc_cont=0.177 MMNKNNLQKIRHQIFILYCNLISFIDDWDFLQLDLILQKIDKEYTVDIYLSPEDIKSQTY KIISINNQINIDTILDNLYD

Caelyn-gao commented 3 years ago

Thanks a lot. I will try to find the problem sequences and see whether the run works normally without them.

Caelyn-gao commented 3 years ago

Hi, I found the problem sequence using your method. It seems the problem sequence is so long that kofamscan cannot handle it, but I don't know why. The problem sequence below is Protein TolB according to COs, so it is not a sequence that cannot be annotated.

MMKRSWIAAWLCLVLLLLQIIFAWPLFASEQKIVYPRQEASGKYDLWMMNPDGSGQQRLT DDAKNGTDSTSPSIFPDGKRILYQRGYNIAILNVDTRQITDLTTDGAYGIYAYSEPWLSP DGMKITYMYGQPIAGSCSSCRTYDVWIMNADGSNRVQMTSNTYRDATPIYSPDGTKLLVT HYQGAPSSDCCNATDVYTMDIATKVETKLYGSSYYDWGFAWNNSGILFTTQNGALVRINP DGTGFTTVLDASNRVGSATYSVTGDKILYQSNVSGTNNLYVVNPDGTNSVAITTGMNVSG TGVWGYINSSAIPKIYYVSNQSGTYDIWKMDPDGNNKVRVTTLPGSEAYPRVSPDGRKIA FNSDATGSYDVYVMNVDGTDIRQLTSGKNTNGQLTWDPASTKIYYAAPAASIYDAAVRSV NVDGTGDSQVFDHVGYHDVEVDVSPDGNSLVYIYEQCCWTPNRSIRLRNLSSGTDIELLA ADGYSDFYPRFSPDGNSIIWTRNRNTNPYAFGYDIWRMNKDGSSKTNLTGAYSTLAFFNA NYSRDGNKIVMSAQTNGGDSNVYTMNSDGSGLLQLTTGSSADDSPDFATIIPPPNKIVYH SNQSGNDDIWVMDEQGGNKVQLTNATANEFQPKWSKDGLKIVYTSDASGNYDVWVMNSDG SGKTQLTTNTALDYQPIWTPDGAKILFFSSRDGGRDVYIMDASGANQTRLTAVNGWTGRD GIGISPEGIKIAYTTQPSGANWGYYNELYSATLTCSGNASTCSLSNTLKIASFDNQIDVT PSFTPDGKILWSSGRHNPYGTCSTNCYFSTQEIYRINPDGSGEVQITSNTITSDVQPVSS PDGSKIIFSSDRASGSANVSNTSDLWVVNNDGTGLTQLTNTSAYSEGGADWWASGGLVIY TLTVTKSGTGAGTVNSNPAGISCGVTCSESYSAGTSVVLTATPDSGSTFIGWLGDCSGTG TCTVPMGAAKNVTAAFGDTTPPDTTITAKPTNPTNSTSAAFSFTSTEAGATFQCQLDAGG YSSCTSPKGYSGLSAGSHTFYVKATDAAGNTDATPASSAWTIDLTPPVTILPIEGRPDPI TNSTSATFTFSSESGVMFQCQFDGGIWTTCTSPASFAGLAVGNHTLLIKATDTAGNVETP VSYSWTIDTTAPNAPSVAGTTPTNTRTPSWGWASGGSGGNGTYRYKLDSSDLTTGATETT ATSYTPTGNLTEGPHILYVQERDVAGNWSASGSLVIVVDITAPTDVISYSLTSNGAWTSP VIVDEANDVGLYTSIAVDSNNKPHIAYFDLTNRDLKYTTNTSGSWAFQMIDGDQGENPSI AIDKSNKVHIAYHDNTNLALKYLTNGTGDWVKETIDAANCEWISLALDSNNKIHISAENN LGALPRPLRYVTNASGSWVPATIDNTVERVGFWTSLAIDRQDKIHISYYDYTNANLKYIT NAGGSWVRTTIDETGTVGLYTSLALDSNGKAHISYYDQTNGNLKYANNVSGSWTIETADN SSDNVGEYTAIGIDTIGKAHISYYDRTNGNLMYATNVSGSWVRKKLDGDNADAGLWTSLK VDSQNGLHISYYDQTNGNLKYIKSTGIIQLPKTGQTTCYDTSGAVVSCSGTGQDGEIQAG TVWPNPRFTDNGDQTVKDNLTGIIWTKDGNAPGPAACGAGVAKTWQEALDYVKCLNINNY LGHNDWLMPNINEIESLVNYNEPNSAAWLNSQGFFNAQPDFYWSSTSSALFPNAAWNIHL WNGMHYEDKTIVRRYVWPLRKGGSGNFVNLPQTGQTSCYHASGAVITCTGTGQDGEIQAG SAWPNPRFTVNGDQTVKDNLTGMIWTKDGNAPGPAACGAGIAKTWQEALDYVKCLNINNY LSRNDWHMPNMNELESLVNYNEANLVSWLNSEGFYNVQPAAYWSSDSAANDTSIAWIVKM 
GDSSGSITYKTSPCYVWPVRSGQSAIFGSGISINSMAISTNSVSVNLSISAIDANSVSKM ILSNDGTFDAEPEEDYVTSKTWTMSTGDGEKTVYVKFRDAAGNWSQVYRDSIILDTVIPV TTIATKPASITNSTSATFSFTSDAGVTFQCQLDGGTWSTCTSPASYTVLPPGDHTLLIKA TDAAGNTETPVSYSWTIDTTAPNAPVVSGTTPTNDQTPTWTWVSGGSGGNGTYRYKLDSS DLTTGATETTATAFTPSTNLTEGSHTLYVQERDAAGNWSNSGSFAIVIDTTAPTAGTGGV IQLPKTGQTKCYNSSGTEITCLGTGQDGEIQAGIAWPNPRFTSNADTSLTDNLTGLIWAK DGSTPTFNSCTGGTVTWTAALAYVTCLNTNNYLEHNDWRLPNRIELESLGNVGWANHSTW LNGQGFTNVQSDYYWSSSTGAYNTDGAWIIEIGGVYYVDRADDKSLNHYVWPVRNGSSGA VQLPKTGQITSYAAGDDGDLQKGTAWPSTRFTVSGDCVIDRLTGLMWTKDANLAGGKSTW QELLTYANNLNICGYTDWRLPNRKELSSLLDFSKSVPAVPDNHPFSNVKIEDSNNEAYWS SSTFAYLPHGAFVVTMGNGAIGNSWKSYGGVYAQYGWPVRGGLDSGTATGTGVSINAGAA WATGTSVTLALSAKDSNGVTHMMVSNDAAFTGATEQAYTTTKTWLLSSGDGDKTVYVKFR DAAGNWSQIYSDSIVLDTVMPVTTIATKPAGITNGTSATFTFTSEVGATFQCQLDGGVWA ACTSPFSYTGLLPGDHTLLIKATDAAGNVETPVSHSWTIDIAAPNAPLVTGTTPTNDQTP TWTWASGGNGGIGTYRYKLDNSDLTIGATETTLLSYTPTGNLTEGSHTLYVQERDAAGNW SSSGSSSIVADFTPPVGAITSVTGTFTATGPMTVNRYSHTATLLPNGKVLIAGGYPANDS PRFNTAELYDPATGTFSATGNMISKRAQHTATLLANGKVLLAGGSVYDTSWSALNTAEIY DPATGEFTPTGMMRDIRYCHTATRLHDGTVLIAGGWTASAALNRAEIYDPVSGTFSETGN MGYARYVFTATLLQNGKVLIVGGTNGMTGNTGYKAEIYDPVARSFSATGDLNDSRCMHSA VILPNGKVLVADGWYSDLQNKRLEIYDPNTGIFTASATTTSGLLASLLSNGQVLLAGGTG TAFIFDYQTGICTPLNASMVTSRSYYNYSETLLPNGLVLLAGGSDHSPNALNSAELYLPV SQAISINSGATATNATSSSLALWATDATGVNQMILSNDAAFSGAIAETYATAITWNLNSG DGTKTVYVKFKDAAGNWSQAYSDTIILDSTAPTVTPSVAGGTYTETQTVTLTCNDGSGTG CGSIYYTISGNEPTTGSTVYSSPIIISATTTLKYFAVDSAGNSSPIQTATYTIQGFNILN TGTLSNGLVAWYPFNSNANDASGNGNNGTVNGATLTTDRFGNPNGAYSFNGSDNGVDVND SGTLQLYDTAAFAAWFQLKQWPTSIPNNQASLICKGATADLYAEYCFMITSEKKLSLYAS NNTYEMANVSYDISGISLNEWHHYSATFERGTIKIFVDGILKQTGTIPITALRASTNDLY LGKWRDGWHYANGSLDDVRIYNRALSASEIQALYGSAGTINAVQGDTRLKTLNLTSFGGY SQSADLTYAWVGTAPTGATVNITPSNIMPTPSGASASVSFTAGVNTPAGSYTLRITATSG TITKTADILVNVSAPLAIPTLTLDPATKGTSYTPSVLASGGVGAYTFSVASGTLPTGLTL NNNGTFSGSPTTRGTYTFTIQAMDNDGHASSREYTVRVYDPAYRKLVLESTSWSVYKNEA TGWIYTRVLDDYDVSVTMTTPTAIDITSSSSTGKFSTDGLTWYSIFSPDISSGSSSKRFL 
YKDSTNGSFTITAAGVPGSSNEQWAAGSHVLTITEPPVVDTTPPDTAITGNPQPVTNSTS ATFTFSSSEEPATFECNLDGAGWVNCTSPENYTGLAVAGHTFQVRAKDAAVPVNNVDPSP ASYAWKIDQTGPTGAGFGVLPRTGQTTCYDTAGAVMPCAGTGQDGEIQAGVAWPNPRFTN TDGTSPVSDALVLDKLTGLEWPKDAGTPTAGSCTGGAKTWQGALDYVACLNTNNYLGHND WRLPNVNELESLVSANNSGPSIPTGHPFTNVQAARYWSSSNNVDDYYGTSWAFFVAMTNG VLDLYPKSSGYYVWPVRNGALGGTVVWRTGQTACYDSSGAAITCAGTGQDGDKLAGSAWP SPRFTDNGNSTVTDVLMGLTWTKEANAPGPQSCSPGGAKTWQAVLDYVKCLNNNSYLGYN DWRLPNRIELRSLSDYSIHAPSLPPGHPFTNVQVSSVYWSSTTYIDDASRAWYVYMDIGL LSHAFKSNSYCVWPVRGGQSAGAGIGIIINNGAPATNSTSVTLGFSATDANGVSAMMVST DANFTGAFEEPYATSKAGTLSPGEGEKTFYAKFKDTAGNWSGIYSDTITLDVTAPTLVVG SPSASVTSNGDVTYTVTYGGADFITLSPGNITLNNTGSANGTVSVTGTGSTTRTITVSAI TGNGTLGISLAAGTGRDAAGNQTQAAGPSGTFTVDNSAPTATIDTKPPSLTNAASESFTF SSEQGATFQCKLDSGNYAACSGTASYSSLAEGSHIFWLKTTDTAGNITETSYSWTVDTIG PVAGVAFNSNLVAYYPFNGNANDESGNGNNGTVNGATLTTDRFGHTNSAYSFNGSSSSIA IDSVAPSISGQSAGAIALWFKASSSIPQGYGVPLIVFYKDMEVGPDQQGFFVVGRFSSGL PYNQSIGYYSPRPNFEGAYINGVNAYKDDQWHHAVITIDNSGNRLYVDGQSVPLTYTSFA GYNQNAMWSTVTSVVIGKRAIDSWVFNGAMDDVRIYNRALSATEIQAIYNGQSGPGISIN YGAAATNTTSVTLSFSAFDANGVSEMMLSNDGSFNGAVAEAYLATKTWTLTSGDGEKTAY VKFKDNAGNWSQVYSDSIILDTLVPVTTISDKPASVTNGTAATFTFASDAGATFQCQLDG GAWLACTSPANYSELLPGDHTLLIKATDTAGNVEIPVSYTWTIDIAAPNAPSVAGTTPTN NQKPTWTWASGGNGGIGTYRYKFDSSDLTTGATETTDTSYTPSSNLPEGSHTLYFQERDA AGNWSVSGSFAITVDTTGPVAVGESVVFVSDRSGNPEIWKTSLAHPSNMVKISNFAGSMI PSQLSWSPDGQWIAFWAFPPGESNNDIYVIKSDGSQPADRLYGRRYDAGDLARFGADNDW VYFRDGYAALNGMIYRVNRISKVIETVQGDTSKTVQSFDISEDGRYRLETRENGCCWSPN QYAVLYDLVDGTSTTIMPQDGNSESNPNFSHDGSKIIFTNATGYQTPQNLWVVNRDGSGK RPVTTETGNIFYHSASWLSDNQNVLVTYNNGTRDGLYIVDTNSGQKQPFLADANYNYANS DYRVSLGADNGISINTGASATNSASVALSLSAFDANGVSGMMVSNDDTFTGATEEGYATG KAWSLPAGDGAKTVCVKFKDNAGNWSPVYCDSIVLDTVTPVTVISGKPAIITNGTAAIFT FAAEAGVTFQCQLDGGAWLACTSPASYTGLPAGDHTLLIKATDVAGNVEIPVSYTWTIDT VAPNAPSVAGTTLTNDQTPTWTWASGGNSGNGTFRYKLDNSDLTTGATETTAVSYTPLSN LPEISHTLYVQERDAAGNWSASGSFAITVDTTVPVAESAVTPAGMVLVPAGSFQMGDSFS EGESDERPVHTVTVSSYYLEKYEVTKALWDEVKTWATANGYEFDNAGTGTATTYPVQGVS 
WYDVIKWLNARSEKEGRMPVYYTGAGQTTIYRTGQVNVVSGAVQWSANGYRLPTEAEWEY AARAGTTTRFYTGDCISADTQANYNGNSSWSGCTGGQSRGETTAVGSFTANPWGLHDMAG NVWEWTWDWIGSYSSTAVTNPRGPDSGSYRVFRGGSLSDGAYYLRLAYRTGSFPAGRGIN LGFRSAHAVNTGIAKGVTINSDAFATNSTSVTLSILAFDANGVTHMKVSNDASFTGTNEE TYITSKAWPLTSGDGIKTVYVMFKDAAGNWSQAYSDSIVLDTTGPAVVANPQGGTYLTAQ SVTLMCSDASGSGCEKIYFTTDGSEPTIGSAVYSGSISITSTTTIKFFAVDQLGTAGPVQ TATYTIQGVTMTAGSIKVVKGETRQTTVTLTSIGGFNAAMNLSHAWQGSALDDAVVNITP ATVTPTPSGASANVSVTVDSATAAGNYTLQIIAEGGQYATFVNVPVTVANPLAFTTNSPI NGVKGQPLAPITATGGIGSLICSLVITEPPGISPPGVTFNADGTFSGLPTARGTYVFTVR CSDTDGHSVDRQYTIRVYDPAYRHLVLESASWSLQEDGGVSDWIKAKVLDDYDVSAAVTL NSTIYITSSSATGQFSLSGSFSGSEVNALLVDIPAGSSLKSFKYRDSTAGSFAIAVIGWE GTPSAEWGPASHQITVVEVLPTQYNLSIAKTGTGSGNVTVSTGSVSWEGNNGMATYNAGQ QVTLTAAADPTSSTFTGWSGGGCSGPAPTCTVTMDAAKNVAAAFALKTYNITATAGANGS ITPAGSLFYNHGSSQAFTITPDTGYHIADVLVDGYSIGAVGSRTFDPITAGHTISATFAV NVYALTVTKTGTGTGAVTASPGTIVWTGNTGTASYNYNTPVQLTASANTGFTFTGWSGGG CSGPVPTCTVTMDAAKNVTANFADITPPTGLVAINSGAAHTTQPAVTLSILATDASGVSK MVVSNDANFAGGASDENYGTSKAWTLTDGDGTKTVYVKFKDAVGNWSQPYSDTIVLDTVA PTVAAQPATGTYMSPQSVTLICDDGSGSGCANIYFTTDKTEPTINSTRYMDPIPVSATTT LKFFARDSAGNSGTVRLETYTFPDVTMTGGNIKVVQGETRQSTVTLKALDGFNSSMTLSH QYQGTEPANASVSITPDTVTPTTSGAAAVVAFTAGTTTATTAVDRPYIIRVTATGGEITR TADIQVMVAAPLAIGTPTLPDGVKGQPYTAAVVATGGIAPYTFTKMSGQLPGGLTLSTGG AISGTPTARGTFTFAVQAADSDEPRHSVTQEYTVRIYDLAYRTLVMEAGSWMVEKSSATQ INASDLIMVMIKDDYGNYVNADANTMIRITSSSPTGRFSSDGSSFPSSSLAPTIYQGNAT TLFFYKDTTAGTFTLTASGIAGQPSASWQAGSHEITVWFDAHETELTASATQSLVYGQGM TVTGLLKDARDNVPLPSKTVTLAFTSPSGSILNRTAQTASDGRFTYSADAAMIDAAGPWT GRASFSEPASYKDSAATDSFDVAKANTRLEITTSASSVPPGGQVTVSGQLSAFTSFAVSL SGIDIVVEFIGPDGSTVHATTVKTSDASGHFTYPYTTPSVPLGLWSIRARFVGTGNLNSS DSDAKSLNVTNSPGYAILVQGDLGGTYRDSYSASLDDIYKKLRNRSFPTGNIWYLSHPAA THDTGIAPNAHTSKENVRKAITEWALARITEGGIAPLYIVLMDHGSSGLFHIDPETITPE ELNGWISTIETGIKTNLQKDLTTVVINGSCYSGSFIPALSKEGRVIITSGAEDEETLQGP DTETPNKVFGEYFVYYLFSYLAQGENLRDAFKDAAQETHRERKCEGKDCKTNSALGNQGN TRQHPLLDDNGDKKGSWMGVVGQEDGGVTSHLVLGLGANPATIKITEVMPTTKVMMGTSS 
VLAFAKTSNYAQTMASWVKVRKPFFAEPNSSGTGQVVMNLTRIEGTPNPTTGRWEYNITP LNEAGAYTLYYYAMDSNGNILPPVAGTLYVDTPDNNPPAAFNLTSPADSAELNDAMMIFK WAKSVDPDNDLVTYTLKIYDEETVSEIKRYELISQEFFFLNASQEKKPDGVTPLFTTGKH YLWKVEAVDGKGKSVEVQARRFHVIFTNALTGIITGVVLSDRDYSQIASASITATIGGTV VNIPVTDGAFVLNVNPGSLDLSSTSSGYQSASLSDVNIVAGQATMVNILLSPNGLQGDID GSGVVDIADAILALRITAGIGPTAGVTIHRENGVNPGGAIGIHDTLYILQYLAGLRP

hahafengxiang commented 3 years ago

Glad to know.

Hope an official solution will come up~

hegartybr commented 3 years ago

I'm also having this same error (on 3 of ~70 files). It also doesn't seem to be a size issue for me because my largest gene calls run just fine on their own (and aren't as large as Caelyn996's), but going through each, I realized that it may be similar to hahafengxiang's problem of a weird motif (see below) though not sure why this would cause it to fail... Any other input would be greatly appreciated. Thanks! (also, happy to provide any extra information/files on my runs that would be useful)

weird_gene_call_that_failed MAIAFEAASSTMVPSTATNPSVTITITDATDMVVAYGGSINGGASAMTFNSTGVMTQVVLENSSASTSGNLVGIGVYYILNNDLPAAGTYSVAATFAVADSQAMVGAVALSGADQSTGEHVISSTSLHNTSSTSITVSAQAATANSWWIRSVYAATTSSATSPGILIQSTDVARVSMVDTGLLSESMMTANPVNSTIASGPGFFASGSSSLNLSMVVFGVPAASASPSASPSASPSQSPSASLSPSASPSDSPSASPSASPSQSPSASLSPSASPSASPSASPSASPSASPSASPSASPSQSPSASLSPSASPSASPSASPSASPSASPSASPSESPSASLSPSASPSASPSASPSQSPSASLSPSASPSASPSASPSASPSASPSASPSASPSASPSQSPSASLSPSASPSASPSASPSASPSASPSASPSASPSASPSQSPSASLSPSASPSASPSASPSASPSASPSASPSASPSASPSQSPSASLSPSASPSASPSASPSASPSASPSASPSASPSASPSQSPSASLSPSASPSASPSASPSASPSASPSASPSASPSASPSQSPSASLSPSASPSASPSASPSASPSASPSASPSASPSASPSQSPSASLSPSASPSASPSASPSASPSASPSASPSASPSASPSQSPSASLSPSASPSASPSASPSASPSASPSASPSASPSASPSESPSASLSPSASPSASPSASPSASPSASPSASPSASPSASPSQSPSASLSPSASPSASPSASPSASPSASPSASPSASPSASPSQSPSASLSPSASPSASPSASPSASPSASPSASPSASPSASPSQSPSASLSPSASPSASPSASPSASPSASPSASPSASPSASPSQSPSASLSPSASPSASPSASPSASPSASPSASPSASPSASPSESPSASLSPSASPSASPSASPSASPSASPSASPSASPSASPSQSPSASLSPSASPSASPSASPSASPSASPSASPSASPSASPSQSPSASLSPSASPSASPSASPSASPSASPSASPSASPSASPSQSPSASLSPSASPSASPSASPSASPSASPSASPSASPSASPSQSPSASLSPSASPSASPSASPSASPSASPSASPSASPSASPSQSPSASLSPSASPSASPSASPSASPSASPSASPSASPSASPSQSPSASLSPSASPSASPSASPSASPSASPSASPSASPSASPSQSPSASLSPSASPSASPSASPSASPSASPSASPSASPSASPSESPSASLSPSASPSASPSASPSASPSASPSASPSASPSASPSQSPSASLSPSASPSASPSASPSASPSASPSASPSASPSASPSQSPSASLSPSASPSASPSASPSQSPSASLSPSASPSASPSASPSASPSASPSASPSASPSASPSESPSASLSPSASPSASPSASPSASPSASPSASPSASPSASPSQSPSASLSPSASPSASPSASPSQSPSASLSPSASPSASPSASPSASPSASPSASPSASPSASPSASPSASLSPSASPSASPSASPSASPSLSPSASPSASPSASPSESPSASLSPSASPSASPSASPSASPSASLSPSASPSASPSQSPSASPSASPSASPSASPSASPSQSPSASLSPSASPSASPSASPSASPSASPSASPSASPSASPSASPSASLSPSASPSASPSASPSQSPSASLSPSASPSASPSASPSASPSASPSQSPSASLSPSASPSTSPSASPSASPSQSPSTSLSPSVSPSASPSASPSTSPSQSPSASLSPSASPSASPSASPSASPSQSPSASLSPSASPSASPSASPSASPSASPSVSPSASPSQSPSASLSPSASPSASLSPSASPSASPSVSPSASPSASPSASPSASPSQSPSASLSPSASPSISPSASPSASPSQSPSASLSPSASPSASPSVSPSASPSQSPSASLSPSASPSASPSASVSPSASLSASPSESPSASVSPSASPSVVTSASPSESPSASLSPS
VSPSVGGSSASPSESPSVSLSPSASPSASVSPSESPSPSAVTSGPSPSVSPSVAVFISASPSEAQVTFVGPPGQETLVDEEGGEPVIFKNQMYFTTTLTEDNSPILIGRRTIPTWNTTGRPKKAKVGTLGFNLKTNNLEYWDGSRWLILRMKKI*

AstrobioMike commented 3 years ago

Hey folks, I hit this too.

(First, thank you to the developers and maintainers of kofamscan :))

While wondering why this came up only recently with KoFamScan (which has had no recent changes), I wondered whether it was related to a change in HMMER3 (which did update in September as far as what conda would install, and more recently HMMER as a whole if installing from GitHub).

In using the example problem sequences you all kindly provided above (keeping me from needing to hunt mine down :)), I was able to track it down to specifically the HMM for K14297. This fails with a seg fault on HMMER version 3.3.1, which is the latest conda install of HMMER, but it doesn't fail when run on HMMER version 3.3.0. Here is a reproducible example using conda environments if interested:

### getting problem sequence example ###
curl -L -o trouble-seqs.faa https://ndownloader.figshare.com/files/25861299

### getting problematic hmm profile ###
curl -L -o K14297.hmm https://ndownloader.figshare.com/files/25861284

### trying with hmmer 3.3.1 ###
conda create -y -n hmmer-3.3.1 -c conda-forge -c bioconda -c defaults hmmer=3.3.1
conda activate hmmer-3.3.1

hmmsearch K14297.hmm trouble-seqs.faa > /dev/null
    # Segmentation fault (core dumped)

conda deactivate

### trying with hmmer 3.3.0 ###
conda create -y -n hmmer-3.3.0 -c conda-forge -c bioconda -c defaults hmmer=3.3.0
conda activate hmmer-3.3.0

hmmsearch K14297.hmm trouble-seqs.faa > /dev/null
    # no problem

After putting this example together to provide as an issue to HMMER's github, I realized it seems the great folks at HMMER already caught this, as it is noted as a bug fix in v3.3.2's release notes.

It's just not updated on conda yet, so if wanting to use conda, installing hmmer=3.3.0 as shown above in the kofamscan environment should get around this problem, or of course installing the latest from their github 👍
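
Since the crash is tied to one HMMER release, it can help to check which version an environment actually resolves to before running kofamscan. The following is a minimal sketch with hypothetical function names. It assumes that `hmmsearch -h` prints a banner line of the form `# HMMER 3.3.2 (Nov 2020); http://hmmer.org/`, and it treats only 3.3.1 as affected, based on the reports in this thread.

```shell
#!/usr/bin/env bash

# hmmer_version_from_banner
# Read `hmmsearch -h` output on stdin and print the version number taken
# from the first "# HMMER x.y.z (...)" banner line (assumed format).
hmmer_version_from_banner() {
    sed -n 's/^# HMMER \([0-9][0-9.]*\).*/\1/p' | head -n 1
}

# is_affected_hmmer VERSION
# Succeed only for the release reported in this thread to segfault (3.3.1).
# 3.3.0 was reported to work, and the fix is noted for 3.3.2.
is_affected_hmmer() {
    [ "$1" = "3.3.1" ]
}

# check_hmmer
# Inspect the hmmsearch found on PATH and warn if it is the affected release.
check_hmmer() {
    local version
    version=$(hmmsearch -h | hmmer_version_from_banner)
    if is_affected_hmmer "$version"; then
        echo "HMMER $version: affected by the segfault discussed here" >&2
        return 1
    fi
    echo "HMMER ${version:-unknown}: not the affected release"
}
```

Running `check_hmmer` inside the activated kofamscan environment shows whether the pinned or updated HMMER is the one actually being picked up.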

halexand commented 3 years ago

I was just coming to see if anyone else had run into this problem! I just updated the conda build of my env and it seems to have cleared up the issue for me!

name: kofamscan
channels:
    - bioconda
dependencies:    
    - kofamscan
    - hmmer=3.3.0

Thanks for working it out, all!

hegartybr commented 3 years ago

Thank you for sharing your solutions, halexand and AstrobioMike! Unfortunately, when I try to build a conda environment specifying the older version of hmmer (like halexand showed), I get an error from kofamscan because it is requiring version 3.1 or greater:

Package hmmer conflicts for:
hmmer=3.3.0
kofamscan -> hmmer[version='>=3.1']

This doesn't make sense to me, since version 3.3.0 should satisfy kofamscan's requirement...

I'm trying to build the environment using this command: conda env create --file kofamscan.yaml, which works fine typically for me.

Any help with this would be greatly appreciated. Thanks!

AstrobioMike commented 3 years ago

That's strange, @hegartybr. Not only does it work for me, but 3.3.0 is >= 3.1, ha (though I have seen some odd things in how conda compares versions sometimes if not all digits are placed there - as is the case there).

The first thing I'd do is check whether you have the latest conda (conda --version, currently 4.9.2; if you don't, you can update with conda update conda, but it's probably best to do this in the base environment and not in one that is set up for a specific project).

Then maybe try with this in a file that more explicitly sets the channels and versions (can't attach a yaml it seems):

name: kofamscan
channels:
    - conda-forge
    - bioconda
    - defaults
dependencies:
    - kofamscan=1.3.0
    - hmmer=3.3.0

hegartybr commented 3 years ago

Setting all the channels explicitly seems to have done the trick (it loads fine if I don't specify the exact version of kofamscan, and my conda version is the latest). At the very least, it is running now, so hopefully I'm good. Thanks, @AstrobioMike, for the tip!

Caelyn-gao commented 2 years ago

Thanks so much!! I updated hmmer and it works well now!