Open smb20200615 opened 2 years ago
Hmm, I'm not sure what's happening. Would you mind sharing your input file genecall.txt (or a subset of it, e.g. with head -100 genecall.txt)? You can attach it here in a comment, or email me at mikeyu@ttic.edu.
Thank you so much for the prompt reply!
gene_callers_id contig start stop direction partial call_type source version aa_sequence
0 c_000000000001 271 829 r 0 1 prodigal v2.6.3 MGNTTYLKINSENDVDLQDILNDFINCFCKGYVEIKTKYKLLPIFKINFHKNNLPHLLGLHYTHKKVSAKKIIGRIAEGKITHESIKKHYEYSNIKDRLINYNFLHKCFIDKEIRLCVIVPKNSINPQKIDVAFIDDKNSQVMILGLRKSNNNDFYSPATMYVLGKNSSYRRMRRTHVISIEWKN
1 c_000000000001 1483 2938 r 0 1 prodigal v2.6.3 MLMTKNQAEKWFDNSLGKQFNPDLFFGFQCYDYANMFFMLATGERLQGLYAYNIPFDNKARIEKYGQIIKNYDSFLPQKLDIVVFPSKYGGGAGHVEIVESANLNTFTSFGQNWNGKGWTNGVAQPGWGPETVTRHVHYYDDPMYFIRLNFPDKVSVGNKAKSVIKQATAKKQAVIKPKKIMLVAGHGYNDPGAVGNGTNERDFIRKYITPNIAKYLRHAGHEVALYGGSSQSQDMYQDTAYGVNVGNNKDYGLYWVKSQGYDIVLEIHLDAAGENASGGHVIISSQFNADTIDKSIQDVIKNNLGQIRGVTPRNDLLNVNVSAEININYRLSELGFITNKKDMDWIKKNYDLYSKLIAGAIHGKPIGGLVAGNAKTSAKNQKNPPVPVGYTLDKNNVPYKKEDGNYTVANVKGNNVRDGYSTNSRITGVLPNNATIKYDGAYCINGYRWITYIANSGQRRYIATGEVDKAGNRISSFGKFSTI
2 c_000000000001 2948 3251 r 0 1 prodigal v2.6.3 MDAKVITRYIVLILALVNQFLANKGISPIPVDDETISSIILTVVALYTTYKDNPTSQEGKWANQKLKKYKAENKYRKATGQAPIKEVMTPTNMNDTNDLG
3 c_000000000001 3386 3686 r 0 1 prodigal v2.6.3 MFGFTKRHEQDWRLTRLEENDKTMFEKFDRIEDSLRTQEKIYDKLDRNFEELRRDKEEDEKNKEKNAKNIRDIKMWILGLIGTILSTFVIALLKTIFGI
4 c_000000000001 3731 3896 r 0 1 prodigal v2.6.3 MLKLISPTFEDIKTWYQLKEYSKEDIAWYVDMEVIDKEEYAIITGEKYPENLES
5 c_000000000001 3888 4278 r 0 1 prodigal v2.6.3 MQILVNKRNEIISYAIIGGFEEGIDIENLPENFSQVFRPKAFKYSNGEIVFNEDYSEEKDDLHQQIDSEEQNTVASDDILRKMVASMQKQVVQSTKLSMQVNKQNALMAKQLVTLNKKLEEVKGETENA
6 c_000000000001 4277 5777 r 0 1 prodigal v2.6.3 MDFTRRENYKLMSNLEKSVAINLENTAHYENISNLDITFRTGESDSSVLLFNIIKNNQPLLLSEENIKARIAIRGKGVMIVAPLEILDPFKGILKFQLPNDVIKRDGSYQAQVSVAELGNSDVVVVERTITFNVEKSLFSKVPSETKLHYIVEFQELEKTIMDRAKAMDEAIKNGEDYASLIEKAKEKGLSDIQIAKSSSIDELKQLANSRISDLENKAQAYSRTFDEQKRYMDEKHEAFKQSVNSGGLVTSGSTSNWQKAKITKDDGKIMQITGFDFNNPEQRIGDSTQFIYVSQAINYPRGASTNGTVEYLVVTSDYKRMTYRPNGTNKVFVKRKEVGSWSDWSELALNDYNTPFETVQNAQSKANTAESNAKLYTDDKFNKRYSVIFDGTANGVGSTLYLNESLDQFILLIFYGTFPGGDFTEFGNPFGGGKISLNPSNLPDNDGDGGGVYEFGLTKSSRTSLTISNDVYFDLGSRRGSGANANRGTINKIIGVRK
7 c_000000000001 5743 7654 r 0 1 prodigal v2.6.3 MENLYLIKDLGALAGRDYRAKEIQNLQRIEQFALGLTTEFKLHQKAKTIQHFAEQIYYNGRSQAAVNKSLQSQINALVVAPRNNSANEIVQARVNVNGETFDTLKEHLDDWETKTQINKEETIRELNKTKQEILDIEYRFEPDKQEFLFVTELAPLTNAVMQSFWFDNRTGIVYMTQARNNGYMLSRLRPNGQFIDSSLIVGGGHGTHNGYRYIDDELWIYSFILNGNNENTLVRFKYTPNVEISYGKYGMQDVFTGHPEKPYITPVINEKENKILYRIERPRSQWELENSMNYIEIRSLDDVDKNIDKVLHKISIPMRLTNETQPMQGVTFDEKYLYWYTGDSNPNNRNYLTAFDLETGEEAYQVNADYGGTLDSFPGEFAEAEGLQIYYDKDSGKKALMLGVTVGGDGNRTHRIFMIGQRGILEILHSRGVPFIMSDTGGRVKPLPMKPDKLKNLGMLTEPGLYYLYTDHTVQIDDFPLPREWRDAGWFLEVKPPQTGGDVIQILTRNSYARNMMTFERVLSGRTGDISDWNYVPKNSGKWERVPSFITKMSDINIVGMSFYLTTDDTKRFTDFPTERKGVAGWNLYVEASNTGGFVHRLVRNSVTASCEILLKNYDSKTSSGPWTLHEGRIIS
8 c_000000000001 7669 7960 r 0 1 prodigal v2.6.3 MATEEVKIKALLENDKQYFPATHWKAINGIPYAGSSDIDGLPQDGIISVDDKNKLDNLKIGEAGIIQNSIVQKSPNGKLWKITVDDSGKLGTVLFY
9 c_000000000001 7959 9543 r 0 1 prodigal v2.6.3 MDYHDHLSVMDFNELICENLLDVDYGSFKEYYELNEARYITFTVYRTTHNSFVFDLLICENFIIYHGEKYTIKQTAPKVEGDKVFIEVTAYHIMYEFQNHSVESNKLDDDSSETGKTPEYSLDEYLRYGFANQKTSVKMTYKIIGDFKRKIPIDELGNKNGLEYCKEAVDLFGCIIYPNDTEICFYSPETFYQRSEKVIRYQYNTDTVSATVSTLELRTAIKVFGKKYTAEEKKNYNPIRTTDIKYSNGFIKEGTYRTATIGSKATINFDCKYGNETVRFTIKKGSQGGIYKLILDGKQIKQISCFAKSVQSETIDLIKNIDKGKHVLEMIFLGEDPKNRIDISSNKKAKPCMYVGTEKSTVLNLIADNSGRNQYKAIVDYVADSAKQFGIRYANTQTNEDIETQDKLLEFAKKQINDTPKTELDVNYIGYEKIEPRDSVFFVHELMGYNTELKVVKLDRSHPFVNAIDEVSFSNEIKDMVQIQQALNRRVIAQDNRYNYQANRINHLYTSTLNSPFETMDIGSVLI
10 c_000000000001 9551 10376 r 0 1 prodigal v2.6.3 MQSFVKIIDGYKEEVITDFNQLIFLDARAESPNTNDNSVTINGVDGILPGAISFAPFSLVLRFGYDGIDVIDLNLFEHWFRSVFNRRHPYYVITSQMPGVKYAVNTANVTSNLKDGSSTEIEVSLNVYKGYSESVNWTDSEFLFDSNWMFENGIPLDFTPKYTHTSNQFTIWNGSTDTINPRFKHDLKILINLNGSGGFELVNYTTGDIFKYNKSIDKNTDFVLDGVYAYRDINRVGIDTNRGIITLAPGKNEFKIKGDVSDIKTTFKFPFIYR
11 c_000000000001 10375 14047 r 1 1 prodigal v2.6.3 KNYLGSIGKSFKEKFSKDMKDGYKSLSDDDLLKVGVNKFKGFMQTMGTASKKASDTVKVLGKGVSKETEKALEKYVHYSEENNRIMEKVRLNSGQITEDKAKKLLKIEADLSNNLIAEIEKRNKKELEKTQELIDKYSAFDEQEKQNILTRTKEKNDLRIKKEQELNQKIKELKEKALSDGQISENERKEIEKLENQRRDITVKELSKTEKEQERILVRMQRNRNAYSIDEASKAIKEAEKARKARKKEVDKQYEDDVIAIKNNVNLSKSEKDKLLAIADQRHKDEVRKAKSKKDAVVDVVKKQNKDIDKEMDLSSGRVYKNTEKWWNGLKSWWSNFREDQKKKSDKYAKEQEETARRNRENIKKWFGNAWDGVKTKTGEAFSKMGRNANHFGGEMKKMWSGIKGIPSKLSSSWSSAKSSVGYHTKAIANSTGKWFGKAWQSVKSTTGSIYNQTKQKYSDASDKAWVHSKSIWRGTSKWFSNAYKSAKGWLTDMANKSRSKWDNISSTAWSNAKSVWKGTSKWFSNSYKSLKGWTGDMYSRAHDRFDAISSSAWSNAKSVFNGFRKWLSKTYDWIRDIGKDMGRAAADLGKNVANKAIGGLNSMIGGINKISKAITDKNLIKPIPTLSTGTLAGKGVATDNSGALTQPTFAVLNDRGSGNAPGGGVQEVIHRADGTFHAPQGRDVVVPLGVGDSVINANDTLKLQRMGVLPKFHGGTKKKKWMEQVTENLGKKAGDFGSKAKNTAHNIKKGAEEMVEAAGDKIKDGASWLGDKIGDVWDYVQHPGKLVNKVMSGLNINFGGGANATVKIAKGAYSLLKKKLVDKVKSWFEDFGGGGDGSYLFDHPIWQRFGSYTGGLNFNGGRHYGIDFGMPTGTNIYAVKGGIADKVWTDYGGGNSIQIKTGANEWNWYMHLSKQLVRQGQRIKAGQLIGKSGATGNFVRGAHLHFQLMQGSHPGNDTAKDPEKWLKSLKGSGVRSGSGVNKAASAWAGDIRRAAKRMGVNVTSGDVGNIISLIQHESGGNAGITQSSSLRDINVLQGNPAKGLLQYIPQTFRHYAVRGHNNIYSGYDQLLAFFNNRYWRSQFNPRGGWSPSGPRRYANGGLITKHQLAEVGEGDKQEMVIPLTRRKRAIQLTEQVMRIIGMDGKPNNITVNNDTSTVEKLLKQIVMLSDKGNKLTDALIQTVSSQENNLGSNDAIRGLEKILSKQSGHRANANNYMGGLTN
The bug should be fixed now. Please update your repository with git pull and then reinstall with pip install /path/to/PlasX. Then, please rerun plasx search_de_novo_families... (you don't need to rerun plasx setup or other earlier steps).
This is the output I get. Is this what you see?
gene_callers_id contig start stop direction rev_compd length e_value accession
0 c_000000000001 271 829 r True 558 0.0 mmseqs_5_34857857
3 c_000000000001 3386 3686 r True 300 0.0 mmseqs_5_19291796
4 c_000000000001 3731 3896 r True 165 1.289e-20 mmseqs_20_48600463
5 c_000000000001 3888 4278 r True 390 2.193e-26 mmseqs_5_44369838
6 c_000000000001 4277 5777 r True 1500 0.0 mmseqs_20_32489040
7 c_000000000001 5743 7654 r True 1911 0.0 mmseqs_5_19647232
8 c_000000000001 7669 7960 r True 291 0.0 mmseqs_5_19517056
9 c_000000000001 7959 9543 r True 1584 0.0 mmseqs_5_20276345
9 c_000000000001 7959 9543 r True 1584 0.0 mmseqs_5_34921539
10 c_000000000001 9551 10376 r True 825 0.0 mmseqs_25_22345615
10 c_000000000001 9551 10376 r True 825 0.0 mmseqs_20_22345615
10 c_000000000001 9551 10376 r True 825 0.0 mmseqs_5_22651810
Thank you so much. Do I need to rerun everything that ran successfully too?
No problem, and thanks for bringing my attention to this bug!
You don't need to rerun the earlier commands that were successful. You can directly run and continue from plasx search_de_novo_families...
Hi, some of my runs fail with this error:
Traceback (most recent call last):
File "conda/envs/plasx/bin/plasx", line 8, in <module>
sys.exit(run())
File "conda/envs/plasx/lib/python3.8/site-packages/plasx/plasx_script.py", line 140, in run
args.func(args)
File "conda/envs/plasx/lib/python3.8/site-packages/plasx/plasx_script.py", line 38, in search
annotate_de_novo_families(args.gene_calls,
File "conda/envs/plasx/lib/python3.8/site-packages/plasx/mmseqs.py", line 1948, in annotate_de_novo_families
hits = process_mmseqs_merge_search(mmseqs_source_db, target_db_dir, mmseqs_dir, ident_list,
File "conda/envs/plasx/lib/python3.8/site-packages/plasx/mmseqs.py", line 1757, in process_mmseqs_merge_search
hits = pd.concat([shallow_filter(utils.unpickle(search_results_pattern.format(ident=ident)).assign(cluster_identity=ident),
File "conda/envs/plasx/lib/python3.8/site-packages/plasx/mmseqs.py", line 1757, in <listcomp>
hits = pd.concat([shallow_filter(utils.unpickle(search_results_pattern.format(ident=ident)).assign(cluster_identity=ident),
File "conda/envs/plasx/lib/python3.8/site-packages/plasx/mmseqs.py", line 1820, in shallow_filter
hits['q_length'] = utils.int_loc(hits['qId'].values, q_len)
File "conda/envs/plasx/lib/python3.8/site-packages/plasx/pd_utils.py", line 48, in int_loc
assert np.all(isin_int(query, domain.index)) # Check that all of query is in the domain
File "conda/envs/plasx/lib/python3.8/site-packages/plasx/pd_utils.py", line 63, in isin_int
max_val = np.max(series)
File "<__array_function__ internals>", line 5, in amax
File "conda/envs/plasx/lib/python3.8/site-packages/numpy/core/fromnumeric.py", line 2705, in amax
return _wrapreduction(a, np.maximum, 'max', axis, None, out,
File "conda/envs/plasx/lib/python3.8/site-packages/numpy/core/fromnumeric.py", line 87, in _wrapreduction
return ufunc.reduce(obj, axis, dtype, out, **passkwargs)
ValueError: zero-size array to reduction operation maximum which has no identity
Command used: plasx search_de_novo_families -g genecall.txt -o denovoltxt --threads 1 --splits 32 --overwrite
Do you know what could be happening?
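For reference, the last line of the traceback is the error NumPy raises whenever np.max is called on an empty array, so one of the arrays reaching isin_int was most likely empty. The snippet below is only a minimal sketch that reproduces that message outside of PlasX. The helper safe_max is a hypothetical name used for illustration; it is not part of PlasX and not necessarily how the bug was or will be fixed.

```python
import numpy as np


def safe_max(values, default=None):
    """Return the maximum of `values`, or `default` when `values` is empty.

    Hypothetical helper for illustration only; not part of PlasX.
    """
    arr = np.asarray(values)
    if arr.size == 0:
        # np.max would raise ValueError here, so return the fallback instead.
        return default
    return arr.max()


# Reproduce the error from the traceback by reducing an empty array.
try:
    np.max(np.array([], dtype=int))
    error_message = None
except ValueError as exc:
    error_message = str(exc)

print(error_message)        # the "zero-size array ..." message seen above
print(safe_max([]))         # falls back to the default (None)
print(safe_max([3, 7, 5]))  # behaves like np.max on non-empty input
```

If this matches what is happening, checking whether the failing runs produce an empty table of search hits would be a reasonable first thing to look at.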