harmslab / topiary

Python framework for doing ancestral sequence reconstruction
MIT License

Error while running seed-to-alignment for proteins without known paralogs #42

Open lbleicher opened 1 year ago

lbleicher commented 1 year ago

We are trying to run Topiary on proteins that seem to have no paralogs in most clades, but that may be named differently depending on the species. Our database, scoped to Opisthokonts, has one sequence from yeast and one from human. Even though they are named differently (RQC1_YEAST and TCF25_HUMAN), we believe they are orthologs, since virtually no model species has more than one sequence containing the same domain (PF04910 in Pfam). How should we prepare the input seed in this case? We tried both using two sequences, each with its own aliases, and using all aliases from both sequences on both entries. After the reciprocal BLAST generates a 4675-sequence dataset in 02_recip-blast-dataframe.csv, the shrunk dataframe is reduced to just one sequence, and seed-to-alignment then stops at the "Aligning sequences" step with the following error:

muscle 5.1.linux64 [] 7.6Gb RAM, 4 cores
Built May 16 2023 07:53:40
(C) Copyright 2004-2021 Robert C. Edgar.
https://drive5.com

Input: 1 seqs, avg length 676, max 676

double free or corruption (out)
Traceback (most recent call last):
  File "/home/amandacpa/miniconda3/envs/topiary/lib/python3.11/site-packages/topiary/_private/interface.py", line 32, in wrapper
    value = func(*args, **kwargs)
            ^^^^^^^^^^^^^^^^^^^^^
  File "/home/amandacpa/miniconda3/envs/topiary/lib/python3.11/site-packages/topiary/pipeline/seed_to_alignment.py", line 478, in seed_to_alignment
    df = topiary.muscle.align(df)
         ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/amandacpa/miniconda3/envs/topiary/lib/python3.11/site-packages/topiary/muscle/muscle.py", line 96, in align
    _run_muscle(input_fasta,output_fasta,super5,silent,muscle_cmd_args,muscle_binary)
  File "/home/amandacpa/miniconda3/envs/topiary/lib/python3.11/site-packages/topiary/muscle/muscle.py", line 216, in _run_muscle
    raise subprocess.CalledProcessError(return_code, cmd)
subprocess.CalledProcessError: Command '['muscle', '-align', 'topiary-tmp_dULdoeuPiV_align-in.fasta', '-output', 'topiary-tmp_dULdoeuPiV_align-out.fasta']' died with <Signals.SIGABRT: 6>.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/amandacpa/miniconda3/envs/topiary/lib/python3.11/site-packages/topiary/_private/wrap.py", line 185, in wrap_function
    ret = fcn(**fcn_args.__dict__)
          ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/amandacpa/miniconda3/envs/topiary/lib/python3.11/site-packages/topiary/_private/interface.py", line 38, in wrapper
    raise WrappedFunctionException(err) from e
topiary._private.interface.WrappedFunctionException:

Caught exception in function 'seed_to_alignment'. Returning to starting directory and cleaning up. Check error stack for cause of this error.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/amandacpa/miniconda3/envs/topiary/bin/topiary-seed-to-alignment", line 26, in <module>
    main()
  File "/home/amandacpa/miniconda3/envs/topiary/bin/topiary-seed-to-alignment", line 21, in main
    wrap_function(seed_to_alignment,
  File "/home/amandacpa/miniconda3/envs/topiary/lib/python3.11/site-packages/topiary/_private/wrap.py", line 189, in wrap_function
    raise RuntimeError(err) from e
RuntimeError:

Function seed_to_alignment raised an error.
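
The log above shows MUSCLE receiving a single sequence ("Input: 1 seqs") and then aborting with SIGABRT. A minimal sketch of a pre-alignment check that would surface this as a readable error instead, assuming the alignment input is a plain FASTA file. `count_fasta_records` and `check_alignable` are hypothetical helpers written for illustration; they are not part of Topiary or MUSCLE.

```python
def count_fasta_records(fasta_path):
    """Count sequences in a FASTA file by counting '>' header lines."""
    with open(fasta_path) as handle:
        return sum(1 for line in handle if line.startswith(">"))


def check_alignable(fasta_path, min_seqs=2):
    """Raise a readable error if the file has too few sequences to align.

    min_seqs=2 is an assumption: an alignment needs at least two sequences.
    """
    n = count_fasta_records(fasta_path)
    if n < min_seqs:
        raise ValueError(
            f"{fasta_path} contains {n} sequence(s); "
            f"at least {min_seqs} are needed for an alignment."
        )
    return n
```

Running such a check on the temporary `*_align-in.fasta` file before the aligner is called would point directly at the shrunk dataframe as the place to look, rather than at the MUSCLE crash.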

==================

This is the latest seed file we used, which caused the error above:

species,name,aliases,sequence,accession
Homo sapiens,RQC1,TCF25;TCF-25;Nuclear localized protein 1;KIAA1049;NULP1;FKSG26;RQC1;YDR333C,MSRRALRRLRGEQRGQEPLGPGALHFDLRDDDDAEEEGPKRELGVRRPGGAGKEGVRVNNRFELINIDDLEDDP VVNGERSGCALTDAVAPGNKGRGQRGNTESKTDGDDTETVPSEQSHASGKLRKKKKKQKNKKSSTGEASENGLEDIDRILERIEDSTGLNRPGPAPLSSRKHVLYVEHRHLNPDTELKRYFGARAILGEQRPRQRQRVYPKCTWLTTPKSTWPRYSKPGLSMRLLESK KGLSFFAFEHSEEYQQAQHKFLVAVESMEPNNIVVLLQTSPYHVDSLLQLSDACRFQEDQEMARDLVERALYSMECAFHPLFSLTSGACRLDYRRPENRSFYLALYKQMSFLEKRGCPRTALEYCKLILSLEPDEDPLCMLLLIDHLALRARNYEYLIRLFQEWEAHR NLSQLPNFAFSVPLAYFLLSQQTDLPECEQSSARQKASLLIQQALTMFPGVLLPLLESCSVRPDASVSSHRFFGPNAEISQPPALSQLVNLYLGRSHFLWKEPATMSWLEENVHEVLQAVDAGDPAVEACENRRKVLYQRAPRNIHRHVILSEIKEAVAALPPDVTTQ SVMGFDPLPPSDTIYSYVRPERLSPISHGNTIALFFRSLLPNYTMEGERPEEGVAGGLNRNQGLNRLMLAVRDMMANFHLNDLEAPHEDDAEGEGEWD,Q9BQ70
Saccharomyces cerevisiae,RQC1,TCF25;TCF-25;Nuclear localized protein 1;KIAA1049;NULP1;FKSG26;RQC1;YDR333C,MSSRALRRLQDDNALLESLLSNSNANKMTSGKSTAGNIQKRENIFSMMNNVRDSDNSTDEGQ MSEQDEEAAAAGERDTQSNGQPKRITLASKSSRRKKNKKAKRKQKNHTAEAAKDKGSDDDDDDEEFDKIIQQFKKTDILKYGKTKNDDTNEEGFFTASEPEEASSQPWKSFLSLESDPGFTKFPISCLRHSCKFFQNDFKKLDPHTEFKLLFDDISPESLEDIDSMTS TPVSPQQLKQIQRLKRLIRNWGGKDHRLAPNGPGMHPQHLKFTKIRDDWIPTQRGELSMKLLSSDDLLDWQLWERPLDWKDVIQNDVSQWQKFISFYKFEPLNSDLSKKSMMDFYLSVIVHPDHEALINLISSKFPYHVPGLLQVALIFIRQGDRSNTNGLLQRALFV FDRALKANIIFDSLNCQLPYIYFFNRQFYLAIFRYIQSLAQRGVIGTASEWTKVLWSLSPLEDPLGCRYFLDHYFLLNNDYQYIIELSNSPLMNCYKQWNTLGFSLAVVLSFLRINEMSSARNALLKAFKHHPLQLSELFKEKLLGDHALTKDLSIDGHSAENLELKA YMARFPLLWNRNEEVTFLHDEMSSILQDYHRGNVTIDSNDGQDHNNINNLQSPFFIAGIPINLLRFAILSEESSVMAAIPSFIWSDNEVYEFDVLPPMPTSKESIEVVENIKTFINEKDLAVLQAERMQDEDLLNQIRQISLQQYIHENEESNENEG,Q05468
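
The seed file above has the columns species, name, aliases, sequence, and accession, with aliases separated by semicolons. A minimal sketch for sanity-checking such a file before running the pipeline, using only the Python standard library. `summarize_seed` and `shared_names` are hypothetical helpers, not Topiary functions, and the toy rows are made up for illustration.

```python
import csv
import io


def summarize_seed(csv_text):
    """Return one summary dict per seed row (columns as in the seed file)."""
    rows = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Aliases are semicolon-separated; drop empty entries.
        aliases = [a.strip() for a in row["aliases"].split(";") if a.strip()]
        # Remove any whitespace that crept into the sequence field.
        sequence = "".join(row["sequence"].split())
        rows.append({
            "species": row["species"].strip(),
            "name": row["name"].strip(),
            "n_aliases": len(aliases),
            "seq_length": len(sequence),
            "accession": row["accession"].strip(),
        })
    return rows


def shared_names(summary):
    """Return the names that appear in more than one seed row."""
    seen, repeated = set(), set()
    for row in summary:
        if row["name"] in seen:
            repeated.add(row["name"])
        seen.add(row["name"])
    return sorted(repeated)


# Toy input (made-up short sequences and accessions), not the real seed file.
toy_seed = (
    "species,name,aliases,sequence,accession\n"
    "Homo sapiens,RQC1,TCF25;NULP1,MSRR ALRR,X1\n"
    "Saccharomyces cerevisiae,RQC1,RQC1;YDR333C,MSSRAL,X2\n"
)
summary = summarize_seed(toy_seed)
for entry in summary:
    print(entry)
print("names used by more than one row:", shared_names(summary))
```

This only reports what the file contains (rows per name, alias counts, sequence lengths with whitespace removed); whether shared names or aliases are related to the shrunk dataframe collapsing to one sequence is not something the sketch can tell.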