harmslab / topiary

Python framework for doing ancestral sequence reconstruction
MIT License

Error in topiary-seed-to-alignment #33

Open lbleicher opened 1 year ago

lbleicher commented 1 year ago

I attempted to create an alignment from a seed of six sequences from four species (this is my input csv file):

species,name,aliases,sequence,accession
Homo sapiens,TTHY_HUMAN,hTTR,GPTGTGESKCPLMVKVLDAVRGSPAINVAVHVFRKAADDTWEPFASGKTSESGELHGLTTEEEFVEGIYKVEIDTKSYWKALGISPFHEHAEVVFTANDSGPRRYTIAALLSPYSYSTTAVVTNPKE,P02766
Saccoglossus kowalevskii,D1LXG7,Acorn worm HIUase,MSGYRIDILTNHLRASQAHSNLIEAVNMAGQQSPLTTHVLDTALGRPAAELPITLYSRSPEMAWLKIAAGKTNQDGRCPGLLTQETFHNGVYKIHFDTGTYHKALDTPGFYPYVEVVFEIHDPNQHYHVPLLLSPFSYSTYRGS,D1LXG7
Danio rerio,HIUH_DANRE,Danio Rerio HIUase,MNRLQHIRGHIVSADKHINMSATLLSPLSTHVLNIAQGVPGANMTIVLHRLDPVSSAWNILTTGITNDDGRCPGLITKENFIAGVYKMRFETGKYWDALGETCFYPYVEIVFTITNTSQHYHVPLLLSRFSYSTYRGS,Q06S87
Mus musculus,HIUH_MOUSE,Mouse HIUase,MATESSPLTTHVLDTASGLPAQGLCLRLSRLEAPCQQWMELRTSYTNLDGRCPGLLTPSQIKPGTYKLFFDTERYWKERGQESFYPYVEVVFTITKETQKFHVPLLLSPWSYTTYRGS,Q9CRB3
Mus musculus,TTHY_MOUSE,Mouse Transthyretin,GPAGAGESKCPLMVKVLDAVRGSPAVDVAVKVFKKTSEGSWEPFASGKTAESGELHGLTTDEKFVEGVYRVELDTKSYWKTLGISPFHEFADVVFTANDSGHRHYTIAALLSPYSYSTTAVVSNPQN,P07309

It seems to have worked until the reciprocal BLAST, and then I got the following error (it did create a BLAST results XML file and an initial dataframe file with 3414 lines):

==========

Building initial topiary dataframe.

BLASTing against NCBI database nr
Performing 5 BLAST queries against the NCBI nr database on 1 threads. Depending on the server load, this could take awhile. This is a good time to grab a cup of coffee.

BLAST query complete.

Could not parse line MET VARIANT TO 1.7 ANGSTROMS RESOLUTION [Homo sapiens]. Skipping.
Could not parse line MET VARIANT TO 1.7 ANGSTROMS RESOLUTION [Homo sapiens]. Skipping.
Could not parse line MET VARIANT TO 1.7 ANGSTROMS RESOLUTION [Homo sapiens]. Skipping.
Could not parse line MET VARIANT TO 1.7 ANGSTROMS RESOLUTION [Homo sapiens]. Skipping.
Could not parse line MET VARIANT TO 1.7 ANGSTROMS RESOLUTION [Homo sapiens]. Skipping.
Could not parse line MET VARIANT TO 1.7 ANGSTROMS RESOLUTION [Homo sapiens]. Skipping.
Could not parse line MET VARIANT TO 1.7 ANGSTROMS RESOLUTION [Homo sapiens]. Skipping.
Could not parse line MET VARIANT TO 1.7 ANGSTROMS RESOLUTION [Homo sapiens]. Skipping.
Could not parse line MET VARIANT TO 1.7 ANGSTROMS RESOLUTION [Homo sapiens]. Skipping.
Could not parse line MET VARIANT TO 1.7 ANGSTROMS RESOLUTION [Homo sapiens]. Skipping.
Downloading 69 blocks of ~50 sequences...
100%|███████████████████████████████████████████| 69/69 [00:48<00:00, 1.42it/s]
Getting OTT species ids for all species.

Unknown/unrecognized query ids (skipped): ott4992270 ott615879 ott7659998 ott773491 ott838061 ott898631


Doing reciprocal blast.

Downloading Danio rerio proteome
Downloading proteome for taxid '7955'
Process Process-11:
Traceback (most recent call last):
  File "/home/lucas/miniconda3/envs/topiary/lib/python3.11/multiprocessing/process.py", line 314, in _bootstrap
    self.run()
  File "/home/lucas/miniconda3/envs/topiary/lib/python3.11/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/home/lucas/miniconda3/envs/topiary/lib/python3.11/site-packages/topiary/_private/ftp.py", line 36, in _ftp_thread
    ftp.retrbinary(cmd="RETR " + file_name,
  File "/home/lucas/miniconda3/envs/topiary/lib/python3.11/ftplib.py", line 445, in retrbinary
    return self.voidresp()
           ^^^^^^^^^^^^^^^
  File "/home/lucas/miniconda3/envs/topiary/lib/python3.11/ftplib.py", line 259, in voidresp
    resp = self.getresp()
           ^^^^^^^^^^^^^^
  File "/home/lucas/miniconda3/envs/topiary/lib/python3.11/ftplib.py", line 244, in getresp
    resp = self.getmultiline()
           ^^^^^^^^^^^^^^^^^^^
  File "/home/lucas/miniconda3/envs/topiary/lib/python3.11/ftplib.py", line 230, in getmultiline
    line = self.getline()
           ^^^^^^^^^^^^^^
  File "/home/lucas/miniconda3/envs/topiary/lib/python3.11/ftplib.py", line 218, in getline
    raise EOFError
EOFError
Traceback (most recent call last):
  File "/home/lucas/miniconda3/envs/topiary/lib/python3.11/site-packages/topiary/_private/interface.py", line 32, in wrapper
    value = func(*args, **kwargs)
            ^^^^^^^^^^^^^^^^^^^^^
  File "/home/lucas/miniconda3/envs/topiary/lib/python3.11/site-packages/topiary/pipeline/seed_to_alignment.py", line 406, in seed_to_alignment
    proteome_list.append(topiary.ncbi.get_proteome(taxid=this_taxid))
                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/lucas/miniconda3/envs/topiary/lib/python3.11/site-packages/topiary/ncbi/entrez/proteome.py", line 217, in get_proteome
    ncbi_ftp_download(genome_url,file_base="_protein.faa.gz")
  File "/home/lucas/miniconda3/envs/topiary/lib/python3.11/site-packages/topiary/ncbi/entrez/download.py", line 80, in ncbi_ftp_download
    md5_dict = _read_md5_file(md5_file)
               ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/lucas/miniconda3/envs/topiary/lib/python3.11/site-packages/topiary/ncbi/entrez/download.py", line 33, in _read_md5_file
    file = col[1][2:].strip()
IndexError: list index out of range

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/lucas/miniconda3/envs/topiary/lib/python3.11/site-packages/topiary/_private/wrap.py", line 185, in wrap_function
    ret = fcn(**fcn_args.__dict__)
          ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/lucas/miniconda3/envs/topiary/lib/python3.11/site-packages/topiary/_private/interface.py", line 38, in wrapper
    raise WrappedFunctionException(err) from e
topiary._private.interface.WrappedFunctionException: 

Caught exception in function 'seed_to_alignment'. Returning to starting
directory and cleaning up. Check error stack for cause of
this error.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/lucas/miniconda3/envs/topiary/bin/topiary-seed-to-alignment", line 26, in <module>
    main()
  File "/home/lucas/miniconda3/envs/topiary/bin/topiary-seed-to-alignment", line 21, in main
    wrap_function(seed_to_alignment,
  File "/home/lucas/miniconda3/envs/topiary/lib/python3.11/site-packages/topiary/_private/wrap.py", line 189, in wrap_function
    raise RuntimeError(err) from e
RuntimeError: 

Function seed_to_alignment raised an error.

To see command line help, run topiary-seed-to-alignment --help

harmsm commented 1 year ago

Thanks for the bug report! I've never seen this one before. It looks to me like it is choking when downloading and reading the checksum file to validate the downloaded proteome. Is there a file called md5checksums.txt in the working directory? If so, could you paste its contents here?

Thanks for your help; hopefully we can resolve this quickly.
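For readers following along, validating a download against an md5 checksum can be sketched as below. This is a minimal illustration using Python's standard `hashlib`, not topiary's actual code; the function names `md5_of_file` and `matches_checksum` are hypothetical.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex md5 digest of the file at `path`, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large proteome files are never held in memory at once.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path, expected_md5):
    """True if the file's md5 equals the expected hash (case-insensitive)."""
    return md5_of_file(path) == expected_md5.strip().lower()
```

A check like this only works if the expected hash itself was read correctly, which is why a damaged checksum file breaks the validation step before it starts.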

lbleicher commented 1 year ago

There is, but it is too long to paste here. Let me know if the entire file is needed and I'll post it somewhere. Here are its first and last lines:

d0e8e6b5c981ff948c657166270a7c88 ./Annotation_comparison/GCF_000002035.6_GRCz11_compare_prev.gbp.gz
9c8cd6fefb81746909c5438c5d18b758 ./Annotation_comparison/GCF_000002035.6_GRCz11_compare_prev.txt.gz
ae405e37cdd4ebbd7d2032baf3e522fd ./annotation_hashes.txt
c249b22d4cf0941cf13f6d626140686c ./GCF_000002035.6_GRCz11_assembly_regions.txt
ce31297f9cb1eccf885afab7fac363ad ./GCF_000002035.6_GRCz11_assembly_report.txt
83c34be20a52645e7ec5e442e33d1ebf ./GCF_000002035.6_GRCz11_assembly_stats.txt
f1de7661b5de92ddf4f2de72a7f2695f ./GCF_000002035.6_GRCz11_assembly_structure/all_alt_scaffold_placement.txt
9585b8ac806c110688debcea17387efa ./GCF_000002035.6_GRCz11_assembly_structure/ALT_DRER_TU_1/alt_scaffolds/AGP/alt.scaf.agp.gz
20b5e35f4c1033ce0426af1c190fe43b ./GCF_000002035.6_GRCz11_assembly_structure/ALT_DRER_TU_1/alt_scaffolds/alignments/NW_018394460.1_NC_007112.7.asn
68ae916c656500b25ee32299657b080b ./GCF_000002035.6_GRCz11_assembly_structure/ALT_DRER_TU_1/alt_scaffolds/alignments/NW_018394460.1_NC_007112.7.gff

(...)

020d65335491f47fa29beadf092e8695 ./GCF_000002035.6_GRCz11_assembly_structure/ALT_DRER_TU_1/alt_scaffolds/alignments/NW_018395039.1_NC_007129.7.asn
711757eada6306b792ed3465a27cdd85 ./GCF_000002035.6_GRCz11_assembly_structure/ALT_DRER_TU_1/alt_scaffolds/alignments/NW_018395039.1_NC_007129.7.gff
527ec55abb1a90ae0cdaac1426704c7d ./GCF_000002035.6_GRCz11_assembly_structure/ALT_DRER_TU_1/alt_scaffolds/alignments/NW_018395040.1_NC_007129.7.asn
d52fb0e9b5f4ee8c2bdb967e539b857e ./GCF_000002035.6_GRCz11_assembly_structure/ALT_DRER_TU_1/alt_scaffolds/alignments/NW_018395040.1_NC_007129.7.gff
e2718bcb708553147de9096927dccc23 ./GCF_000002035.6_GRCz11_assembly_structure/ALT_DRER_TU_1/alt_scaffolds/alignments/NW_018395041.1_NC_007129.7.asn
f9159f5bc4f13df9e81370261d7954f8 ./GCF_000002035.6_GRCz11_assembly_structure/ALT_DRER_TU_1/alt_scaffolds/alignments/NW_018395041.1_NC_007129.7.gff
56d6c327d50909c7370f85a110678949 ./GCF_000002035.6_GRCz11_assembly_structure/ALT_DRER_TU_1/alt_scaffolds/alignments/NW_018395042.1_NC_007129.7.asn
1cbb61a0bc29d549bc47fec6f000c5a4 ./GCF_000002035.6_GRCz11_assembly_structure/ALT_DRER_TU_1/alt_scaffolds/alignments/NW_018395042.1_NC_007129.7.gff
43fe313f031e77461d6ace72c697f5b5 ./GCF_000002035.6_GRCz11_assembly_structure/ALT_DRER_TU_1/alt_scaffolds/alignments/NW_018395043.1_NC_007129.7.asn
1f179ec473e6f382e07cd5a4ed0f37d3 ./GCF_000002035.6_GRCz11_assembly_structure/ALT_DRER_TU_1/alt_scaffolds/alignments/NW_018395043.1_NC_007129.7.gff
99190d7cb0b3e1880e24f6dc51023e31 ./GCF_000002035.6_GRCz11_assembly_structure/ALT_DRER_TU_1/alt_scaffolds/alignments/NW_018395044.1_NC_007129.7.asn
482703be37901d26

harmsm commented 1 year ago

I think we're getting somewhere. topiary assumes an md5 file has rows that have the format "hash file". It looks like this file is truncated (the last line looks like an incomplete hash). I suspect the md5 download terminated early for some reason.

If this is true, you should be able to re-run and successfully complete the job.

I can patch topiary to prevent this in the future by adding a check to make sure the md5 file downloads successfully, rather than cryptically crashing.

Maybe try re-running the job?

Thanks!
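The kind of check described above (rows of the form "hash file", with a cut-off final row signalling a truncated download) could look roughly like the sketch below. It is an illustration only: `read_md5_lines` is a hypothetical name, and the assumption that every complete row is a 32-character hex hash followed by a path comes from the excerpts pasted in this thread, not from topiary's source.

```python
import re

# One complete row: 32 hex characters, whitespace, then a path.
_MD5_ROW = re.compile(r"^([0-9a-fA-F]{32})\s+(\S.*)$")


def read_md5_lines(lines):
    """Parse 'hash file' rows into a {file_name: hash} dictionary.

    Raises ValueError on any row that is not a full hash followed by a
    path, which is what the last row of a truncated download looks like.
    """
    md5_by_file = {}
    for number, raw in enumerate(lines, start=1):
        line = raw.strip()
        if not line:
            continue  # tolerate blank lines
        match = _MD5_ROW.match(line)
        if match is None:
            raise ValueError(f"malformed md5 row {number}: {line!r}")
        md5_hash, path = match.groups()
        # Drop a leading './' so keys are bare relative paths.
        if path.startswith("./"):
            path = path[2:]
        md5_by_file[path] = md5_hash.lower()
    return md5_by_file
```

A file cut exactly at a row boundary would still parse cleanly, so a caller would also want to confirm that the file it is looking for is actually among the keys.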

lbleicher commented 1 year ago

A rerun produced a very similar output - I got a 01_initial-dataframe.csv with the same filesize, the blast result XML is almost the same size (a difference of three lines), and md5checksums.txt is again truncated:

3ce0f863975dd40c4ea48c96478d30ed ./GCF_000002035.6_GRCz11_assembly_structure/ALT_DRER_TU_1/alt_scaffolds/alignments/NW_018394748.1_NC_007120.7.gff
79e2f3921879aaeb9e1514403d22dec8 ./GCF_000002035.6_GRCz11_assembly_structure/ALT_DRER_TU_1/alt_scaffolds/alignments/NW_018394749.1_NC_007120.7.asn
9b2de24652cf7f00d6985fe118f743a4 ./GCF_000002035.6_GRCz11_assembly_structure/ALT_DRER_TU_1/alt_scaffolds/alignments/NW_018394749.1_NC_007120.7.gff
5bccb98a2853b1e9580ccfc54da20b71 ./GCF_000002035.6_GRCz11_assembly_structure/ALT_DRER_TU_1/alt_scaffolds/alignments/NW_018394750.1_NC_007120.7.asn
c81f8f223679341488c081011ba742c3 ./GCF_000002035.6_GRCz11_assembly_structure/ALT_DRER_TU_1/alt_scaffolds/alignments/NW_018394750.1_NC_007120.7.gff
680fd33bd2b0

harmsm commented 1 year ago

That's strange. I just created a bug fix that downloads the md5sum file, checks if it is sane, then attempts to download it again if it fails. Would you be up for seeing if it fixes your problem? To download the change, you can follow the instructions below:

conda activate topiary
cd the_topiary_directory_wherever_you_downloaded_it
git checkout -b harmsm-main main
git pull git@github.com:harmsm/topiary.git main
python setup.py install

Best,

Mike
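The "download, check that it is sane, try again if not" idea in the comment above can be sketched generically as follows. This is not the code from the bug fix; `download_with_retry` and its parameters are hypothetical, and the caller is assumed to supply both the download function and the sanity check.

```python
import time


def download_with_retry(download, is_valid, attempts=3, wait_seconds=0.0):
    """Call `download()` until `is_valid(result)` is true or attempts run out.

    `download` and `is_valid` are caller-supplied callables. EOFError and
    OSError (for example, from a dropped connection) trigger another attempt.
    """
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            result = download()
            if is_valid(result):
                return result
            last_error = ValueError(f"attempt {attempt}: downloaded data failed validation")
        except (EOFError, OSError) as error:
            last_error = error
        if attempt < attempts:
            time.sleep(wait_seconds)
    # Chain the last failure so the real cause stays visible in the traceback.
    raise RuntimeError(f"download failed after {attempts} attempts") from last_error
```

Raising a clear error once the retries are exhausted is the main design point: the failure is reported at the download step instead of surfacing later as an unrelated-looking `IndexError`.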

lbleicher commented 1 year ago

Hi, is the information correct? When I tried

git pull git@github.com:harmsm/topiary.git main

I get the message:

git@github.com: Permission denied (publickey).
fatal: Could not read from remote repository.

Please make sure you have the correct access rights and the repository exists.


jjvanantwerp commented 1 year ago

I am having the same error, actually. Below is my output.


Polishing alignment and re-aligning.

muscle 5.1.linux64 [] 396Gb RAM, 40 cores
Built Feb 24 2022 03:16:15
(C) Copyright 2004-2021 Robert C. Edgar.
https://drive5.com

Input: 2 seqs, length avg 392 max 408

00:00 17Mb 50.0% Derep 0 uniques, 0 dupes
00:00 17Mb 100.0% Derep 1 uniques, 0 dupes
00:00 18Mb 50.0% UCLUST 2 seqs EE<0.01, 0 centroids, 0 members
00:00 18Mb 100.0% UCLUST 2 seqs EE<0.01, 1 centroids, 0 members
00:00 18Mb CPU has 40 cores, defaulting to 20 threads
00:00 18Mb 50.0% UCLUST 2 seqs EE<0.30, 0 centroids, 0 members
00:00 18Mb 100.0% UCLUST 2 seqs EE<0.30, 1 centroids, 0 members
00:00 58Mb 100.0% Make cluster MFAs
1 clusters pass 1
1 clusters pass 2
00:00 58Mb
00:00 58Mb Align cluster 1 / 1 (2 seqs)
00:00 58Mb
00:00 58Mb 100.0% Calc posteriors
00:00 58Mb 100.0% UPGMA5
00:00 59Mb 100.0% Consensus sequences

Success. Alignment written to the alignment column in the dataframe.
Traceback (most recent call last):
  File "/mnt/home/vanant25/anaconda3/envs/topiary_resolved/lib/python3.11/site-packages/topiary/_private/interface.py", line 32, in wrapper
    value = func(*args, **kwargs)
            ^^^^^^^^^^^^^^^^^^^^^
  File "/mnt/home/vanant25/anaconda3/envs/topiary_resolved/lib/python3.11/site-packages/topiary/pipeline/seed_to_alignment.py", line 502, in seed_to_alignment
    df = topiary.quality.polish_alignment(**kwargs)
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/mnt/home/vanant25/anaconda3/envs/topiary_resolved/lib/python3.11/site-packages/topiary/quality/polish.py", line 136, in polish_alignment
    top_fx_sparse = _get_cutoff(df.fx_in_sparse,pct=fx_sparse_percentile)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/mnt/home/vanant25/anaconda3/envs/topiary_resolved/lib/python3.11/site-packages/topiary/quality/polish.py", line 43, in _get_cutoff
    return x[idx]
           ~^^^^^
IndexError: index 2 is out of bounds for axis 0 with size 2

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/mnt/home/vanant25/anaconda3/envs/topiary_resolved/lib/python3.11/site-packages/topiary/_private/wrap.py", line 185, in wrap_function
    ret = fcn(**fcn_args.__dict__)
          ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/mnt/home/vanant25/anaconda3/envs/topiary_resolved/lib/python3.11/site-packages/topiary/_private/interface.py", line 38, in wrapper
    raise WrappedFunctionException(err) from e
topiary._private.interface.WrappedFunctionException:

Caught exception in function 'seed_to_alignment'. Returning to starting directory and cleaning up. Check error stack for cause of this error.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/mnt/home/vanant25/anaconda3/envs/topiary_resolved/bin/topiary-seed-to-alignment", line 26, in <module>
    main()
  File "/mnt/home/vanant25/anaconda3/envs/topiary_resolved/bin/topiary-seed-to-alignment", line 21, in main
    wrap_function(seed_to_alignment,
  File "/mnt/home/vanant25/anaconda3/envs/topiary_resolved/lib/python3.11/site-packages/topiary/_private/wrap.py", line 189, in wrap_function
    raise RuntimeError(err) from e
RuntimeError:

Function seed_to_alignment raised an error.

To see command line help, run topiary-seed-to-alignment --help

and the last few lines of my md5checksums.txt are:

75f783e620888f6a20c9e7030bf54de2 ./Gnomon_models/GCF_000001405.40_GRCh38.p14_gnomon_model.gff.gz
962674e06f93bd8656cbd860c395f5ad ./Gnomon_models/GCF_000001405.40_GRCh38.p14_gnomon_protein.faa.gz
f7262c7cc28373fd2aa0a225ef27a50e ./Gnomon_models/GCF_000001405.40_GRCh38.p14_gnomon_rna.fna.gz
3b7f12ebd3d129698e86fb8701bb9688 ./README_patch_release.txt
84b55637f312368687af5e2b545fcb8d ./RefSeq_transcripts_alignments/GCF_000001405.40_GRCh38.p14_knownrefseq_alns.bam
e9e81c03bce9f45f7a4edfe44f0a8f8f ./RefSeq_transcripts_alignments/GCF_000001405.40_GRCh38.p14_knownrefseq_alns.bam.bai
d9ff57b0fdb663665f2d0f9305831b30 ./RefSeq_transcripts_alignments/GCF_000001405.40_GRCh38.p14_modelrefseq_alns.bam
96404b41c1c023019a0ba6514d98c498 ./RefSeq_transcripts_alignments/GCF_000001405.40_GRCh38.p14_modelrefseq_alns.bam.bai

harmsm commented 1 year ago

Thanks for the report. I just merged the PR I referenced above. I still have not been able to reproduce the error on my end. Can one of you try the command again with the new version? To install the latest version, you could run the following:

cd topiary
git pull origin main
conda activate topiary
python -m pip install . -vv

Thanks! (And thanks for your patience with the delayed response to this thread).

jjvanantwerp commented 1 year ago

I followed the instructions above, and still ran into the same error. The terminal output is attached: Terminal SavedOutpiut.txt

harmsm commented 1 year ago

@jjvanantwerp Thanks for the bug report and sorry for the slow reply. Dangerous having the prof in charge of package maintenance...

I looked through your log file; it appears you're having a different bug. It's crashing when polishing the final alignment. If possible could you please post the last csv file that topiary writes out before the crash occurs? Based on when the crash occurs, I believe this should be 04_aligned-dataframe.csv.

Thanks.
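For context on the polishing crash: the traceback reported earlier shows `x[idx]` failing with index 2 on an array of size 2, that is, an index equal to the array length. What `_get_cutoff` actually computes is not shown in this thread, so the following is only a generic sketch of a percentile-style cutoff that clamps its index; `percentile_cutoff` and its rounding rule are assumptions made for illustration.

```python
import math


def percentile_cutoff(values, pct=0.9):
    """Return the value at roughly the `pct` quantile of `values`.

    The index is clamped to the last element, so a very small input
    (such as two rows) cannot produce an out-of-bounds index.
    """
    if not values:
        raise ValueError("values must not be empty")
    if not 0.0 <= pct <= 1.0:
        raise ValueError("pct must be between 0 and 1")
    ordered = sorted(values)
    # Without the min(), pct close to 1 would give idx == len(ordered).
    idx = min(math.ceil(pct * len(ordered)), len(ordered) - 1)
    return ordered[idx]
```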

jjvanantwerp commented 1 year ago

Yes, here it is. 04_aligned-dataframe.csv

lbleicher commented 1 year ago

Hey, Mike, sorry about the delay, I just had one of those crazy weeks. Here's the error I'm getting:

Downloading Danio rerio proteome
Downloading proteome for taxid '7955'
Process Process-11:
Traceback (most recent call last):
  File "/home/lucas/miniconda3/envs/topiary/lib/python3.11/multiprocessing/process.py", line 314, in _bootstrap
    self.run()
  File "/home/lucas/miniconda3/envs/topiary/lib/python3.11/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/home/lucas/miniconda3/envs/topiary/lib/python3.11/site-packages/topiary/_private/ftp.py", line 36, in _ftp_thread
    ftp.retrbinary(cmd="RETR " + file_name,
  File "/home/lucas/miniconda3/envs/topiary/lib/python3.11/ftplib.py", line 445, in retrbinary
    return self.voidresp()
           ^^^^^^^^^^^^^^^
  File "/home/lucas/miniconda3/envs/topiary/lib/python3.11/ftplib.py", line 259, in voidresp
    resp = self.getresp()
           ^^^^^^^^^^^^^^
  File "/home/lucas/miniconda3/envs/topiary/lib/python3.11/ftplib.py", line 244, in getresp
    resp = self.getmultiline()
           ^^^^^^^^^^^^^^^^^^^
  File "/home/lucas/miniconda3/envs/topiary/lib/python3.11/ftplib.py", line 230, in getmultiline
    line = self.getline()
           ^^^^^^^^^^^^^^
  File "/home/lucas/miniconda3/envs/topiary/lib/python3.11/ftplib.py", line 218, in getline
    raise EOFError
EOFError
Traceback (most recent call last):
  File "/home/lucas/miniconda3/envs/topiary/lib/python3.11/site-packages/topiary/ncbi/entrez/download.py", line 92, in ncbi_ftp_download
    md5_dict[file_name]
KeyError: 'GCF_000002035.6_GRCz11_protein.faa.gz'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/lucas/miniconda3/envs/topiary/lib/python3.11/site-packages/topiary/ncbi/entrez/proteome.py", line 217, in get_proteome
    ncbi_ftp_download(genome_url,file_base="_protein.faa.gz")
  File "/home/lucas/miniconda3/envs/topiary/lib/python3.11/site-packages/topiary/ncbi/entrez/download.py", line 96, in ncbi_ftp_download
    raise FileNotFoundError(err)
FileNotFoundError: The file 'GCF_000002035.6_GRCz11_protein.faa.gz' is not
present on the NCBI.
Full path:
ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/002/035/GCF_000002035.6_GRCz11//genomes/all/GCF/000/002/035/GCF_000002035.6_GRCz11/GCF_000002035.6_GRCz11_protein.faa.gz

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/lucas/miniconda3/envs/topiary/lib/python3.11/site-packages/topiary/_private/interface.py", line 32, in wrapper
    value = func(*args, **kwargs)
            ^^^^^^^^^^^^^^^^^^^^^
  File "/home/lucas/miniconda3/envs/topiary/lib/python3.11/site-packages/topiary/pipeline/seed_to_alignment.py", line 406, in seed_to_alignment
    proteome_list.append(topiary.ncbi.get_proteome(taxid=this_taxid))
                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/lucas/miniconda3/envs/topiary/lib/python3.11/site-packages/topiary/ncbi/entrez/proteome.py", line 241, in get_proteome
    raise RuntimeError(err)
RuntimeError:
Could not download proteome GCF_000002035.6_GRCz11_protein.faa.gz.
This can happen if an assembly is in the NCBI database but
does not have an associated _protein.tar.gz file. If you
are running this as part the seed_to_alignment pipeline,
you have a couple of options. 1) You can replace the problematic
species (taxid = 7955) in your seed dataset and start
the pipeline again. 2) You can edit the 01_initial-dataframe.csv
file, adding or editing the column 'recip_blast'. Set this to
'FALSE' for every row *except* the rows with key_species = 'TRUE'.
Set this to 'FALSE' for the problematic species. You
can then restart the pipeline with the --restart flag. Topiary
will not use this species for reciprocal BLAST, but will still
treat it as a key species in other respects.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/lucas/miniconda3/envs/topiary/lib/python3.11/site-packages/topiary/_private/wrap.py", line 185, in wrap_function
    ret = fcn(**fcn_args.__dict__)
          ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/lucas/miniconda3/envs/topiary/lib/python3.11/site-packages/topiary/_private/interface.py", line 38, in wrapper
    raise WrappedFunctionException(err) from e
topiary._private.interface.WrappedFunctionException:

Caught exception in function 'seed_to_alignment'. Returning to starting
directory and cleaning up. Check error stack for cause of
this error.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/lucas/miniconda3/envs/topiary/bin/topiary-seed-to-alignment", line 26, in <module>
    main()
  File "/home/lucas/miniconda3/envs/topiary/bin/topiary-seed-to-alignment", line 21, in main
    wrap_function(seed_to_alignment,
  File "/home/lucas/miniconda3/envs/topiary/lib/python3.11/site-packages/topiary/_private/wrap.py", line 189, in wrap_function
    raise RuntimeError(err) from e
RuntimeError:

Function seed_to_alignment raised an error.

To see command line help, run topiary-seed-to-alignment --help
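The second workaround in the RuntimeError message above (editing the `recip_blast` column of 01_initial-dataframe.csv) could be scripted along these lines. This is a sketch under assumptions: the column names `recip_blast` and `key_species` come from the error message, while the `species` column, the 'TRUE'/'FALSE' string values, and the function name `set_recip_blast` are guesses for illustration. The message's wording is also slightly ambiguous, and the code takes one reading of it: key-species rows get 'TRUE' unless they belong to the problematic species.

```python
import csv


def set_recip_blast(csv_in, csv_out, skip_species):
    """Write a copy of the dataframe CSV with a 'recip_blast' column.

    Rows get 'TRUE' only if key_species is true and the species is not
    `skip_species`; every other row gets 'FALSE'. Returns the row count.
    """
    with open(csv_in, newline="") as handle:
        reader = csv.DictReader(handle)
        fieldnames = list(reader.fieldnames or [])
        rows = list(reader)
    if "recip_blast" not in fieldnames:
        fieldnames.append("recip_blast")
    for row in rows:
        is_key = (row.get("key_species") or "").strip().upper() == "TRUE"
        is_skipped = (row.get("species") or "").strip() == skip_species
        row["recip_blast"] = "TRUE" if is_key and not is_skipped else "FALSE"
    with open(csv_out, "w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)
    return len(rows)
```

Writing to a separate output file keeps the original dataframe intact; per the message, the pipeline would then be restarted with the `--restart` flag.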

harmsm commented 1 year ago

@jjvanantwerp: Thanks for the file! I am able to reproduce the error and am working on it now.

@lbleicher: Thanks for the detailed error message. I'll look into it.

harmsm commented 1 year ago

@jjvanantwerp Should be fixed now. I just merged a PR with the change. You should be able to run the following to install the latest and greatest version. Thanks for helping troubleshoot!

cd topiary
git pull origin main
conda activate topiary
python -m pip install . -vv

jjvanantwerp commented 1 year ago

Yes, I was able to progress past the alignment! I think this issue can be closed. Unfortunately, I will need to open another for what appears to be the same error in the next step. I am not sure if here is the best place to discuss that or if I should open a new issue - it's that same place in the wrap function, line 189.

harmsm commented 1 year ago

Glad we made progress! The wrap function will always be the last thing to throw an error when something inside it fails; it’s a way to capture internal errors and make sure the crashing function returns to the right directory, cleans up, etc. The real cause is further up the stack, so could you paste the whole error?

Thanks!

Mike
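The pattern described above (catch any internal error, return to the starting directory, then re-raise with the original error chained) can be sketched as a small decorator. This is a generic illustration, not topiary's `wrap_function` or its `interface.py` wrapper; `WrappedError` and `return_to_start_dir` are hypothetical names.

```python
import functools
import os


class WrappedError(Exception):
    """Hypothetical stand-in for the exception raised by the wrapper."""


def return_to_start_dir(func):
    """Run `func`; on any error, chdir back and re-raise with the cause chained."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start_dir = os.getcwd()
        try:
            return func(*args, **kwargs)
        except Exception as error:
            # Undo any directory changes made before the crash.
            os.chdir(start_dir)
            message = f"Caught exception in function '{func.__name__}'."
            # 'from error' produces the "direct cause" chain seen in the logs.
            raise WrappedError(message) from error
    return wrapper
```

Because of the chaining, the wrapper's own frame is always the last one printed, and the original exception sits in the first traceback of the chain.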


jjvanantwerp commented 1 year ago

Terminal Saved Output Mar 15.txt

I have attached the whole terminal session, but below is the relevant part. It says the issue is that my alignment is too small, and I'm not sure if there's a way to address this here or upstream.

(topiary_resolved) [vanant25@dev-intel16 topiary]$ topiary-alignment-to-ancestors ER_Final_Alignment.csv --out_dir ER_ASR --num_threads 1

Non-microbial dataset detected. Gene/species tree reconciliation will be performed

Checking raxml-ng

installed:       Y
binary_path:     /mnt/home/vanant25/anaconda3/envs/topiary_resolved/bin/raxml-ng
binary runs:     Y
version:         1.1
minimum version: 1.1
passes:          Y

Checking generax

installed:       Y
binary_path:     /mnt/home/vanant25/anaconda3/envs/topiary_resolved/bin/generax
binary runs:     Y
version:         2.0.4
minimum version: 2.0
passes:          Y

Checking mpirun

installed:       Y
binary_path:     /mnt/home/vanant25/anaconda3/envs/topiary_resolved/bin/mpirun
binary runs:     Y
version:         4.1.5
minimum version: 0.0
passes:          Y

topiary is starting a find_best_model calculation in ./00_find-model:

Generating maximum parsimony tree.

Launching raxml-ng, 0:00:00.007415 (H:M:S)

topiary ran a find_best_model calculation in ./00_find-model:


Crashed after 0:00:00.021205 (H:M:S) Please check ./00_find-model/working
Traceback (most recent call last):
  File "/mnt/home/vanant25/anaconda3/envs/topiary_resolved/lib/python3.11/site-packages/topiary/_private/interface.py", line 32, in wrapper
    value = func(*args, **kwargs)
            ^^^^^^^^^^^^^^^^^^^^^
  File "/mnt/home/vanant25/anaconda3/envs/topiary_resolved/lib/python3.11/site-packages/topiary/_private/interface.py", line 336, in launch
    raise RuntimeError(err)
RuntimeError: ERROR: /mnt/home/vanant25/anaconda3/envs/topiary_resolved/bin/raxml-ng returned 1


/mnt/home/vanant25/anaconda3/envs/topiary_resolved/bin/raxml-ng output

RAxML-NG v. 1.1 released on 29.11.2021 by The Exelixis Lab.
Developed by: Alexey M. Kozlov and Alexandros Stamatakis.
Contributors: Diego Darriba, Tomas Flouri, Benoit Morel, Sarah Lutteropp, Ben Bettisworth.
Latest version: https://github.com/amkozlov/raxml-ng
Questions/problems/suggestions? Please visit: https://groups.google.com/forum/#!forum/raxml

System: Intel(R) Xeon(R) CPU E5-2680 v4 @ 2.40GHz, 28 cores, 125 GB RAM

RAxML-NG was called at 15-Mar-2023 00:48:29 as follows:

/mnt/home/vanant25/anaconda3/envs/topiary_resolved/bin/raxml-ng --start --msa alignment.phy --model LG --seed 3997117630 --threads 1 --tree pars{1}

Analysis options:
  run mode: Starting tree generation
  start tree(s): parsimony (1)
  random seed: 3997117630
  SIMD kernels: AVX2
  parallelization: coarse-grained (auto), NONE/sequential

[00:00:00] Reading alignment from file: alignment.phy
[00:00:00] Loaded alignment with 2 taxa and 410 sites

ERROR: Your alignment contains less than 4 sequences!

ERROR: Alignment check failed (see details above)!

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/mnt/home/vanant25/anaconda3/envs/topiary_resolved/lib/python3.11/site-packages/topiary/raxml/_raxml.py", line 189, in run_raxml
    interface.launch(cmd,
  File "/mnt/home/vanant25/anaconda3/envs/topiary_resolved/lib/python3.11/site-packages/topiary/_private/interface.py", line 38, in wrapper
    raise WrappedFunctionException(err) from e
topiary._private.interface.WrappedFunctionException:

Caught exception in function 'launch'. Returning to starting directory and cleaning up. Check error stack for cause of this error.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/mnt/home/vanant25/anaconda3/envs/topiary_resolved/lib/python3.11/site-packages/topiary/_private/interface.py", line 32, in wrapper
    value = func(*args, **kwargs)
            ^^^^^^^^^^^^^^^^^^^^^
  File "/mnt/home/vanant25/anaconda3/envs/topiary_resolved/lib/python3.11/site-packages/topiary/raxml/model.py", line 260, in find_best_model
    _generate_parsimony_tree(supervisor.alignment,
  File "/mnt/home/vanant25/anaconda3/envs/topiary_resolved/lib/python3.11/site-packages/topiary/raxml/model.py", line 45, in _generate_parsimony_tree
    run_raxml(run_directory=run_directory,
  File "/mnt/home/vanant25/anaconda3/envs/topiary_resolved/lib/python3.11/site-packages/topiary/raxml/_raxml.py", line 197, in run_raxml
    raise RuntimeError from e
RuntimeError

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/mnt/home/vanant25/anaconda3/envs/topiary_resolved/lib/python3.11/site-packages/topiary/_private/interface.py", line 32, in wrapper
    value = func(*args, **kwargs)
            ^^^^^^^^^^^^^^^^^^^^^
  File "/mnt/home/vanant25/anaconda3/envs/topiary_resolved/lib/python3.11/site-packages/topiary/pipeline/alignment_to_ancestors.py", line 323, in alignment_to_ancestors
    topiary.find_best_model(df,
  File "/mnt/home/vanant25/anaconda3/envs/topiary_resolved/lib/python3.11/site-packages/topiary/_private/interface.py", line 38, in wrapper
    raise WrappedFunctionException(err) from e
topiary._private.interface.WrappedFunctionException:

Caught exception in function 'find_best_model'. Returning to starting directory and cleaning up. Check error stack for cause of this error.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/mnt/home/vanant25/anaconda3/envs/topiary_resolved/lib/python3.11/site-packages/topiary/_private/wrap.py", line 185, in wrap_function
    ret = fcn(**fcn_args.__dict__)
          ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/mnt/home/vanant25/anaconda3/envs/topiary_resolved/lib/python3.11/site-packages/topiary/_private/interface.py", line 38, in wrapper
    raise WrappedFunctionException(err) from e
topiary._private.interface.WrappedFunctionException:

Caught exception in function 'alignment_to_ancestors'. Returning to starting directory and cleaning up. Check error stack for cause of this error.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/mnt/home/vanant25/anaconda3/envs/topiary_resolved/bin/topiary-alignment-to-ancestors", line 26, in <module>
    main()
  File "/mnt/home/vanant25/anaconda3/envs/topiary_resolved/bin/topiary-alignment-to-ancestors", line 21, in main
    wrap_function(alignment_to_ancestors,
  File "/mnt/home/vanant25/anaconda3/envs/topiary_resolved/lib/python3.11/site-packages/topiary/_private/wrap.py", line 189, in wrap_function
    raise RuntimeError(err) from e
RuntimeError:

Function alignment_to_ancestors raised an error.

To see command line help, run topiary-alignment-to-ancestors --help

(topiary_resolved) [vanant25@dev-intel16 topiary]$

harmsm commented 1 year ago

Hi James,

Yep, alignment is too small. This mini-dataset only has a few, nearly identical, sequences that are trimmed out during the quality control step. Maybe try going back upstream, doing seed_to_alignment before feeding into ali_to_anc? That should BLAST and pull many more sequences down for your tree inference.
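One way to catch this before launching the ancestor pipeline is to count how many rows of the dataframe are still marked to keep. This is a minimal sketch, assuming the topiary CSV has a `keep` column holding True/False (an assumption; check your file's header) and using the four-sequence minimum from the raxml-ng error. The helper names are made up for illustration and are not part of topiary.

```python
import csv
import io

# raxml-ng rejected the alignment above for having fewer than 4 sequences.
MIN_SEQUENCES = 4


def count_kept(csv_text):
    """Count rows whose `keep` value is truthy.

    Assumption: rows with a missing or empty `keep` value count as kept.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    kept = 0
    for row in reader:
        flag = (row.get("keep") or "true").strip().lower()
        if flag in ("true", "1", "yes"):
            kept += 1
    return kept


def enough_for_tree(csv_text, minimum=MIN_SEQUENCES):
    """Return True if enough sequences survive to attempt tree inference."""
    return count_kept(csv_text) >= minimum


# Toy dataframe: two kept rows, one dropped during quality control.
example = """name,species,keep
a,Homo sapiens,True
b,Homo sapiens,True
c,Pan troglodytes,False
"""

print(count_kept(example), enough_for_tree(example))
```

To check a real file, read it with `open(path).read()` and pass the text to `enough_for_tree`.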

Best,

Mike


jjvanantwerp commented 1 year ago

The input file was the output of seed_to_alignment, I thought. I used the 05_clean-aligned-dataframe.csv as the input for ali_to_anc, without any cleaning.

harmsm commented 1 year ago

Ah, I think I might understand. Did you include only one human sequence in there as a seed? If so, topiary is only looking for human/primate sequences because the seed dataset specifies the taxonomic scope as only human. You’ll want to add a sequence from another species that indicates the taxonomic scope to reconstruct (e.g., human-bony fishes, all mammals, etc.). We describe how to think about this here:

https://topiary-asr.readthedocs.io/en/latest/protocol.html#define-the-problem-doc

If that’s not what’s going on, we can definitely keep troubleshooting to find the bug.
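A quick way to check the scope of a seed file before running anything is to list the distinct species it contains. This is a rough sketch, assuming the seed CSV has a `species` column like the seed file at the top of this issue. The function names and the placeholder sequences are hypothetical, and this is not how topiary itself determines scope.

```python
import csv
import io


def seed_species(csv_text):
    """Return the sorted, distinct species names found in a seed CSV."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return sorted({row["species"].strip() for row in reader if row.get("species")})


def check_scope(csv_text):
    """Describe whether the seed spans more than one species."""
    species = seed_species(csv_text)
    listing = ", ".join(species)
    if len(species) < 2:
        return (f"Only {len(species)} species in seed ({listing}); "
                "add a sequence from a second species to set the taxonomic scope.")
    return f"{len(species)} species in seed: {listing}"


# Placeholder seed with a single species (sequences are dummy strings).
narrow_seed = """species,name,aliases,sequence
Homo sapiens,ER,estrogen receptor,MKV
"""

# Same seed with a second species added to widen the scope.
wider_seed = narrow_seed + "Danio rerio,ER,estrogen receptor,MKI\n"

print(check_scope(narrow_seed))
print(check_scope(wider_seed))
```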

Mike


jjvanantwerp commented 1 year ago

No, that's what I did. I was hoping Topiary would 'fill in' around that sequence, but it seems like it's looking for that to be the 'edge' of sequence space instead. I will have to redesign my experiment to incorporate this behavior.

harmsm commented 1 year ago

Hopefully it works for you then. 🤞 Topiary fills in sequences within the species boundaries defined in the seed data frame. You basically need one more sequence in your seed data frame to start it going.

Mike

jjvanantwerp commented 1 year ago

I've filled out the seed alignment and ran into an error that I suspect is caused by its format. I have attached the seed alignment. Do you recognize what might cause this:

ER_Seed.csv

  File "/mnt/home/vanant25/anaconda3/envs/topiary_resolved/lib/python3.11/site-packages/topiary/opentree/util.py", line 69, in _validate_ott_or_species
    raise ValueError(err)
ValueError: Could not process ott None. Should be an integer or string with format ottINTEGER

Here is the full error stack:

(topiary_resolved) [vanant25@dev-intel18 topiary]$ topiary-seed-to-alignment ER_Seed.csv --out_dir ER_Align

Checking blastp

installed:       Y
binary_path:     /mnt/home/vanant25/anaconda3/envs/topiary_resolved/bin/blastp
binary runs:     Y
version:         2.13.0+
minimum version: 2.0
passes:          Y

Checking makeblastdb

installed:       Y
binary_path:     /mnt/home/vanant25/anaconda3/envs/topiary_resolved/bin/makeblastdb
binary runs:     Y
version:         2.13.0+
minimum version: 2.0
passes:          Y

Checking muscle

installed:       Y
binary_path:     /mnt/home/vanant25/anaconda3/envs/topiary_resolved/bin/muscle
binary runs:     Y
version:         5.1.linux64
minimum version: 5.0
passes:          Y

Building initial topiary dataframe.

Traceback (most recent call last):
  File "/mnt/home/vanant25/anaconda3/envs/topiary_resolved/lib/python3.11/site-packages/topiary/opentree/util.py", line 65, in _validate_ott_or_species
    check_ott = int(check_ott)
                ^^^^^^^^^^^^^^
TypeError: int() argument must be a string, a bytes-like object or a real number, not 'NoneType'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/mnt/home/vanant25/anaconda3/envs/topiary_resolved/lib/python3.11/site-packages/topiary/_private/interface.py", line 32, in wrapper
    value = func(*args, **kwargs)
            ^^^^^^^^^^^^^^^^^^^^^
  File "/mnt/home/vanant25/anaconda3/envs/topiary_resolved/lib/python3.11/site-packages/topiary/pipeline/seed_to_alignment.py", line 371, in seed_to_alignment
    out = topiary.df_from_seed(**kwargs)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/mnt/home/vanant25/anaconda3/envs/topiary_resolved/lib/python3.11/site-packages/topiary/io/seed.py", line 312, in df_from_seed
    seed_df, key_species, paralog_patterns, species_aware = topiary.io.read_seed(seed_df,
                                                            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/mnt/home/vanant25/anaconda3/envs/topiary_resolved/lib/python3.11/site-packages/topiary/io/seed.py", line 126, in read_seed
    mrca = topiary.opentree.ott_to_mrca(ott_list=ott_list,
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/mnt/home/vanant25/anaconda3/envs/topiary_resolved/lib/python3.11/site-packages/topiary/opentree/util.py", line 426, in ott_to_mrca
    ott_list = _validate_ott_or_species(ott_list,species_list)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/mnt/home/vanant25/anaconda3/envs/topiary_resolved/lib/python3.11/site-packages/topiary/opentree/util.py", line 69, in _validate_ott_or_species
    raise ValueError(err)
ValueError: Could not process ott None. Should be an integer or string with format ottINTEGER

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/mnt/home/vanant25/anaconda3/envs/topiary_resolved/lib/python3.11/site-packages/topiary/_private/wrap.py", line 185, in wrap_function
    ret = fcn(**fcn_args.__dict__)
          ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/mnt/home/vanant25/anaconda3/envs/topiary_resolved/lib/python3.11/site-packages/topiary/_private/interface.py", line 38, in wrapper
    raise WrappedFunctionException(err) from e
topiary._private.interface.WrappedFunctionException:

Caught exception in function 'seed_to_alignment'. Returning to starting directory and cleaning up. Check error stack for cause of this error.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/mnt/home/vanant25/anaconda3/envs/topiary_resolved/bin/topiary-seed-to-alignment", line 26, in <module>
    main()
  File "/mnt/home/vanant25/anaconda3/envs/topiary_resolved/bin/topiary-seed-to-alignment", line 21, in main
    wrap_function(seed_to_alignment,
  File "/mnt/home/vanant25/anaconda3/envs/topiary_resolved/lib/python3.11/site-packages/topiary/_private/wrap.py", line 189, in wrap_function
    raise RuntimeError(err) from e
RuntimeError:

Function seed_to_alignment raised an error.

To see command line help, run topiary-seed-to-alignment --help

harmsm commented 1 year ago

Okay, it should work now. (Or, actually, it should fail now with a useful error.) It turns out one of your species, Gulo gulo luscus, is not in the Open Tree of Life database. Topiary was supposed to tell you this was the problem, but it was choking on the opentreeoflife output. I just pushed a change so it should now report the missing species.

I suspect you want to replace "Gulo gulo luscus" with "Gulo gulo" (https://tree.opentreeoflife.org/taxonomy/browse?id=752563)
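If a seed file has many species, names like this can be flagged ahead of time. The sketch below is only a heuristic, not how topiary or Open Tree of Life resolves names: it assumes that a name with more than two words is a subspecies (or similar) and proposes the two-word binomial instead. The function name is hypothetical, and any suggestion still needs to be checked against the Open Tree taxonomy browser.

```python
def suggest_binomial(species):
    """Return a two-word binomial if the name has extra words, else None.

    Heuristic only: a three-word name is assumed to be a subspecies.
    """
    parts = species.split()
    if len(parts) > 2:
        return " ".join(parts[:2])
    return None


# Species names as they might appear in a seed file's species column.
names = ["Homo sapiens", "Gulo gulo luscus", "Danio rerio"]

for name in names:
    shorter = suggest_binomial(name)
    if shorter is not None:
        print(f"{name!r} may not resolve; consider {shorter!r}")
```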

Best,

Mike

jjvanantwerp commented 1 year ago

I changed the species name in the seed alignment, which advanced me further than I have been able to get before. Unfortunately, the alignment hit a critical error again. I have uploaded what I think is the final alignment file that was used.

Terminal Saved Output_Topiary_Error.txt 03_shrunk-dataframe.csv