jasonsahl / wgfast

Whole genome focused array SNP typer

Cannot find RAxML_labelledTree.out #1

Closed CarolineOhrman closed 6 years ago

CarolineOhrman commented 6 years ago

Hi Jason!

I'm running wgfast with the three required inputs: a reference sequence, reference.fasta (in the reference dir); a bestsnp.tsv matrix (from NASP, also in the reference dir); and paired reads from a MiSeq run (in the target dir).

wgfast -r reference -d target

The program starts with the following settings

WG-FAST pipeline starting
WG-FAST was invoked with the following parameters:
-m /myproject/wgfast/reference/bestsnp.tsv \
-t  \
-r /myproject/wgfast/reference/reference.fasta \
-d target \
-x  \
-p 2 \
-c 3 \
-o 0.9 \
-k F \
-s T \
-n 100 \
-g T \
-e /tmp \
-z ML \
-f 0.1 \
-y F \
-j ASC_GTRGAMMA \
-i T \
-q EMIT_ALL_CONFIDENT_SITES

Now to the issue: I'm having a problem with the RAxML tree.

sed: can't read RAxML_labelledTree.out: No such file or directory
mv: cannot stat 'RAxML_labelledTree.out': No such file or directory

From what I can see, the code is written so that the argument parser hands a tree variable to the main function. The parser has no option for the tree, so the tree variable in the main function ends up empty.

tree = "".join(glob.glob(os.path.join(ref_path, "*.tree")))

So when main calls run_raxml in util.py, it provides no tree, so no RAxML output file is created, and it therefore cannot be opened afterwards.
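
A minimal sketch of why the lookup above can fail silently: joining an empty glob result gives an empty string instead of an error. The helper `find_single_file` is hypothetical and not part of wgfast; it only illustrates one way such a lookup could fail early with a clear message.

```python
import glob
import os


def find_single_file(ref_path, pattern):
    """Return the one file in ref_path matching pattern, or raise a clear error.

    Hypothetical helper for illustration; not taken from the wgfast code base.
    """
    matches = sorted(glob.glob(os.path.join(ref_path, pattern)))
    if len(matches) != 1:
        # "".join([]) would quietly return "" here, which is what hides the problem.
        raise ValueError(
            "expected exactly one %s file in %s, found %d"
            % (pattern, ref_path, len(matches))
        )
    return matches[0]
```

With a check like this, a reference directory without a `*.tree` file would stop the run before RAxML is called, rather than passing an empty tree path along.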

I would be happy to get quick help on this, since I need your nice program during the upcoming week.

Thank you! Caroline

jasonsahl commented 6 years ago

Caroline, Yes, you need to provide the script with a tree. I'm surprised that the script ran at all without providing a tree. If you need a tree, you can generate this with the wgfast_prep.py script provided with the repository.

CarolineOhrman commented 6 years ago

I realized that the tree should be in the reference folder, so that problem is solved, but I still have a problem with RAxML.

First I used my own nasp.tree and bestsnp.tsv, but to rule out that my files were causing the problem, I have now used your files from https://github.com/jasonsahl/MTS_databases.

It seems that RAxML isn't able to run correctly. I get this message in the out.raxml.out file:

Error you want to place query sequences into a tree using (null), but you have provided an input tree that already contains all taxa

Should I use a specific version of RAxML (I'm using 8.2.10), or do you have any idea why I get this error?

Kind regards, Caroline

jasonsahl commented 6 years ago

Caroline, Perhaps the reads aren't being processed properly. Let me look into it. My hope is to push changes with updated documentation early next week. I'll test everything thoroughly before I push those changes. Sorry about the troubles. Jason

jasonsahl commented 6 years ago

Caroline,

Could you please update the repository and try again?

thanks, Jason

CarolineOhrman commented 6 years ago

I've updated and tried again, and I still have the same error printed in the out.raxml.out file. These are the last lines in that file:


Alignment has 926 distinct alignment patterns

Proportion of gaps and completely undetermined characters in this alignment: 0.00%

RAxML likelihood-based placement algorithm

Using 1 distinct models/data partitions with joint branch length optimization

All free model parameters will be estimated by RAxML
GAMMA model of rate heteorgeneity, ML estimate of alpha-parameter

Partition: 0
Alignment Patterns: 926
Name: No Name Provided
DataType: DNA
Substitution Matrix: GTR
Correcting likelihood for ascertainment bias

RAxML was called as follows:

raxmlHPC-SSE3 -f V -s out.fasta -m ASC_GTRGAMMA -n out -t /home/caroline/MTS_databases/Francisella_tularensis/nasp_raxml.tree --asc-corr=lewis --no-bfgs > /dev/null 2>&1

Error you want to place query sequences into a tree using (null), but you have provided an input tree that already contains all taxa (END)

I get the same errors with the NASP and wgfast_prep files I prepared myself and with those in your MTS_databases repository (I can't use the nasp.PARAMS file because of non-matching binary file versions).
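
One way to read the RAxML message is that every sequence name in out.fasta already appears as a leaf in the input tree, so the placement algorithm has no query sequence left to insert. The sketch below checks this by comparing the two sets of names. It is an illustration only: the function names are hypothetical, and the Newick parsing assumes plain, unquoted leaf labels.

```python
import re


def fasta_names(fasta_text):
    """Return the sequence names (first word after '>') in a FASTA string."""
    names = set()
    for line in fasta_text.splitlines():
        header = line[1:].strip() if line.startswith(">") else ""
        if header:
            names.add(header.split()[0])
    return names


def newick_leaf_names(newick_text):
    """Return leaf labels from a Newick string (assumes unquoted labels)."""
    # A leaf label directly follows '(' or ','; labels after ')' are internal nodes.
    return set(re.findall(r"[(,]\s*([^(),:;\s]+)", newick_text))


def query_names(fasta_text, newick_text):
    """Names present in the alignment but absent from the tree."""
    return fasta_names(fasta_text) - newick_leaf_names(newick_text)
```

If `query_names` returns an empty set for out.fasta and the tree file, the alignment contains no sample beyond the reference taxa, which would be consistent with the error above.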

Caroline

jasonsahl commented 6 years ago

Caroline,

Can you share your “out.fasta” file? If you can send the entire output that WG-FAST printed to screen, that would also be helpful.

thanks, Jason

CarolineOhrman commented 6 years ago

This is the output. I do not have any file called RAxML_labelledTree* in my output folder, since (as far as I can tell) RAxML failed to run. Thank you! Caroline

python /home/carodl/bin/wgfast/wgfast.py -r ../MTS_databases/Francisella_tularensis/ -d ../wgfast_input/tularensis/
LOG: 2017/12/04 08:44:51 - testing the paths of all dependencies
/home/carodl/bin/wgfast/standard-RAxML/raxmlHPC-SSE3
citation: 'Stamatakis, A. RAxML version 8: a tool for phylogenetic analysis and post-analysis of large phylogenies. Bioinformatics (2014).'
citation: 'Berger SA, Krompass D, Stamatakis A. Performance, accuracy, and Web server for evolutionary placement of short sequence reads under maximum likelihood. Syst Biol. 2011;60(3):291-302'
/home/carodl/bin/miniconda3/bin/samtools
citation: 'Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N, Marth G, Abecasis G, Durbin R, Genome Project Data Processing S. The Sequence Alignment/Map format and SAMtools. Bioinformatics. 2009;25(16):2078-9'
/usr/bin/bwa
citation: 'Li H. Aligning sequence reads, clone sequences and assembly contigs with BWA-MEM. arXivorg. 2013(arXiv:1303.3997 [q-bio.GN])'
Patristic distances calculated with DendroPy
citation: 'Sukumaran J, Holder MT. DendroPy: a Python library for phylogenetic computing. Bioinformatics. 2010;26(12):1569-71. Epub 2010/04/28. doi: 10.1093/bioinformatics/btq228. PubMed PMID: 20421198'
Also uses GATK for variant calling
citation: 'McKenna A, Hanna M, Banks E, Sivachenko A, Cibulskis K, Kernytsky A, Garimella K, Altshuler D, Gabriel S, Daly M, DePristo MA. The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data. Genome research. 2010;20(9):1297-303'
Uses trimmomatic for read trimming
citation: Bolger A.M., Lohse M., Usadel B. Trimmomatic: A flexible trimmer for Illumina Sequence Data. Bioinformatics. 2014. Doi:10.1093/bioinformatics/btu170
Uses BioPython for FASTA parsing
citation :Cock PJ, Antao T, Chang JT, Chapman BA, Cox CJ, Dalke A, Friedberg I, Hamelryck T, Kauff F, Wilczynski B, de Hoon MJ. Biopython: freely available Python tools for computational molecular biology and bioinformatics. Bioinformatics. 2009;25(11):1422-3

LOG: 2017/12/04 08:44:51 - WG-FAST pipeline starting
LOG: 2017/12/04 08:44:51 - WG-FAST was invoked with the following parameters:
-m /home/carodl/MTS_databases/Francisella_tularensis/bestsnp.tsv \
-t /home/carodl/MTS_databases/Francisella_tularensis/nasp_raxml.tree \
-r /home/carodl/MTS_databases/Francisella_tularensis/reference.fasta \
-d ../wgfast_input/reads/ \
-x NULL \
-p 2 \
-c 3 \
-o 0.9 \
-k F \
-s T \
-n 100 \
-g T \
-e /tmp \
-z ML \
-f 0.1 \
-y F \
-j ASC_GTRGAMMA \
-i T \
-q EMIT_ALL_CONFIDENT_SITES

LOG: 2017/12/04 08:44:53 - sequence(s) inserted into tree
sed: can't read RAxML_labelledTree.out: No such file or directory
mv: cannot stat 'RAxML_labelledTree.out': No such file or directory
Traceback (most recent call last):
  File "/home/carodl/bin/wgfast/wgfast.py", line 351, in <module>
    options.only_subs,options.model,options.trim,options.gatk_method)
  File "/home/carodl/bin/wgfast/wgfast.py", line 191, in main
    suffix = run_raxml("out.fasta", tree,"out.classification_results.txt", "V", parameters, model, "out")
  File "/home/carodl/bin/wgfast/wg_fast/util.py", line 478, in run_raxml
    subprocess.check_call("mv RAxML_labelledTree.%s %s_tree_including_unknowns_edges.tree" % (suffix, suffix) , shell=True)
  File "/home/carodl/bin/miniconda3/envs/wgfast/lib/python2.7/subprocess.py", line 186, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command 'mv RAxML_labelledTree.out out_tree_including_unknowns_edges.tree' returned non-zero exit status 1
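
The traceback only surfaces at the `mv` step, after the RAxML call (which sends its output to /dev/null) has already failed to write its tree. A minimal sketch of a more direct failure mode follows, assuming a hypothetical helper `run_and_require` that is not part of wgfast: it keeps the command's output in a log file and checks both the exit status and the expected output file, since the thread does not show which exit status RAxML returns for this error.

```python
import os
import subprocess
import sys


def run_and_require(cmd, expected_file, log_file):
    """Run cmd (a list of arguments), keep its output in log_file,
    and raise if it fails or does not create expected_file.

    Hypothetical helper for illustration only.
    """
    with open(log_file, "w") as log:
        returncode = subprocess.call(cmd, stdout=log, stderr=subprocess.STDOUT)
    if returncode != 0 or not os.path.isfile(expected_file):
        with open(log_file) as log:
            tail = "".join(log.readlines()[-5:])
        raise RuntimeError(
            "%s did not produce %s (exit status %d); last log lines:\n%s"
            % (cmd[0], expected_file, returncode, tail)
        )
    return expected_file


if __name__ == "__main__":
    # Stand-in command: the Python interpreter prints a message and creates no file.
    try:
        run_and_require([sys.executable, "-c", "print('no tree written')"],
                        "RAxML_labelledTree.out", "run.log")
    except RuntimeError as err:
        print(err)
```

Reporting the tail of the log at the point of failure would have shown the RAxML error directly instead of the later "No such file or directory" messages.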

jasonsahl commented 6 years ago

Thanks, Caroline. Your samples don't seem to be processed. Can you give me an idea of what your files are called? My guess is that your reads aren't being analyzed and I don't have a good check in place before RAxML is called. See an example of a successful log below. There should be information about each sample before RAxML is called:

LOG: 2017/12/05 09:39:08 - WG-FAST was invoked with the following parameters:
-m /Users/jasonsahl/tools/wgfast/run/wg_fast_files/nasp_matrix.tsv \
-t /Users/jasonsahl/tools/wgfast/run/wg_fast_files/nasp_raxml.tree \
-r /Users/jasonsahl/tools/wgfast/run/wg_fast_files/reference.fasta \
-d ../test_data/reads/ \
-x /Users/jasonsahl/tools/wgfast/run/wg_fast_files/nasp.PARAMS \
-p 2 \
-c 3 \
-o 0.9 \
-k F \
-s F \
-n 100 \
-g T \
-e /tmp \
-z ML \
-f 0.1 \
-y F \
-j ASC_GTRGAMMA \
-i T \
-q EMIT_ALL_CONFIDENT_SITES

number of SNPs in genome ECOLI = 1
number of discarded SNPs in genome ECOLI = 0
number of discarded Reference positions in genome ECOLI = 27

number of SNPs in genome IS03 = 1
number of discarded SNPs in genome IS03 = 0
number of discarded Reference positions in genome IS03 = 27

LOG: 2017/12/05 09:40:29 - number of callable positions in genome ECOLI = 24
LOG: 2017/12/05 09:40:29 - number of callable positions in genome IS03 = 24
LOG: 2017/12/05 09:40:29 - sequence(s) inserted into tree

LOG: 2017/12/05 09:40:29 - Insertion likelihood values:
sample_name    insertion_likelihood    number of potential insertion nodes
ECOLI          0.016790                60
IS03           0.016790                60

LOG: 2017/12/05 09:40:30 - all done
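
The difference between this log and the failing one can be checked mechanically: a successful run reports callable positions for each sample before the tree-insertion message, while the failing run goes straight to that message. The sketch below relies only on the log wording shown in this thread; the function name is hypothetical.

```python
import re

# Wording copied from the WG-FAST log lines quoted in this thread.
CALLABLE = re.compile(r"number of callable positions in genome (\S+) = (\d+)")
INSERTED = "sequence(s) inserted into tree"


def processed_samples(log_text):
    """Map each sample name to its callable-position count, using only
    the part of the log that precedes the tree-insertion message."""
    before_insertion = log_text.split(INSERTED)[0]
    return {name: int(count) for name, count in CALLABLE.findall(before_insertion)}
```

An empty result means no sample was reported before RAxML was called, which matches the guess that the reads were not analyzed.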

CarolineOhrman commented 6 years ago

Hi again, Jason. Sorry for not replying to your answer; other things came up. Now it's a new year, so new possibilities with wgfast ☺

I ran the program again, this time with only the test_data provided in wgfast. I certainly had some problems with the input files, but at first I still got the same error. Then, after also switching to the RAxML version you provided, it worked! The output is at the bottom. It looks correct, right?

I will go on testing with my own data and hopefully I will get that to work also. Thank you for your help!

/Caroline

Output ☺

LOG: 2018/01/04 16:17:01 - WG-FAST pipeline starting
LOG: 2018/01/04 16:17:01 - WG-FAST was invoked with the following parameters:
-m /home/carodl/wgfast_testdata/reference/nasp_matrix.tsv \
-t /home/carodl/wgfast_testdata/reference/nasp_raxml.tree \
-r /home/carodl/wgfast_testdata/reference/reference.fasta \
-d /home/carodl/wgfast_testdata/reads/ECOLI/ \
-x /home/carodl/wgfast_testdata/reference/nasp.PARAMS \
-p 2 \
-c 3 \
-o 0.9 \
-k F \
-s T \
-n 100 \
-g T \
-e /tmp \
-z ML \
-f 0.1 \
-y F \
-j ASC_GTRGAMMA \
-i T \
-q EMIT_ALL_CONFIDENT_SITES

number of SNPs in genome ECOLI = 1
number of discarded SNPs in genome ECOLI = 0
number of discarded Reference positions in genome ECOLI = 27

LOG: 2018/01/04 16:18:48 - number of callable positions in genome ECOLI = 24
LOG: 2018/01/04 16:18:48 - sequence(s) inserted into tree

LOG: 2018/01/04 16:18:48 - Insertion likelihood values:
sample_name    insertion_likelihood    number of potential insertion nodes
ECOLI          0.015844                60

/mnt/powervault/carodl/bin/wgfast/standard-RAxML/raxmlHPC-PTHREADS-SSE3
*citation: 'Stamatakis A. RAxML-VI-HPC: maximum likelihood-based phylogenetic analyses with thousands of taxa and mixed models. Bioinformatics. 2006;22(21):2688-90'
LOG: 2018/01/04 16:18:49 - running subsample routine, forcing GTRGAMMA model
LOG: 2018/01/04 16:18:49 - creating PARAMS file
LOG: 2018/01/04 16:18:49 - adding unknowns to tree

problem grabbing distances- make sure that subsample.distances.txt files aren't empty
True distance between Reference and ECO_SE11nucmer = 0.01
Sample: ECOLI
Subsample distances between Reference and subsample greater than true value = 0
Subsample distances between Reference and subsample equal to true value = 0
Subsample distances between Reference and subsample less than true value = 0
problem grabbing distances- make sure that subsample.distances.txt files aren't empty
True distance between Reference and ECO_MS_107_1nucmer = 0.03
Sample: ECOLI
Subsample distances between Reference and subsample greater than true value = 0
Subsample distances between Reference and subsample equal to true value = 0
Subsample distances between Reference and subsample less than true value = 0
LOG: 2018/01/04 16:18:54 - all done
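
Since the run only succeeded after switching RAxML binaries, it can help to record which RAxML version a given binary reports. The sketch below assumes that the binary prints a banner containing "RAxML version X.Y.Z" when called with `-v`; the default binary name is taken from the commands in this thread, and the function names are hypothetical. The thread does not state which version the repository bundles.

```python
import re
import shutil
import subprocess


def parse_raxml_version(banner):
    """Extract the version string from a RAxML banner, or return None."""
    match = re.search(r"RAxML version (\d+(?:\.\d+)*)", banner)
    return match.group(1) if match else None


def raxml_version(binary="raxmlHPC-SSE3"):
    """Return the version reported by `binary -v`, or None if the binary
    is not on PATH or prints no recognizable banner."""
    path = shutil.which(binary)
    if path is None:
        return None
    result = subprocess.run([path, "-v"], stdout=subprocess.PIPE,
                            stderr=subprocess.STDOUT, universal_newlines=True)
    return parse_raxml_version(result.stdout)
```

Logging this value next to the dependency paths that WG-FAST already prints would make version differences between machines easier to spot.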

jasonsahl commented 6 years ago

Caroline,

Yes, looks good. Thanks also for the feedback on RAxML. I'll test with newer versions and see if anything has changed that has also broken WG-FAST. Please open a new issue if you see anything odd.