Closed CarolineOhrman closed 6 years ago
Caroline, Yes, you need to provide the script with a tree. I'm surprised that the script ran at all without providing a tree. If you need a tree, you can generate this with the wgfast_prep.py script provided with the repository.
I realized that the tree should be in the reference folder, so that problem is solved, but I still have a problem with RAxML.
At first I used my own nasp.tree and bestsnp.tsv, but to rule out my own files as the cause I have now used your files from https://github.com/jasonsahl/MTS_databases.
RAxML does not seem to run correctly. I get this message in the out.raxml.out file:
Error you want to place query sequences into a tree using (null), but you have provided an input tree that already contains all taxa
Should I use a specific version of RAxML (I'm using 8.2.10), or do you have any idea why I get this error?
Kind regards, Caroline
Caroline, Perhaps the reads aren't being processed properly. Let me look into it. My hope is to push changes with updated documentation early next week. I'll test everything thoroughly before I push those changes. Sorry about the troubles. Jason
Caroline,
Could you please update the repository and try again?
thanks, Jason
I've updated and tried again, and I still get the same error in the out.raxml.out file. These are the last lines of that file:
Alignment has 926 distinct alignment patterns
Proportion of gaps and completely undetermined characters in this alignment: 0.00%
RAxML likelihood-based placement algorithm
Using 1 distinct models/data partitions with joint branch length optimization
All free model parameters will be estimated by RAxML
GAMMA model of rate heteorgeneity, ML estimate of alpha-parameter
Partition: 0
Alignment Patterns: 926
Name: No Name Provided
DataType: DNA
Substitution Matrix: GTR
Correcting likelihood for ascertainment bias
RAxML was called as follows:
raxmlHPC-SSE3 -f V -s out.fasta -m ASC_GTRGAMMA -n out -t /home/caroline/MTS_databases/Francisella_tularensis/nasp_raxml.tree --asc-corr=lewis --no-bfgs > /dev/null 2>&1
I get the same errors with the NASP and wgfast_prep files I prepared myself and with those in your MTS_databases repository (I can't use the nasp.PARAMS file because the binary file versions don't match).
Caroline
Caroline,
Can you share your “out.fasta” file? If you can send the entire output that WG-FAST printed to screen, that would also be helpful.
thanks, Jason
This is the output. I don't have any file called RAxML_labelledTree* in my output folder, since (as far as I can tell) RAxML failed to run. Thank you! Caroline
python /home/carodl/bin/wgfast/wgfast.py -r ../MTS_databases/Francisella_tularensis/ -d ../wgfast_input/tularensis/
LOG: 2017/12/04 08:44:51 - testing the paths of all dependencies
/home/carodl/bin/wgfast/standard-RAxML/raxmlHPC-SSE3
citation: 'Stamatakis, A. RAxML version 8: a tool for phylogenetic analysis and post-analysis of large phylogenies. Bioinformatics (2014).'
citation: 'Berger SA, Krompass D, Stamatakis A. Performance, accuracy, and Web server for evolutionary placement of short sequence reads under maximum likelihood. Syst Biol. 2011;60(3):291-302'
/home/carodl/bin/miniconda3/bin/samtools
citation: 'Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N, Marth G, Abecasis G, Durbin R, Genome Project Data Processing S. The Sequence Alignment/Map format and SAMtools. Bioinformatics. 2009;25(16):2078-9'
/usr/bin/bwa
citation: 'Li H. Aligning sequence reads, clone sequences and assembly contigs with BWA-MEM. arXivorg. 2013(arXiv:1303.3997 [q-bio.GN])'
Patristic distances calculated with DendroPy
citation: 'Sukumaran J, Holder MT. DendroPy: a Python library for phylogenetic computing. Bioinformatics. 2010;26(12):1569-71. Epub 2010/04/28. doi: 10.1093/bioinformatics/btq228. PubMed PMID: 20421198'
Also uses GATK for variant calling
citation: 'McKenna A, Hanna M, Banks E, Sivachenko A, Cibulskis K, Kernytsky A, Garimella K, Altshuler D, Gabriel S, Daly M, DePristo MA. The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data. Genome research. 2010;20(9):1297-303'
Uses trimmomatic for read trimming
citation: Bolger A.M., Lohse M., Usadel B. Trimmomatic: A flexible trimmer for Illumina Sequence Data. Bioinformatics. 2014. Doi:10.1093/bioinformatics/btu170
Uses BioPython for FASTA parsing
citation :Cock PJ, Antao T, Chang JT, Chapman BA, Cox CJ, Dalke A, Friedberg I, Hamelryck T, Kauff F, Wilczynski B, de Hoon MJ. Biopython: freely available Python tools for computational molecular biology and bioinformatics. Bioinformatics. 2009;25(11):1422-3
LOG: 2017/12/04 08:44:53 - sequence(s) inserted into tree
sed: can't read RAxML_labelledTree.out: No such file or directory
mv: cannot stat 'RAxML_labelledTree.out': No such file or directory
Traceback (most recent call last):
File "/home/carodl/bin/wgfast/wgfast.py", line 351, in
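The log above reports "sequence(s) inserted into tree" even though RAxML_labelledTree.out was never written, and the RAxML call sends its own messages to /dev/null. One way to surface such a failure earlier is to keep the tool's output in a log file and check for the expected result file before any post-processing. The following is only a sketch under assumptions: the helper name run_and_require_output and the log file argument are hypothetical and not part of wgfast.

```python
import os
import subprocess


def run_and_require_output(command, expected_file, log_file):
    """Run an external command and fail loudly if its output file is missing.

    `command` is a list such as ["raxmlHPC-SSE3", "-f", "V", ...].
    stdout and stderr are kept in `log_file` instead of being discarded.
    This helper is hypothetical; it is not taken from the wgfast code base.
    """
    with open(log_file, "w") as log:
        result = subprocess.run(
            command, stdout=log, stderr=subprocess.STDOUT, check=False
        )
    # Check the file itself, since an exit status alone may not tell the
    # whole story for every tool.
    if not os.path.isfile(expected_file):
        raise RuntimeError(
            f"{command[0]} did not create {expected_file}; "
            f"exit status {result.returncode}, see {log_file}"
        )
    return result.returncode
```

With a check like this, the run would stop with a message pointing at the log file instead of continuing to the sed and mv steps.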
Thanks, Caroline. Your samples don't seem to be processed. Can you give me an idea of what your files are called? My guess is that your reads aren't being analyzed and I don't have a good check in place before RAxML is called. See an example of a successful log below. There should be information about each sample before RAxML is called:
LOG: 2017/12/05 09:40:29 - number of callable positions in genome ECOLI = 24
LOG: 2017/12/05 09:40:29 - number of callable positions in genome IS03 = 24
LOG: 2017/12/05 09:40:29 - sequence(s) inserted into tree
LOG: 2017/12/05 09:40:29 - Insertion likelihood values:
sample_name insertion_likelihood number of potential insertion nodes
ECOLI 0.016790 60
IS03 0.016790 60
LOG: 2017/12/05 09:40:30 - all done
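The RAxML message quoted earlier in the thread says that the input tree already contains all taxa, which fits the guess that no query samples reached the alignment. A check before RAxML is called could compare the names in out.fasta with the leaf names of the tree. The sketch below is an assumption about how such a check might look, not the check that was later added to wgfast: the function names are hypothetical, and the Newick parsing is a simple regular expression that does not handle quoted labels.

```python
import re


def fasta_names(path):
    """Collect sequence names (first word after '>') from a FASTA file."""
    names = set()
    with open(path) as handle:
        for line in handle:
            if line.startswith(">"):
                fields = line[1:].split()
                if fields:
                    names.add(fields[0])
    return names


def newick_leaf_names(newick_text):
    """Extract leaf names from a Newick string.

    A leaf name follows '(' or ','; labels after ')' are internal nodes
    or support values and are ignored. Quoted labels are not supported.
    """
    return set(re.findall(r"[(,]\s*([^(),:;\s]+)", newick_text))


def query_names(fasta_path, tree_path):
    """Return alignment names that are absent from the tree (the queries)."""
    with open(tree_path) as handle:
        leaves = newick_leaf_names(handle.read())
    return fasta_names(fasta_path) - leaves
```

If query_names returns an empty set, there is nothing for RAxML to place, and the run can stop with a message about the reads instead of a missing RAxML_labelledTree file.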
Hi again Jason. Sorry for not replying sooner, other things came up. Now it's a new year, so new possibilities with wgfast ☺
I ran the program again, this time with only the test_data provided in wgfast. I certainly had some problems with my input files, but at first I got the same error here too. Then, after also switching to the RAxML version you provide, it worked! The output is at the bottom. It looks correct, right?
I will go on testing with my own data, and hopefully I will get that to work as well. Thank you for your help!
/Caroline
Output ☺
LOG: 2018/01/04 16:18:48 - number of callable positions in genome ECOLI = 24
LOG: 2018/01/04 16:18:48 - sequence(s) inserted into tree
LOG: 2018/01/04 16:18:48 - Insertion likelihood values:
sample_name insertion_likelihood number of potential insertion nodes
ECOLI 0.015844 60
problem grabbing distances- make sure that subsample.distances.txt files aren't empty
True distance between Reference and ECO_SE11nucmer = 0.01
Sample: ECOLI
Subsample distances between Reference and subsample greater than true value = 0
Subsample distances between Reference and subsample equal to true value = 0
Subsample distances between Reference and subsample less than true value = 0
problem grabbing distances- make sure that subsample.distances.txt files aren't empty
True distance between Reference and ECO_MS_107_1nucmer = 0.03
Sample: ECOLI
Subsample distances between Reference and subsample greater than true value = 0
Subsample distances between Reference and subsample equal to true value = 0
Subsample distances between Reference and subsample less than true value = 0
LOG: 2018/01/04 16:18:54 - all done
Quoted from Jason Sahl's reply of 5 December 2017, the beginning of his successful log:
LOG: 2017/12/05 09:39:08 - WG-FAST was invoked with the following parameters:
-m /Users/jasonsahl/tools/wgfast/run/wg_fast_files/nasp_matrix.tsv
-t /Users/jasonsahl/tools/wgfast/run/wg_fast_files/nasp_raxml.tree
-r /Users/jasonsahl/tools/wgfast/run/wg_fast_files/reference.fasta
-d ../test_data/reads/
-x /Users/jasonsahl/tools/wgfast/run/wg_fast_files/nasp.PARAMS
-p 2
-c 3
-o 0.9
-k F
-s F
-n 100
-g T
-e /tmp
-z ML
-f 0.1
-y F
-j ASC_GTRGAMMA
-i T
-q EMIT_ALL_CONFIDENT_SITES
number of SNPs in genome ECOLI = 1
number of discarded SNPs in genome ECOLI = 0
number of discarded Reference positions in genome ECOLI = 27
number of SNPs in genome IS03 = 1
number of discarded SNPs in genome IS03 = 0
number of discarded Reference positions in genome IS03 = 27
Caroline,
Yes, looks good. Thanks also for the feedback on RAxML. I'll test with newer versions and see if anything has changed that has also broken WG-FAST. Please open a new issue if you see anything odd.
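Since the run only succeeded after switching RAxML binaries, it may help to record which version is actually on the path when reporting a problem. A minimal sketch, assuming the binary prints a banner containing "RAxML version X.Y.Z" when called with -v; the function names are hypothetical and the banner format is an assumption that should be checked against the local build:

```python
import re
import subprocess


def parse_raxml_version(banner):
    """Return the version in a banner as a tuple of ints, or None."""
    match = re.search(r"RAxML version (\d+(?:\.\d+)*)", banner)
    if match is None:
        return None
    return tuple(int(part) for part in match.group(1).split("."))


def installed_raxml_version(binary="raxmlHPC-SSE3"):
    """Ask the binary for its version; None if it is missing or unparsable."""
    try:
        result = subprocess.run(
            [binary, "-v"], capture_output=True, text=True, check=False
        )
    except FileNotFoundError:
        return None
    # Search both streams, since tools differ in where they print banners.
    return parse_raxml_version(result.stdout + result.stderr)
```

Comparing the returned tuple for the system binary and for the one under standard-RAxML would show whether two different versions are involved.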
Hi Jason!
I'm running wgfast with the three required inputs: a reference sequence, reference.fasta, and a bestsnps.tsv matrix from NASP (both in the reference directory), plus paired reads from a MiSeq run (in the target directory).
wgfast -r reference -d target
The program starts with the following settings
Now to the issue: I'm having a problem with the RAxML tree.
From what I can see, the code is written so that the argument parser passes a tree variable to the main function. The parser has no option for the tree, so the tree variable in the main function ends up empty:
tree = "".join(glob.glob(os.path.join(ref_path, "*.tree")))
So when main calls run_raxml in util.py, it provides no tree, no raxml.out is created, and the file then cannot be opened.
I would be grateful for quick help with this, since I need your nice program during the upcoming week.
Thank you! Caroline
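The glob line quoted in the post joins every match into one string, so it yields an empty string when the reference directory has no *.tree file and a concatenation of paths when it has several. A stricter lookup might look like the sketch below. It is an assumption about how this could be handled, not the fix applied in wgfast, and the function name find_single_tree is hypothetical.

```python
import glob
import os


def find_single_tree(ref_path):
    """Return the path of the only *.tree file in ref_path.

    Raises a clear error when no tree or more than one tree is present,
    instead of passing an empty or concatenated path on to RAxML.
    """
    matches = sorted(glob.glob(os.path.join(ref_path, "*.tree")))
    if not matches:
        raise FileNotFoundError(f"no *.tree file found in {ref_path!r}")
    if len(matches) > 1:
        raise ValueError(
            f"expected one *.tree file in {ref_path!r}, found: {matches}"
        )
    return matches[0]
```

Called as find_single_tree("reference"), this would have reported the missing tree at startup, before any reads were processed.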