Closed InfiniteLaugh closed 8 months ago
Hi Jethro, I suspect that there may be a genotype somewhere in the geno file that is causing problems. What I recommend is to try making a very small input file of just a few hundred lines and running on 1 thread to see if you get the same issue. If not, try extending the number of lines you include until you get the error. Hopefully this will locate the issue. Best wishes, Simon
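A minimal sketch of that subsetting step, in case it helps. It assumes the geno file is gzipped plain text with a single header line followed by one line per site; the function name and the file names in the usage note are hypothetical, not part of genomics_general.

```python
import gzip
import itertools


def write_geno_subset(src_path, dst_path, n_sites):
    """Copy the header line plus the first n_sites data lines of a gzipped geno file.

    Assumption: the first line of the file is a header and every later line is one site.
    Returns the number of data lines actually written.
    """
    written = 0
    with gzip.open(src_path, "rt") as src, gzip.open(dst_path, "wt") as dst:
        # Keep the header so the subset is still a valid input file.
        dst.write(src.readline())
        # islice stops early if the file has fewer than n_sites data lines.
        for line in itertools.islice(src, n_sites):
            dst.write(line)
            written += 1
    return written
```

For example, `write_geno_subset("geno.gz", "geno.first500.gz", 500)` would give a 500-site test file to run with `-T 1`. If that run is clean, repeat with a larger `n_sites` (doubling each time is one option) until the error appears; the offending line then lies between the last two sizes tried.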
Hello Simon,
Thank you for your reply!
I have tried different scripts and it seems the following script will solve my problem.
python /home/location_to_software/genomics_general-master/phylo/phyml_sliding_windows.py -T 20 -g geno.beagle.vcf.gz --prefix geno.beagle.phyml_bionj.w10k -w 10000 --windType coordinate --model GTR -M 50
However, when I used TWISST to quantify the frequency of alternative phylogenetic topologies in sliding windows along the genome, the most common topology turned out to differ from the NJ tree constructed from 4DTV SNPs. I used 4DTV SNPs to construct the NJ tree because only the 4DTV NJ tree matched the genome phylogenetic tree for these species; using the whole-genome SNPs gives an inconsistent result. The same kind of problem happens when I use TreeMix. So, if possible, could you give me some advice?
Sincerely, Jethro
Dear Jethro, It is mathematically possible that the most common topology using twisst for windows along the genome is not the same as the genome-wide best topology. So I'm not sure there is a problem here? Simon
Hello Simon,
Thanks for your reply! Yes, I agree with your point that the genome-wide best topology is just a mathematical result. That makes me wonder: if I specify the outgroups, as in your answer in issue #44 (https://github.com/simonhmartin/twisst/issues/44), will the result be different? Or will the result be different if I change the model, for example to a RAxML model?
Sincerely, Jethro
Hi Jethro, I'm sorry I never got back to you on this. I don't really understand your question. Different tree inference methods give different results, but specifying an outgroup in TWISST does not change the result, it only changes the way the trees are represented (rooted vs unrooted).
OK, I understand. Thank you for your answer.
Hi Simon, I was trying to use phyml_sliding_windows.py and feed its output to TWISST, but I ran into issues on some occasions. After running Beagle and VCF_processing/parseVCF.py, I got a file like this, which seems to be all right.
But when I try to run phyml_sliding_windows.py using the following command:
python /home/location_to_software/genomics_general-master/phylo/phyml_sliding_windows.py -T 40 -g geno.beagle.vcf.gz --prefix geno.beagle.phyml_bionj.w10k -w 10000 --windType sites --model GTR --optimise n --outgroup ../outgroup.txt
I sometimes get this stderr, with an empty result, while it is still processing:
But on other occasions I get output for further analysis with the same command and the same input file, which confuses me. And with the same input file on a different machine, the run stops at a different window. For example, in the picture above it stopped at 40, but on another machine it stopped at 20.
Can you give me some suggestions for solving this kind of problem? Or is there a problem with my procedure? Yours, Jethro