simonhmartin / genomics_general

General tools for genomic analyses.

problem with phyml_sliding_windows.py #103

Closed: InfiniteLaugh closed this issue 8 months ago

InfiniteLaugh commented 1 year ago

Hi Simon, I am trying to use phyml_sliding_windows.py and feed its output into Twisst, but I have run into problems in some runs. After running Beagle and VCF_processing/parseVCF.py, I got a file like the one below, which seems to be fine: [screenshot of the geno file, not preserved]

But when I run phyml_sliding_windows.py with the following command:

    python /home/location_to_software/genomics_general-master/phylo/phyml_sliding_windows.py -T 40 -g geno.beagle.vcf.gz --prefix geno.beagle.phyml_bionj.w10k -w 10000 --windType sites --model GTR --optimise n --outgroup ../outgroup.txt

it sometimes writes the following to stderr and produces an empty result, while the process keeps running: [screenshot of stderr, not preserved]

On other occasions, with the same command and the same input file, I get output that I can use for further analysis, which confuses me. Also, with the same input file on different machines, the run stops at a different window: in the screenshot above it stopped at 40, but on another machine it stopped at 20.

Can you suggest how to solve this kind of problem? Or is there something wrong with my procedure?

Yours, Jethro

simonhmartin commented 1 year ago

Hi Jethro, I suspect that there may be a genotype somewhere in the geno file that is causing problems. What I recommend is to try making a very small input file of just a few hundred lines and running on 1 thread to see if you get the same issue. If not, try extending the number of lines you include until you get the error. Hopefully this will locate the issue. Best wishes, Simon
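Simon's two suggestions (look for an unusual genotype, and shrink the input until the problem is isolated) are easy to script. The Python sketch below is a hypothetical helper, not part of genomics_general: `find_bad_genotypes` flags fields that do not match an assumed genotype pattern, and `write_subset` copies the header plus the first N sites into a smaller gzipped file for a single-threaded (`-T 1`) test run. The genotype pattern (`GENOTYPE_RE`) and all function names are assumptions; check them against what parseVCF.py actually wrote to your file.

```python
import gzip
import itertools
import re

# ASSUMPTION: the geno file is tab-separated, with a header line (#CHROM, POS,
# then one column per sample) and diploid genotypes written as two single-base
# alleles joined by "/" or "|", e.g. "A/G", "T|T", "N/N". Haploid or indel
# calls would be flagged here and may be legitimate; adjust as needed.
GENOTYPE_RE = re.compile(r"^[ACGTN][/|][ACGTN]$")


def find_bad_genotypes(lines, max_report=10):
    """Return up to max_report (line_number, column, value) tuples for fields
    that do not look like a valid genotype, plus rows with a wrong column count."""
    bad = []
    iterator = iter(lines)
    header = next(iterator, "").rstrip("\n").split("\t")
    for line_number, line in enumerate(iterator, start=2):
        fields = line.rstrip("\n").split("\t")
        if len(fields) != len(header):
            bad.append((line_number, "<column count>", str(len(fields))))
        # Skip #CHROM and POS; check only the sample columns.
        for column, value in zip(header[2:], fields[2:]):
            if not GENOTYPE_RE.match(value):
                bad.append((line_number, column, value))
        if len(bad) >= max_report:
            break
    return bad[:max_report]


def check_geno_file(path, max_report=10):
    """Scan a gzipped geno file for suspicious genotype fields."""
    with gzip.open(path, "rt") as handle:
        return find_bad_genotypes(handle, max_report)


def subset_lines(lines, n_sites):
    """Yield the header line followed by at most n_sites data lines."""
    iterator = iter(lines)
    header = next(iterator, None)
    if header is None:
        return
    yield header
    yield from itertools.islice(iterator, n_sites)


def write_subset(in_path, out_path, n_sites):
    """Write a gzipped test file containing the header and the first n_sites sites."""
    with gzip.open(in_path, "rt") as src, gzip.open(out_path, "wt") as dst:
        dst.writelines(subset_lines(src, n_sites))
```

For example, `write_subset("geno.beagle.vcf.gz", "test_200.geno.gz", 200)` would produce a 200-site test file; doubling N on each attempt narrows down where the problem starts. With `--windType sites -w 10000`, a few hundred sites will not fill a single window, so a smaller `-w` is probably needed for these test runs.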

InfiniteLaugh commented 1 year ago


Hello Simon,

Thank you for your reply! I have tried different settings, and the following command seems to solve my problem:

    python /home/location_to_software/genomics_general-master/phylo/phyml_sliding_windows.py -T 20 -g geno.beagle.vcf.gz --prefix geno.beagle.phyml_bionj.w10k -w 10000 --windType coordinate --model GTR -M 50

However, when I used Twisst to quantify the frequencies of alternative phylogenetic topologies in sliding windows along the genome, the most common of the possible species topologies differed from the NJ tree built from 4DTV SNPs. I used 4DTV SNPs for the NJ tree because, for these species, only the 4DTV NJ tree matched the genome-based phylogeny; an NJ tree from whole-genome SNPs gives an inconsistent result. The same problem also occurs when I use TreeMix. Could you give me some advice on this?
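The working command switches from `--windType sites` to `--windType coordinate` and adds `-M 50`. As a rough illustration of the difference (a hypothetical sketch, not the script's actual code): sites windows contain a fixed number of SNPs, whereas coordinate windows span a fixed number of base pairs and therefore hold a variable number of SNPs, and `-M` presumably sets the minimum number of sites a window needs before a tree is built. The function `coordinate_windows` is an invented name, and the sketch assumes non-overlapping windows on a single chromosome with 1-based positions.

```python
from collections import defaultdict


def coordinate_windows(positions, window_size, min_sites):
    """Group 1-based SNP positions from one chromosome into non-overlapping
    windows of window_size bp, keeping only windows with >= min_sites SNPs.

    Returns {(window_start, window_end): [positions in that window]}.
    """
    windows = defaultdict(list)
    for pos in positions:
        # Window k covers positions k*window_size + 1 .. (k+1)*window_size.
        start = ((pos - 1) // window_size) * window_size + 1
        windows[start].append(pos)
    # Drop windows that are too sparse (the role assumed for -M here).
    return {
        (start, start + window_size - 1): sites
        for start, sites in sorted(windows.items())
        if len(sites) >= min_sites
    }
```

With `positions = [5, 20, 10_050, 10_060, 10_070, 25_000]`, `coordinate_windows(positions, 10_000, 2)` keeps windows 1 to 10000 and 10001 to 20000 and drops 20001 to 30000, which contains only one SNP. In this sketch a sparse region simply yields a short window that is filtered out, whereas a sites-based window always collects exactly `w` SNPs regardless of how much sequence they span.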

Sincerely, Jethro

simonhmartin commented 1 year ago

Dear Jethro, It is mathematically possible for the most common topology across genomic windows in Twisst to differ from the genome-wide best topology, so I'm not sure there is a problem here. Simon
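Part of the ambiguity lies in what "most common topology" means. The hypothetical Python sketch below takes per-window topology weights (for example, per-topology counts like those Twisst reports for each window; reading Twisst's output file is out of scope, and the names `topo1` to `topo3` are placeholders) and summarises them two ways: the mean share of each topology across windows, and the number of windows in which each topology ranks first. With the made-up numbers in the usage example the two summaries disagree, and neither is the same quantity as a single tree inferred from pooled genome-wide or 4DTV SNPs.

```python
def summarise_topologies(windows):
    """Summarise per-window topology weights.

    windows: list of dicts mapping topology name -> weight (e.g. a count).
    Returns (mean_share, top_count):
      mean_share[t] = average over windows of weight[t] / total window weight
      top_count[t]  = number of windows in which t has the largest weight
    Windows whose weights sum to zero are skipped.
    """
    names = list(windows[0]) if windows else []
    share_sum = dict.fromkeys(names, 0.0)
    top_count = dict.fromkeys(names, 0)
    used = 0
    for weights in windows:
        total = sum(weights.values())
        if total == 0:
            continue
        used += 1
        for name in names:
            share_sum[name] += weights[name] / total
        # Ties go to whichever topology is listed first.
        top_count[max(names, key=lambda n: weights[n])] += 1
    mean_share = {name: (share_sum[name] / used if used else 0.0) for name in names}
    return mean_share, top_count


# Made-up weights for three windows.
example = [
    {"topo1": 40, "topo2": 35, "topo3": 25},
    {"topo1": 40, "topo2": 35, "topo3": 25},
    {"topo1": 0, "topo2": 100, "topo3": 0},
]
mean_share, top_count = summarise_topologies(example)
```

Here `topo1` ranks first in two of the three windows, but `topo2` has the highest average share (about 0.57 versus 0.27), so "most common" depends on the summary chosen.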

InfiniteLaugh commented 1 year ago


Hello Simon,

Thanks for your reply! Yes, I agree that the genome-wide best topology is just a mathematical result. So I wonder: if I specify the outgroups as you suggested in simonhmartin/twisst issue #44 (https://github.com/simonhmartin/twisst/issues/44), will the result be different? And will the result be different if I change the tree-inference method, for example to RAxML?

Sincerely, Jethro

simonhmartin commented 9 months ago

Hi Jethro, I'm sorry I never got back to you on this. I don't really understand your question. Different tree inference methods give different results, but specifying an outgroup in TWISST does not change the result, it only changes the way the trees are represented (rooted vs unrooted).
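Simon's point that an outgroup only changes how trees are represented can be checked for the simplest case of four taxa. There are three unrooted topologies, each a split of the taxa into two pairs; rooting on the outgroup turns each split into exactly one rooted tree, so the mapping is one-to-one and the weight attached to each topology is unchanged. The sketch assumes exactly four taxa, one of which (the placeholder `O`) is the outgroup; the function names are hypothetical and this is an illustration, not Twisst's own code.

```python
def unrooted_quartets(taxa):
    """Return the three unrooted topologies of four taxa as pair-vs-pair splits."""
    if len(set(taxa)) != 4:
        raise ValueError("expected exactly four distinct taxa")
    first, *rest = taxa
    splits = []
    for partner in rest:
        pair = frozenset((first, partner))
        splits.append(frozenset((pair, frozenset(taxa) - pair)))
    return splits


def root_on_outgroup(split, outgroup):
    """Write an unrooted quartet split as a rooted Newick string."""
    # Put the side containing the outgroup first (False sorts before True).
    outgroup_side, ingroup_pair = sorted(split, key=lambda side: outgroup not in side)
    (sister,) = outgroup_side - {outgroup}
    a, b = sorted(ingroup_pair)
    return f"((({a},{b}),{sister}),{outgroup});"


splits = unrooted_quartets(["A", "B", "C", "O"])
weights = dict(zip(splits, [50, 30, 20]))  # made-up weights per unrooted topology
rooted = {root_on_outgroup(split, "O"): w for split, w in weights.items()}
```

`rooted` holds the same three weights, now keyed as `(((A,B),C),O);`, `(((A,C),B),O);` and `(((B,C),A),O);`: only the labels change, not the values.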

InfiniteLaugh commented 8 months ago


OK, I understood, thank you for your answer.