simonhmartin / genomics_general

General tools for genomic analyses.

raxml_sliding_windows.py::Unable to identify outgroups #91

Closed cheninouc closed 1 year ago

cheninouc commented 1 year ago

Hi @simonhmartin

I am trying to run the script raxml_sliding_windows.py in Python 2.7. This is the command I ran on a server:

python /phylo/raxml_sliding_windows.py -g ./beagle_parse.geno.gz --prefix raxml.w10kb -w 100000 -M 10 --windType coordinate --model GTRCATI --raxml /public/home/chcg/anaconda3/envs/py27/bin/raxmlHPC --outgroup SRR14865027,SRR14865028,SRR14865030 -T 16 --test --log /log

But I always get the following error in the log:

Error, the outgroup name "SRR14865027" you specified can not be found in the alignment, exiting ....

I further discovered that the error occurs at this step:

raxmlHPC -s /HiC_scaffold_1_500001_600000_mJOMbO.phy -n HiC_scaffold_1_500001_600000_mJOMbO.phy -m GTRCATI -o SRR14865027,SRR14865028,SRR14865030 -V -f d -p 12345 --silent >>/log

It seems the samples are split into haplotypes, and the suffixes '_A' and '_B' are added to the sample names to distinguish the haplotypes. As a result, raxmlHPC cannot find the outgroup samples under the names I specified.

I'm deeply troubled by this problem. Could you please help me? Thanks a lot.

Best wishes, chcg

simonhmartin commented 1 year ago

Hi chcg, I think the fix is to simply add the "_A" and "_B" suffixes in your command. In other words, specify all six haplotype names for the three outgroup individuals. Does that fix it?
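As a rough illustration of that suggestion, the six haplotype names can be generated from the three sample names rather than typed by hand. This is only a sketch: the helper `haplotype_names` is hypothetical and not part of genomics_general, and it assumes the suffixes are exactly "_A" and "_B" as described in the question.

```python
# Hypothetical helper (not part of genomics_general): expand each sample
# name into its haplotype names, assuming the suffixes "_A" and "_B".
def haplotype_names(samples, suffixes=("_A", "_B")):
    return [sample + suffix for sample in samples for suffix in suffixes]


# Outgroup individuals taken from the command in the question.
outgroups = ["SRR14865027", "SRR14865028", "SRR14865030"]

# Comma-separated string in the form expected by --outgroup.
outgroup_arg = ",".join(haplotype_names(outgroups))
print(outgroup_arg)
# SRR14865027_A,SRR14865027_B,SRR14865028_A,SRR14865028_B,SRR14865030_A,SRR14865030_B
```

The printed string would then replace the original value given to --outgroup in the command above.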

cheninouc commented 1 year ago

Hi Simon. Adding the "_A" and "_B" suffixes in my command solved the problem. Thank you.