szpiech / selscan

Haplotype based scans for selection
GNU General Public License v3.0

xpehh warning #67

Closed Wennie-s closed 2 years ago

Wennie-s commented 2 years ago

Hi, when I run xpehh, I get many warnings in my log file and no results are output. My genome is very large (10 Gb), and I have split my chromosome into 24 (each new chromosome is about 500 Mb). My run command is: /data/user003/soft/.conda/envs/selscan/bin/selscan --xpehh --vcf ./Chr10.phased_B5.vcf.gz --vcf-ref ./Chr10.CuduS.phased_B5.vcf.gz --map Chr10.map --threads 6 --out ./Chr10. The warning information is: [image] Can you help me?

szpiech commented 2 years ago

Hi there,

If the genetic map file isn't sufficiently fine-scale (i.e. too few sig figs) it is possible that sites could be skipped (see https://github.com/szpiech/selscan/issues/64#issuecomment-916248151), so this could account for some of the missing scores.
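One quick way to check a map file for this problem is to count neighbouring sites whose genetic position does not increase. The sketch below is only illustrative: it assumes the four-column layout (chromosome, locus ID, genetic position, physical position), and it assumes that tied genetic positions are the symptom of too few significant figures. The function name and the example rows are made up for the illustration.

```python
def count_nonincreasing_genetic_positions(map_lines):
    """Count consecutive site pairs whose genetic position does not increase.

    Assumes whitespace-separated columns: chr, locus ID, genetic pos, physical pos.
    """
    ties = 0
    prev = None
    for line in map_lines:
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        gpos = float(fields[2])
        if prev is not None and gpos <= prev:
            ties += 1
        prev = gpos
    return ties


# Made-up rows: snp1 and snp2 share a genetic position after rounding.
example = [
    "10 snp1 0.01 1000",
    "10 snp2 0.01 2000",
    "10 snp3 0.02 3000",
]
print(count_nonincreasing_genetic_positions(example))
```

To run it on a real file, pass the open file handle instead of the list, e.g. `count_nonincreasing_genetic_positions(open("Chr10.map"))`. A large count relative to the number of sites would suggest writing the map with more decimal places.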

The warnings you see are of two types. First, sites close to the ends of the datafile (often chromosome ends) are skipped because selscan isn't able to compute haplotype homozygosity in a wide enough interval. If you wish to include these sites, you can use the --trunc-ok flag, but note that these scores haven't been "fully calculated" as more central sites will have been (the integral is truncated near these boundaries due to lack of data). If you are splitting up otherwise contiguous stretches of chromosome for computational efficiency, I recommend including an extra 1Mbp on these boundaries, so that each file has some overlap with the adjacent ones (so that you can get a "fully calculated" score). Of course this will require some post-processing to avoid duplicated scores in the merged results.
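The overlap-and-trim idea can be sketched roughly as follows. This is not part of selscan; the function names, the 1-based inclusive coordinates, and the example lengths are assumptions made for illustration. Each chunk has a "core" interval that it is responsible for, plus up to 1 Mbp of padding on each side that is included in the input file but dropped from the merged results.

```python
def padded_chunks(chrom_length, chunk_size, pad=1_000_000):
    """Yield (core_start, core_end, file_start, file_end), 1-based inclusive.

    The core intervals tile the chromosome without overlap; the file
    intervals extend each core by `pad` bp, clipped to the chromosome.
    """
    start = 1
    while start <= chrom_length:
        end = min(start + chunk_size - 1, chrom_length)
        yield (start, end, max(1, start - pad), min(chrom_length, end + pad))
        start = end + 1


def keep_core_scores(scores, core_start, core_end):
    """Keep (position, score) pairs whose position lies in the core interval."""
    return [(pos, s) for pos, s in scores if core_start <= pos <= core_end]


# Hypothetical 5 Mbp chromosome split into 2 Mbp cores.
chunks = list(padded_chunks(chrom_length=5_000_000, chunk_size=2_000_000))
for core_start, core_end, file_start, file_end in chunks:
    print(core_start, core_end, file_start, file_end)
```

Because every site falls in exactly one core interval, concatenating the trimmed per-chunk results gives one score per site with no duplicates. Sites near the true chromosome ends still have no padding and will still be skipped unless --trunc-ok is used.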

The other type of warning you are getting is related to large inter-SNP gaps. By default selscan will not report a score if the integral spans a large gap. You can change this gap parameter with --max-gap.
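Before changing --max-gap, it can help to see where the large gaps actually are. The sketch below lists adjacent site pairs that are further apart than a threshold. The 200,000 bp default used here is an assumption; check `selscan --help` for the value your version uses. The function name and positions are made up for the example.

```python
def large_gaps(positions, max_gap=200_000):
    """Return (left, right) pairs of adjacent sorted positions more than max_gap bp apart.

    max_gap=200_000 is an assumed threshold, not read from selscan itself.
    """
    return [(a, b) for a, b in zip(positions, positions[1:]) if b - a > max_gap]


# Made-up physical positions (bp), sorted in ascending order.
positions = [1000, 50000, 400000, 450000]
print(large_gaps(positions))
```

If most of the reported gaps sit in a few regions (for example centromeres or unassembled stretches), leaving the default in place and accepting missing scores there may be more reasonable than raising the threshold for the whole chromosome.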

Hope this helps, let me know if you run into any problems.

Wennie-s commented 2 years ago

Thank you very much for your reply. I have solved it. However, when I run norm --xpehh --files Chr2.xpehh.out --bp-win --winsize 100000, I can't understand the output. [image] There are only 5 columns, but manual.pdf indicates that there should be 9 columns. I don't know why.

szpiech commented 2 years ago

Are you using the most recent version of norm?