szpiech / selscan

Haplotype based scans for selection
GNU General Public License v3.0
109 stars 33 forks source link

about the xpehh output file #32

Closed fjptwenger closed 5 years ago

fjptwenger commented 5 years ago

I have a question about the xpehh method , what causes all the result file to be empty?

the command: selscan --xpehh --tped LW.xpehh.1.tped --tped-ref AWB.xpehh.1.tped --out xpehh

selscan v1.2.0a
Opening LW.xpehh.1.tped...
Loading 40 haplotypes and 1147166 loci...
Opening AWB.xpehh.1.tped...
Loading 42 haplotypes and 1147166 loci...
Opening LW.xpehh.1.tped...
Loading map data for 1147166 loci
Starting XP-EHH calculations.
|==============================================================================|
Finished.

the file: $ head LW.xpehh.1.tped 1 . 0 9875 1 1 1 1 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 . 0 10048 1 1 1 1 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 . 0 67556 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1 1 1 1 1 1 1 1 1 1 . 0 95330 0 1 1 1 1 1 0 1 1 1 1 1 0 1 1 1 0 1 1 1 1 1 0 1 0 0 1 1 1 1 1 1 0 1 1 1 0 0 0 0 1 . 0 121258 0 1 1 1 1 1 0 1 1 1 1 1 1 1 1 1 0 1 1 1 1 1 0 1 0 0 1 1 1 1 1 1 0 1 1 1 0 0 0 0 1 . 0 353207 0 1 1 1 1 1 0 1 0 1 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1 1 1 1 1 1 1 1 . 0 743023 0 0 1 1 0 1 0 1 1 1 0 1 0 1 1 1 1 1 1 1 1 1 0 1 0 0 0 1 1 1 0 1 0 1 1 1 0 1 0 1 1 . 0 753016 1 1 0 1 0 1 0 0 0 0 0 1 0 1 0 0 0 1 0 1 1 1 0 1 1 1 0 1 0 1 1 1 0 1 0 1 0 0 0 1 1 . 0 841793 0 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1 1 1 0 1 1 1 0 1 0 0 1 1 0 1 1 1 0 0 1 1 1 1 1 1 1 . 0 881051 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1 1 1 1 1 1 1 1 1 0 1 1 1

$ head AWB.xpehh.1.tped 1 . 0 9875 1 1 1 1 0 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1 1 . 0 10048 1 1 1 1 0 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1 1 . 0 67556 0 0 1 1 1 1 1 1 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 . 0 95330 1 1 1 1 1 1 1 1 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 . 0 121258 0 0 1 1 0 1 1 1 0 1 0 1 1 1 1 1 1 1 1 1 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 . 0 353207 1 1 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1 0 0 0 0 0 0 0 0 1 1 0 0 1 1 1 1 0 1 1 1 1 . 0 743023 0 0 1 1 1 1 0 0 0 1 1 1 0 0 1 1 1 1 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1 1 . 0 753016 1 1 0 0 0 0 1 1 1 1 0 0 1 1 0 0 0 0 1 1 0 1 1 1 1 1 0 1 1 1 0 0 1 1 0 1 0 1 0 0 1 1 1 . 0 841793 1 1 0 0 1 1 1 1 0 0 1 1 1 1 0 1 0 1 1 1 1 1 1 1 0 1 1 1 1 1 0 0 1 1 0 1 1 1 1 1 1 1 1 . 0 881051 1 1 1 1 0 1 1 1 1 1 0 1 1 1 1 1 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1

Do I need to provide additional information?

fjptwenger commented 5 years ago

I found the reason,I just ignored the log file. the message WARNING: Reached chromosome edge before EHH decayed below 0.05. Skipping calculation at . ,maybe i need change the cutoff?But I don't know what effect this will have.

what is the main reason for generating empty files??

szpiech commented 5 years ago

The XPEHH statistic is calculated by computing, in each population, EHH at successively wider windows around a locus untill EHH falls below the cutoff (specified by --cutoff), then this curve is integrated, and XPEHH is given as the ratio of the scores from the two populations. If you have a locus that is close to the end of a chromosome, it is likely that EHH may not reach the cutoff before the window hits the end of the data, therefore selscan chooses to not report any scores for that locus by default. It is possible that all loci fall into this category, and therefore nothing is reported. This happens most frequently when one only analyses a small region (e.g. this crops up when analyzing simulated data). You can, of course, raise the cutoff, which may affect your power to detect sweeps. You could also use the ---trunc-ok flag to simply report the score anyway, even if the EHH cutoff wasn't achieved.

You will also need to provide genetic map information or at least copy your physical positions into the genetic map column.

Hope this helps!

fjptwenger commented 5 years ago

thank you for your reply,but when i add ‘--trunc-ok --map auto.map ’ and set ‘--cutoff 0.5’ ,the output still empty。

map file:

 $ head auto.map 
1   rs1111334233    0   1082
1   rs1111022217    0   1268
1   rs1111770947    0   1703
1   rs1110650834    0   1946
1   rs1113275186    0   1990
1   rs1109262880    0   3918
1   rs710320748 0   3966
1   rs81308531  0   3982
1   rs710095632 0   4553
1   rs700294049 0   4606

so if it’s not my usage error, it means that my data has no selection region?

szpiech commented 5 years ago

You need to give something other than zeros in your third column. Physical distance would work if you don't have genetic distances.