Weird false positive XP-nSL signals appeared with larger samples

Ceonham commented 9 months ago

Hi Prof. Szpiech, I recently used selscan to calculate XP-nSL between a target population (209 samples) and three reference populations (ref1: 38 samples, ref2: 70 samples, ref3: 210 samples) separately. I used the following commands to calculate and normalize XP-nSL:

selscan --xpnsl --vcf-ref ref_$chr.vcf --vcf target_$chr.vcf --threads 3 --out ref_target_$chr
norm --xpnsl --files ref_target_${chr}.xpnsl.out --bp-win

In the end I got satisfactory top-1% signals for ref1 vs. target (ref1_target.xpnsl.out.norm.100kb.windows.txt, ref1_target.0.01.xpnsl.txt) and for ref2 vs. target (ref2_target.xpnsl.out.norm.100kb.windows.txt, ref2_target.0.01.xpnsl.txt). However, a lot of false-positive signals appeared in the results for ref3 vs. target (ref3_target.xpnsl.out.norm.100kb.windows.txt, ref3_target.0.01.xpnsl.txt).

Do you have any suggestions for solving this problem? Please check the six attached result files. I suspect some parameters need to be adjusted when normalizing the ref3 results. Looking forward to your reply!

Thank you very much!

szpiech commented 9 months ago

Hello,

It looks like with the larger reference population (ref3) you are getting a lot of "hits" in windows that contain very few scores and have a "proportion of extreme scores" of 0. My guess is that in these low-score-density windows the distribution of proportions of extreme scores is all 0s, so norm naively calculates the top 1% threshold as 0, and you therefore get a ton of calls in these low-density windows. This is something that should be handled better within norm, but for now I would exclude windows with very few scores from the --bp-win analysis. The default minimum number of scores per window is 10, but that was based on a paper from 2006 that used genotyping data and not (as I think you may have here) whole-genome sequences. I imagine you will want to increase it. Hope this helps.

Zachary
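To make the failure mode described above concrete, here is a minimal Python sketch on synthetic data. It is not norm's actual code: the field names (`n_scores`, `frac_extreme`), the helper functions, and the cutoff rule are all assumptions chosen for illustration. It only shows how a top-1% cutoff collapses to 0 when every window in a group has a proportion of extreme scores of 0, and how requiring a minimum number of scores per window avoids that.

```python
def top_cutoff(fractions, top=0.01):
    """Smallest value that still falls in the top `top` share of `fractions`."""
    ordered = sorted(fractions)
    n_top = max(1, round(len(ordered) * top))  # how many windows form the top share
    return ordered[-n_top]


def naive_calls(windows, top=0.01):
    """Call every window whose proportion of extreme scores reaches the cutoff."""
    cutoff = top_cutoff([w["frac_extreme"] for w in windows], top)
    return [w for w in windows if w["frac_extreme"] >= cutoff]


def filtered_calls(windows, min_scores, top=0.01):
    """Drop windows with too few scores before computing the cutoff."""
    kept = [w for w in windows if w["n_scores"] >= min_scores]
    return naive_calls(kept, top) if kept else []


# Synthetic low-density windows: few scores each, none of them extreme.
sparse = [{"n_scores": 3, "frac_extreme": 0.0} for _ in range(200)]
# Synthetic dense windows with distinct proportions of extreme scores.
dense = [{"n_scores": 50, "frac_extreme": i / 1000} for i in range(200)]

print(len(naive_calls(sparse)))                   # cutoff is 0.0, so every window is called
print(len(naive_calls(dense)))                    # only the top 1% is called
print(len(filtered_calls(sparse + dense, 20)))    # sparse windows are excluded first
```

In this toy setup the `min_scores` value of 20 is arbitrary; a sensible minimum for real data depends on the score density of the dataset.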


Ceonham commented 9 months ago

Thanks so much! Your suggestion solved the problem completely.

szpiech / selscan

Weird false positive XP-nSL signals appeared with larger samples #106