szpiech / selscan

Haplotype based scans for selection
GNU General Public License v3.0

Error when using physical map sorted based on physical position and ID #26

Closed vguerracanedo closed 4 years ago

vguerracanedo commented 6 years ago

I'm running into a silly problem. My physical map file has repeated physical positions with unique IDs. The data is sorted by physical position and then by ID. Example at the end of the message.

When I try to use iHS, I get the following problem:

ERROR: Variant physical position must be strictly increasing.
    rs201044430 216605 comes after rs112068709 216605

My data is already sorted so that 'rs201044430 216605' comes after 'rs112068709 216605'. So I'm not sure what to do differently.

Best, Vanessa


Sample file

7 rs28527214 216426 216426
7 rs66644650 216512 216512
7 rs148463803 216515 216515
7 rs28485819 216569 216569
7 rs28498692 216570 216570
7 rs112068709 216605 216605
7 rs201044430 216605 216605
7 rs188651719 216660 216660
7 rs193275413 216662 216662
7 rs137869704 216672 216672
7 rs139968177 216735 216735

szpiech commented 6 years ago

Hi Vanessa,

Sorry for the delay in getting back to you. At the moment selscan can only handle biallelic variants, and so when multiple variants are reported at the same physical position it will throw this error, although I can see how it can be a confusing message. I think your best bet would be to filter these two sites from your dataset. I hope this helps!

-Zach

vguerracanedo commented 6 years ago

Hi Zach, Thank you for responding. As you noted, removing the entries with repeated physical positions did the trick. Best, V
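A minimal sketch of the filtering step described above, for selscan versions that still reject repeated positions. It assumes a four-column map (chromosome, ID, genetic position, physical position) like the sample file; the function name `drop_dup_positions` is hypothetical and not part of selscan.

```shell
# Hypothetical helper: keep only map rows whose physical position (column 4)
# is unique within its chromosome (column 1). Reads stdin, writes stdout.
drop_dup_positions() {
    awk '{
            key = $1 SUBSEP $4      # chromosome + physical position
            count[key]++
            row[NR] = $0
            rowkey[NR] = key
         }
         END {
            # Print rows in their original order, skipping repeated positions.
            for (i = 1; i <= NR; i++)
                if (count[rowkey[i]] == 1) print row[i]
         }'
}

# Demo on four rows taken from the sample file above.
printf '%s\n' \
    '7 rs28498692 216570 216570' \
    '7 rs112068709 216605 216605' \
    '7 rs201044430 216605 216605' \
    '7 rs188651719 216660 216660' | drop_dup_positions
```

The demo keeps the rs28498692 and rs188651719 rows and drops both rows at position 216605. The same sites would presumably also have to be removed from the genotype input so that both files still describe the same loci.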

vs4223 commented 6 years ago

Hi Zach,

My problem is along the same lines as Vanessa's so I am posting in the same thread.

I am trying to use selscan to calculate EHH scores. I have 1000 genomes vcf files which I have used to produce map files with vcftools, such that they look like this:

22 rs8142737 50291889 50291889
22 rs570182536 50291936 50291936
22 rs8135816 50291976 50291976
22 rs8140681 50292081 50292081
22 rs9627785 50292178 50292178
22 rs9616779 50292545 50292545
22 rs9616780 50292763 50292763
22 rs139397353 50292931 50292931
22 rs9616364 50292983 50292983
22 rs12159367 50293281 50293281
22 rs7290342 50294176 50294176
22 rs141187212 50294325 50294325
22 rs6520063 50294378 50294378
22 rs6520064 50294469 50294469

My selscan command is:

selscan --ehh 50292931 --vcf chr22.vcf.gz --map plink22.map --maf 0.0001 --out test.txt

This produces the following error:

ERROR: Variant physical position must be strictly increasing.
    -- -9999 comes after    -- -9999

Now I have tried to identify the problem row by trying to grep for "-9999" but get nothing. I have also tried to sort on the physical position column but get the same error. There are no blank rows at the start or end of the file.

To ensure there was no issue with my map file, I tried using different chromosomes but keep getting this error.

By the way, I have also tried using hapbin with the same files using the following command:

ehhbin --locus 50292931 --hap out.impute.hap --map <(awk '{$3=$4;print}' plink22.map)

But I always get an error:

no locus with the id: 50292931

I have checked and the locus is definitely within the .map and the .hap files (which were created from the vcf files). Therefore I think the problem must be within my map files but I cannot fathom what the issue is.

szpiech commented 6 years ago

So my first thought is that you should request the site by rsid and not position. Please try

selscan --ehh rs139397353 --vcf chr22.vcf.gz --map plink22.map --maf 0.0001 --out test.txt

and see if that works. Admittedly that doesn't seem to be a terribly useful error message that you got. I'll have to make it more informative. Please let me know if this, at least, solves your problem.

vs4223 commented 6 years ago

Hi Zach,

Many thanks for getting back to me. Unfortunately this does not solve the issue. I still get the exact same error. Also would using IDs not prove an issue for de novo variants that have not been assigned an ID?

szpiech commented 6 years ago

Sorry for the delay in getting back to you.

Yes, I think that I should modify the lookup scheme to allow for rsid or genomic position. I typically assign variants without an rsid a temporary id based on the chromosome and position, but I forget this isn't what everyone does.

Are you using a publicly accessible vcf file? I would like to try to reproduce this problem.
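The temporary-ID convention mentioned above could look roughly like this. It is only a sketch: it assumes a four-column map, assumes missing IDs are written as `.` (the VCF convention), and picks `chromosome:position` as the ID format. The name `fill_missing_ids` is hypothetical.

```shell
# Hypothetical helper: give every map row without an ID a temporary one
# built from its chromosome (column 1) and physical position (column 4).
fill_missing_ids() {
    awk '$2 == "." { $2 = $1 ":" $4 }   # e.g. "22:50291936"
         { print }'
}

# Demo: first row as in the map above, second row with its ID blanked
# for illustration.
printf '%s\n' \
    '22 rs8142737 50291889 50291889' \
    '22 . 50291936 50291936' | fill_missing_ids
```

Whatever ID scheme is chosen would need to be applied consistently to every file in which the locus is looked up by ID.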

szpiech commented 4 years ago

Physical map duplicated locations are now allowed, and statistics that are integrated over a map can directly use physical positions with --pmap.

TimothyCiesielski commented 1 year ago

Hi Zach,

I am new to Selscan and I am having a similar issue. I have been able to get nSL output for one chromosome but when I attempt to run whole genomes, I get this error:

ERROR: Variant physical position must be monotonically increasing.
    2:10610:G:A 10610 appears after 1:248945650:C:G 248945650

Example code:

selscan --nsl --vcf nameofVCFfile.vcf --out selscannSLresults

It looks like --pmap is not available for nSL . . . any thoughts?

Thanks in advance for your help (and for making Selscan user friendly), Tim

😃

szpiech commented 1 year ago

Hi Tim,

You’ll have to separate your files by chromosome and run each separately. You can then normalize them together with norm. Let me know if you have more issues.

Zachary
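One way to do the per-chromosome split with standard tools is sketched below. It assumes an uncompressed VCF and a modest number of chromosomes, since one output file stays open per chromosome. The function name `split_vcf_by_chrom` and the output naming are hypothetical; dedicated VCF tools can do the same job on compressed files.

```shell
# Hypothetical helper: write one VCF per chromosome, each with the full header.
# $1: uncompressed VCF; $2: output prefix (files are named PREFIX.CHROM.vcf)
split_vcf_by_chrom() {
    awk -v prefix="$2" '
        /^#/ { header = header $0 "\n"; next }   # collect header lines
        {
            out = prefix "." $1 ".vcf"           # $1 is the CHROM column
            if (!(out in started)) {
                printf "%s", header > out        # each file gets the header
                started[out] = 1
            }
            print > out
        }' "$1"
}

# Usage, with the file name from the example above:
# split_vcf_by_chrom nameofVCFfile.vcf nameofVCFfile
# for f in nameofVCFfile.*.vcf; do selscan --nsl --vcf "$f" --out "${f%.vcf}"; done
```

The per-chromosome outputs can then be normalized together with norm, as described above.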

TimothyCiesielski commented 1 year ago

Thanks Zach - I appreciate the help on this. Tim

malteze2024 commented 5 months ago

Hello! Please tell me how to solve the problem with the sheep genome map file. If the map file is sorted by genetic position, the program generates a physical position error and vice versa. Of the 26 chromosomes, only 12 are processed without errors. The initial map file was generated through GenomeStudio.

for reg in $(seq 1 26) ; do selscan --xpehh --vcf phasedRMMrenchr$reg.vcf.gz --vcf-ref phasedDMrenchr$reg.vcf.gz --map MAP_sorted$reg.map --threads 12 --out 2xpEhhcheap$reg; done

selscan v2.0.0
Opening phasedRMMrenchr1.vcf.gz...
Loading 108 haplotypes and 61259 loci...
Opening phasedDMrenchr1.vcf.gz...
Loading 106 haplotypes and 61259 loci...
Opening MAP_sorted1.map...
Loading map data for 61259 loci
ERROR: Variant genetic position must be monotonically increasing.
    oar3_OAR1_101700644 122.639 appears after oar3_OAR1_101688882 122.64

for reg in $(seq 1 26) ; do selscan --xpehh --vcf phasedRMMrenchr$reg.vcf.gz --vcf-ref phasedDMrenchr$reg.vcf.gz --map 2MAP_sorted$reg.map --threads 12 --out 2xpEhhcheap$reg; done

selscan v2.0.0
Opening phasedRMMrenchr1.vcf.gz...
Loading 108 haplotypes and 61259 loci...
Opening phasedDMrenchr1.vcf.gz...
Loading 106 haplotypes and 61259 loci...
Opening MAP_sorted1.map...
Loading map data for 61259 loci
ERROR: Variant physical position must be monotonically increasing.
    OAR19_64803054.1 204694 appears after DU281551_498.1 315497

With best regards, Lesya

szpiech commented 5 months ago

Hello,

From my perspective, the important question is why are there sites with map positions out of order relative to the physical positions. In principle this should be impossible, so I would investigate why this seemed to happen. I could see this possibly resulting from a liftover of a genetic map between genome builds, for example.

However, the simplest solution would be to drop one of the two offending sites from your data. If one of the two sites has low MAF, you might as well drop that one, as selscan would filter it out anyway.

Hope this helps,

Zachary
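To find the offending sites before deciding which to drop, a check along these lines may help. It is a sketch that assumes a four-column map (chromosome, ID, genetic position, physical position) for a single chromosome; the function name `report_out_of_order` is hypothetical, and its output only imitates the wording of selscan's message.

```shell
# Hypothetical helper: report rows whose value in the given column is lower
# than the value in the previous row. $1: column number. Reads a map on stdin.
report_out_of_order() {
    awk -v col="$1" '
        NR > 1 && ($col + 0) < (prev_val + 0) {
            print $2, $col, "appears after", prev_id, prev_val
        }
        { prev_id = $2; prev_val = $col }'
}

# Check genetic positions (column 3) after sorting by physical position (column 4).
# The three rows are made-up illustration data, not taken from the sheep map.
printf '%s\n' \
    '1 snpA 122.630 1000' \
    '1 snpB 122.640 2000' \
    '1 snpC 122.639 3000' | sort -k4,4n | report_out_of_order 3
# prints: snpC 122.639 appears after snpB 122.640
```

Each reported pair names two candidate sites; dropping the one with the lower MAF follows the suggestion above.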

malteze2024 commented 4 months ago

Thank you for such a quick response! Apparently, I will still have to use the --pmap option. Does using a physical map have a big impact on my results? There are 600k SNPs in my file.

szpiech commented 4 months ago

Hello,

Well, generally they should be comparable, although you may find slightly more extreme scores in regions of low recombination. You could also choose to use xp-nsl, which doesn't use either distance, although it may still have similar properties. On the whole, I don't think it is too much of a concern.

Zachary

drsancho commented 4 months ago

Hello sir,

I'm working on the cow genome and need to run selscan for iHH12. I have phased the 29 chromosomes into a single VCF file. I am running the command:

selscan --ihh12 --vcf xyzz.vcf --map abc.map --out final

The error it shows is that the variant physical position must be monotonically increasing. I am just starting my studies in bioinformatics. Can you guide me on how to get past it?

Thank you,

Sanchit

drsancho commented 4 months ago

[Attached image: selscan problem]

I have also tried sorting the map with the command below, but it gives a similar error:

sort -nk 4 xyz.map > xyz1.map

szpiech commented 4 months ago

Hello,

You will need to split your vcfs by chromosome in order to run it through selscan.

-Zachary
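The map file needs the same per-chromosome treatment as the VCF. Sorting the whole-genome map on column 4 alone, as in the `sort -nk 4` attempt above, probably does not help because it interleaves rows from different chromosomes. A sketch, assuming a four-column map with the chromosome in column 1 and the physical position in column 4; the function name `split_map_by_chrom` and the output naming are hypothetical.

```shell
# Hypothetical helper: sort a whole-genome map by chromosome, then by physical
# position, and write one map file per chromosome.
# $1: whole-genome map; $2: output prefix (files are named PREFIX.CHROM.map)
split_map_by_chrom() {
    sort -k1,1 -k4,4n "$1" |
        awk -v prefix="$2" '{ out = prefix "." $1 ".map"; print > out }'
}

# Usage, with the file name from the command above:
# split_map_by_chrom abc.map abc
```

Each per-chromosome map would then be paired with the VCF for the same chromosome in a separate selscan run.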

drsancho commented 4 months ago

okay sir

drsancho commented 4 months ago

It is showing the same error, that is, variant genetic position should be monotonically increasing. Can you please help me further?

szpiech commented 4 months ago

Hello,

This error means that your positions are out of order in your file. You need to either put them in order or remove the sites that are out of order.

-Zachary
