szpiech / selscan

Haplotype based scans for selection
GNU General Public License v3.0

EHH Segmentation fault (core dumped) #65

Open Moxsf opened 2 years ago

Moxsf commented 2 years ago

Sorry to bother you!

When I calculate EHH, a positive selection indicator, with selscan, I run into some problems:

I selected some test positions in the human genome data, but for some of them the calculation fails with a segmentation fault.

For example:

selscan --ehh 11_67205462_C_A --map ~/1000G_cal/pre_1000G/map/AFR.chr11.phase3_shapeit2_mvncall_genetic.map --vcf ~/1000G_cal/pre_1000G/03_POP_vcf_fliter/AFR.chr11.phase3_shapeit2_mvncall_fliter.vcf.gz --out ~/1000G_cal/pre_1000G/99_EHH_cal/01_EHH_res/RV555_Locus --ehh-win 200000

This works for some positions, which I have calculated successfully, but other positions fail:

31492 Segmentation fault (core dumped) selscan --ehh 11_67205462_C_A --map ~/1000G_cal/pre_1000G/map/AFR.chr11.phase3_shapeit2_mvncall_genetic.map --vcf ~/1000G_cal/pre_1000G/03_POP_vcf_fliter/AFR.chr11.phase3_shapeit2_mvncall_fliter.vcf.gz --out ~/1000G_cal/pre_1000G/99_EHH_cal/01_EHH_res/RV555_Locus --ehh-win 200000

I also extracted a 3 Mb part of chr11 extending from the query position 11_67205462. Although this run produces a result, the .out file contains some strange values:

selscan --ehh 11_67205462_C_A --map ~/1000G_cal/pre_1000G/aa.tsv --vcf ~/1000G_cal/pre_1000G/aa.gz --out ~/1000G_cal/pre_1000G/99_EHH_cal/aa

872966349 -0.000000 0.000000 0.000000 0.000000
1235576192 0.000000 0.000000 0.000000 0.000000
872966349 0.000000 0.000000 0.000000 0.000000
1235576192 0.000000 0.000000 0.000000 0.000000
872966349 -0.000000 0.000000 0.000000 0.000000
1235576192 0.000000 0.000000 0.000000 0.000000
872966349 0.000000 0.000000 0.000000 0.000000
1235576192 0.000000 0.000000 0.000000 0.000000
872966349 0.000000 0.000000 0.000000 0.000000
1235576192 0.000000 0.000000 0.000000 0.000000
872966349 0.000000 0.000000 0.000000 0.000000
1235576192 0.000000 0.000000 0.000000 0.000000
872966349 0.000000 0.000000 0.000000 0.000000
1305944936 -0.000000 0.000000 0.000000 0.000000
872966349 0.000000 0.000000 0.000000 0.000000
1376313680 0.000000 0.000000 0.000000 0.000000
872966349 0.000000 0.000000 0.000000 0.000000
1587419913 0.000000 0.000000 0.000000 0.000000
872966349 0.000000 0.000000 0.000000 0.000000
1587419913 -0.000000 0.000000 0.000000 0.000000
...
872972452 0.000000 0.264473 0.078809 0.070251
0 0.000000 0.264473 0.085674 0.076281
872972452 0.000000 0.264473 0.087539 0.077919
0 -0.000000 0.264473 0.097670 0.086817
872972452 -0.000000 0.264473 0.097670 0.086817
0 -0.000000 0.264473 0.101521 0.090200
872972452 0.000000 0.341463 0.150293 0.133338
0 0.000000 0.341463 0.150570 0.133581
872972452 0.000000 0.341463 0.232494 0.205537
0 0.000000 0.486629 0.278325 0.246358
872972452 -0.000000 0.975904 0.280038 0.249770
0 0.000000 0.975904 0.293624 0.261702
872972452 0.000000 0.975904 0.338196 0.300851
0 0.000000 1.000000 0.385775 0.342735
872972452 0.000000 1.000000 0.385775 0.342735
0 0.000000 1.000000 0.385775 0.342735
872972452 -0.000000 1.000000 0.385775 0.342735
0 0.000000 1.000000 0.386570 0.343433
872972452 0.000000 1.000000 0.761653 0.672880
0 0.000000 1.000000 1.000000 0.003897

I don't know whether this is a map file error (as mentioned in some issues in this repository), so I extended the query region (up to 10 Mb), and the segmentation fault occurred again.

Moreover, the --pmap parameter does not work:

selscan --ehh rs869736 --map ~/1000G_cal/pre_1000G/aa.tsv --vcf ~/1000G_cal/pre_1000G/aa.gz --out ~/1000G_cal/pre_1000G/99_EHH_cal/aa --pmap

How should genetic map information be added to the map file? I am using the discontinuous values of the HapMap Phase II genetic map (lifted from build 35 to GRCh37), converted to regional values at the 1000G physical positions.

*_genetic.map
11 11_218243_T_C 0.021381 218243
11 11_218278_A_G 0.021381 218278
11 11_218391_A_G 0.021381 218391
11 11_218416_G_A 0.021381 218416
11 11_218431_T_C 0.021381 218431
11 11_218470_T_A 0.021381 218470
11 11_218489_G_A 0.021381 218489
11 11_218590_C_T 0.021381 218590
11 11_218613_G_A 0.021381 218613
11 11_218628_C_T 0.021403 218628
11 11_218640_G_A 0.021403 218640
11 11_218651_T_C 0.021774 218651
11 11_218732_CTGTTACTGTG_C 0.021774 218732
11 11_218793_A_T 0.021774 218793
11 11_218804_T_C 0.021774 218804
11 11_218811_G_A 0.021774 218811
11 11_218848_T_C 0.021774 218848
11 11_218904_G_A 0.021774 218904
11 11_218906_C_T 0.021774 218906
11 11_218907_G_A 0.021774 218907
11 11_219070_C_G 0.021774 219070
11 11_219078_A_G 0.021774 219078
11 11_219089_T_C 0.021774 219089
11 11_219202_T_C 0.022065 219202
11 11_219217_T_G 0.022065 219217
11 11_219219_C_T 0.022065 219219
11 11_219240_G_A 0.022065 219240
11 11_219241_C_CCT 0.022065 219241
11 11_219366_G_C 0.022065 219366
11 11_219379_C_A 0.022065 219379
11 11_219398_G_A 0.022065 219398
11 11_219423_T_C 0.022184 219423
11 11_219442_C_T 0.022184 219442
11 11_219452_C_G 0.022184 219452
11 11_219537_C_T 0.022184 219537
11 11_219538_A_G 0.022184 219538
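The thread does not settle how the genetic positions in the third column should be filled in. A minimal sketch of one common approach, linear interpolation between reference map points by physical position, might look like the following. This is an assumption, not something the maintainer recommends in this thread; the function name and the toy reference points are hypothetical.

```python
from bisect import bisect_right


def interpolate_genetic_pos(ref_pos, ref_gen, query_pos):
    """Linearly interpolate a genetic position for one physical position.

    ref_pos: strictly increasing physical positions from a reference map
    ref_gen: genetic positions at those physical positions (same length)
    Queries outside the reference range are clamped to the nearest end,
    which is a simplification chosen for this sketch.
    """
    if query_pos <= ref_pos[0]:
        return ref_gen[0]
    if query_pos >= ref_pos[-1]:
        return ref_gen[-1]
    # After bisect_right: ref_pos[i - 1] <= query_pos < ref_pos[i]
    i = bisect_right(ref_pos, query_pos)
    x0, x1 = ref_pos[i - 1], ref_pos[i]
    y0, y1 = ref_gen[i - 1], ref_gen[i]
    return y0 + (y1 - y0) * (query_pos - x0) / (x1 - x0)


# Toy reference points (hypothetical, for illustration only).
toy_pos = [100, 200, 300]
toy_gen = [1.0, 2.0, 4.0]
halfway = interpolate_genetic_pos(toy_pos, toy_gen, 150)
```

Interpolating gives each locus its own genetic position instead of the runs of identical values seen in the listing above; whether that changes the behaviour reported here is not established in this thread.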

selscan v1.3.0

Moxsf commented 2 years ago

selscan v1.1.0b doesn't work either.

Opening /home/fanxutong/1000G_cal/pre_1000G/aa.gz...
Loading 1322 haplotypes and 47656 loci...
Opening /home/fanxutong/1000G_cal/pre_1000G/aa.tsv...
Loading map data for 47656 loci
ERROR: Variant physical position must be strictly increasing.
11_66051219_AC_A 66051219 comes after 11_66051219_AC_ACC 66051219

Is this problem caused by two variants sharing the same position in the VCF file? The runs that succeeded also contained shared positions. To get 0|1 coding in the VCF file, we had to split multi-allelic sites into bi-allelic records, which is why some positions occur more than once in the VCF file.
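To find every locus that would trigger the "strictly increasing" error before running selscan, a small check over the map file can help. The sketch below assumes the four whitespace-separated columns shown in the map listing earlier in this issue (chromosome, locus ID, genetic position, physical position); the function name is hypothetical and the genetic positions in the example rows are placeholders.

```python
def find_order_violations(map_lines):
    """Return (previous_id, current_id, position) for every locus whose
    physical position is not strictly greater than the previous one."""
    violations = []
    prev_id, prev_pos = None, None
    for line in map_lines:
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        locus_id, pos = fields[1], int(fields[3])
        if prev_pos is not None and pos <= prev_pos:
            violations.append((prev_id, locus_id, pos))
        prev_id, prev_pos = locus_id, pos
    return violations


# Example rows modelled on the error above; 0.0 is a placeholder value.
example_rows = [
    "11 11_66051219_AC_ACC 0.0 66051219",
    "11 11_66051219_AC_A 0.0 66051219",
]
problems = find_order_violations(example_rows)
```

On a real file, passing `open(path)` as `map_lines` should work, since the function only iterates over lines.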

szpiech commented 2 years ago

Hi there, so the EHH module (--ehh flag) definitely needs some work, my apologies for the bugs there, and I'll have to investigate these issues. Re the second post, right now selscan can only handle biallelic loci and more than one variant can not occur at the same physical location. Unfortunately, for the time being, you will have to do some filtering of those sites in order to run selscan.
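The filtering step mentioned above could be sketched as follows. This is only one possible reading of "filtering of those sites": it drops every VCF record whose chromosome and position occur more than once, on the assumption that records produced by splitting a multi-allelic site should all be removed rather than keeping one of them. The function name is hypothetical.

```python
from collections import Counter


def drop_shared_positions(vcf_lines):
    """Drop every VCF record whose (CHROM, POS) occurs more than once.

    Header lines (starting with '#') and blank lines pass through unchanged.
    """
    vcf_lines = list(vcf_lines)

    def site(line):
        # The first two tab-separated VCF columns are CHROM and POS.
        return tuple(line.split("\t", 2)[:2])

    def is_record(line):
        return bool(line.strip()) and not line.startswith("#")

    counts = Counter(site(line) for line in vcf_lines if is_record(line))
    return [
        line for line in vcf_lines
        if not is_record(line) or counts[site(line)] == 1
    ]


# Toy records (hypothetical IDs) with one position shared by two variants.
toy_vcf = [
    "##fileformat=VCFv4.1",
    "#CHROM\tPOS\tID\tREF\tALT",
    "11\t66051219\tv1\tAC\tACC",
    "11\t66051219\tv2\tAC\tA",
    "11\t66051300\tv3\tG\tA",
]
filtered = drop_shared_positions(toy_vcf)
```

The log above shows the same locus count loaded from the VCF and from the map file, so the map file would presumably need the same loci removed to stay in step.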

Moxsf commented 2 years ago

selscan is a fantastic tool for calculating haplotype-based signals of positive selection. Thanks for your work.

sadiexiaoyu commented 2 years ago

Hi, Szpiech,

I also ran into this problem when running EHH with selscan. I got the following error report:

Loading 612 haplotypes and 4517734 loci...
Opening new_file.map...
Loading map data for 4517734 loci
Found 7:101844851 in data.
--skip-low-freq set. Removing all variants < 0.05.
Removed 4180736 low frequency variants.
yhrun: error: cn7664: task 0: Segmentation fault

I would like to ask whether this issue has already been solved.

Looking forward to your reply!

szpiech commented 2 years ago

Hi,

Do you have a small example file with commands that reproduce this error? This would help me track down the issue.

-Zachary

sadiexiaoyu commented 2 years ago

Hi, Szpiech, I have uploaded the dataset to Dryad; the link is https://datadryad.org/stash/share/pCV-fUS0T8GB27ri4m6brCCbI3IauCbUPYb76slbcbo

The command I used is:

selscan --ehh 4:1803251 --hap chr4hap_Africanout.txt --map new_file.map --out chr4_African_ehh --threads 20

szpiech commented 2 years ago

Thank you, I hope to get to this soon.

szpiech commented 2 years ago

Hello,

I believe I have a working fix for this bug in the devel branch https://github.com/szpiech/selscan/tree/devel. I compiled a Linux version of the binary, which is available here https://github.com/szpiech/selscan/tree/devel/bin/linux. If you need a different OS, you will have to compile it yourself for the moment. Note that norm currently fails to compile on the devel branch.

-Zachary