richarddurbin / pbwt

Implementation of Positional Burrows-Wheeler Transform for genetic data

Reference impute using maximal matches: Killed #49

Open jerrywzy opened 3 years ago

jerrywzy commented 3 years ago

Hi there,

I am currently trying to use PBWT to impute a reference panel A onto another reference panel B, and vice versa. I am able to impute reference panel A with reference panel B. However, when imputing reference panel B with reference panel A, the process gets Killed on my Linux server.

Here are some lines from the output with file paths taken out:

read genotypes from panel_B/ch1_A.vcf.gz with 2504 sample names and 3738240 sites on chromosome 1: M, N are 5008, 3738240
user    500.356235      system  4.765889        max_RSS 448492  Memory  583036203
impute against reference panel_A_pbwt/chr1
read pbwt PBW3 file with 133049391 bytes: M, N are 9620, 7687647
read 7687647 sites on chromosome 1 from file
read 4810 sample names
1874940 sites selected from 7687647, pbwt size for 9620 haplotypes is 61442035
built reverse PBWT - size 61408995
1874940 sites selected from 3738240, pbwt size for 5008 haplotypes is 47999564
Imputation preliminaries: user  213.029846      system  1.175402        max_RSS 692372  Memory  2156881417
Reference impute using maximal matches: Killed 

What could be causing this? I've tried it with two different servers with the same results.

richarddurbin commented 3 years ago

Hello Jerry

Can you run with “-check”, which produces more verbose output, and tell me where it gets to?

Thanks, Richard

On 1 Jul 2021, at 05:32, jerrywzy wrote: (original post, quoted in full above)

jerrywzy commented 3 years ago

Hi Richard,

Thanks for the reply. I've rerun the command with "-check", and got to the same point where the process gets killed again, unfortunately.

Here are the last few lines of the output:

written 95762838 chars pbwt: M, N are 5008, 3700000
written 3700000 sites from 10177 to 247152709
written 2504 samples
read genotypes from panelB_chr1_IDfixed.vcf with 2504 sample names and 3738240 sites on chromosome 1: M, N are 5008, 3738240
user    441.591560      system  16.612428       max_RSS 449228  Memory  583036114
impute against reference panelA/pbwt/chr1
read pbwt PBW3 file with 133049391 bytes: M, N are 9620, 7687647
read 7687647 sites on chromosome 1 from file
read 4810 sample names
1874940 sites selected from 7687647, pbwt size for 9620 haplotypes is 61442035
built reverse PBWT - size 61408995
written haplotype file: 1874940 rows of 9620
1874940 sites selected from 3738240, pbwt size for 5008 haplotypes is 47999564
Imputation preliminaries: user  397.107193      system  15.921011       max_RSS 692460  Memory  2157093083
Reference impute using maximal matches: ./merge_reciprocal.sh: line 13: 23977 Killed                  $PBWT/pbwt -checkpoint 100000 -check -readVcfGT panelB_chr1_IDfixed.vcf -referenceImpute panelA/pbwt/chr1 -writeVcfGz panelB_chr1_IDfixed_panelA_test.dose.vcf.gz
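
A note for anyone reading the logs above: on Linux, a bare "Killed" from the shell means the process received SIGKILL. A common cause is the kernel's out-of-memory killer, though nothing in this thread confirms that. The kernel log (for example via `dmesg`) would show whether it was responsible. As a rough sanity check, the sketch below takes the haplotype and site counts printed in the log and estimates how large the two panels would be if held uncompressed. The one-byte-per-entry layout and the helper name `dense_bytes` are assumptions made for illustration, not a description of what pbwt actually allocates during `-referenceImpute`.

```python
# Back-of-envelope estimate only. Assumes one byte per haplotype per site,
# which is an illustrative assumption and not pbwt's documented memory layout.

def dense_bytes(haplotypes: int, sites: int, bytes_per_entry: int = 1) -> int:
    """Bytes needed for an uncompressed haplotypes x sites matrix."""
    return haplotypes * sites * bytes_per_entry

# Figures copied from the log output in this thread.
SELECTED_SITES = 1874940   # "1874940 sites selected from ..."
REF_HAPLOTYPES = 9620      # reference panel A: "pbwt size for 9620 haplotypes"
NEW_HAPLOTYPES = 5008      # panel B being imputed: "pbwt size for 5008 haplotypes"

ref_bytes = dense_bytes(REF_HAPLOTYPES, SELECTED_SITES)
new_bytes = dense_bytes(NEW_HAPLOTYPES, SELECTED_SITES)

for label, size in (("reference panel A", ref_bytes), ("target panel B", new_bytes)):
    print(f"{label}: {size / 1e9:.1f} GB if stored uncompressed")
```

If the totals come out close to or above the RAM available on the server, memory exhaustion becomes a plausible explanation worth checking. If they are well below it, the cause is more likely elsewhere.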