statgen / demuxlet

Genetic multiplexing of barcoded single cell RNA-seq
Apache License 2.0

not generating .best and .sing2 output files #69

Open slyahn opened 4 years ago

slyahn commented 4 years ago

I am processing 8 multiplexed samples through demuxlet and everything seems to run fine until the very end. Demuxlet generates the .single file but not the .best and .sing2 files. The standard output shows that it finishes processing the droplets ("Finished processing 21976 droplets total") but then reports a segmentation fault (core dumped) error.

I started with 60 GB of memory and went up to 180 GB, and that did not fix it. The VCF is filtered to include only biallelic SNPs, it is sorted, and the contigs match between the BAM and the VCF. I don't know what could be causing it to fail at the very end when writing the .best file. Do you have any suggestions?

Edited to add: I've tried downsampling the bam to 10% of the original, and I still get the same segmentation fault and only the .single file is generated, so I don't think it's a memory issue.

I should note that this experiment is essentially a simulation using real data. We combined fastq files from 8 individual runs to simulate a multiplexed run. The combined fastq was processed with Cellranger without error. The genotype vcf was generated by a private company who did low pass whole genome sequencing and imputation.

ddsouz5 commented 3 years ago

Hi @slyahn, did you figure out what the problem was? Having the same issue too!

VincentGardeux commented 3 years ago

Would Fix #59 fix the memory issue? We tested it on ~50 genotypes / 5M SNPs and it runs without running out of RAM.

boxiangliu commented 2 years ago

Dear @hyunminkang, I am having the same issue as stated above. The software runs to the step where .single has been generated, but reports a segmentation fault. The .sing2 and *.best files are empty.

I am not sure how to debug this error. Memory does not seem to be the issue (my machine has 386G RAM). Do you have any idea why this would happen? Could you point us to the right path?

VincentGardeux commented 2 years ago

Hi @boxiangliu,

The way it's coded in demuxlet is definitely not the best: it allocates huge arrays, which is neither optimal nor necessary. For example, there is a line that creates an array gpAB:

double* gpAB = new double[scl.nsnps * nv * nv * 9];

So in my example of 5M SNPs (nsnps) and 50 genotypes (nv), and since a double is 8 bytes, this would allocate an array of 5,000,000 x 50 x 50 x 9 x 8 bytes = 900 GB. Do you have 900 GB of RAM? :D
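The arithmetic above can be checked for any combination of SNP and sample counts with a small helper. This is only a sketch: the function names `gpab_bytes` and `max_snps_for_budget` are made up for illustration and are not part of demuxlet, and it assumes an 8-byte `double`, as in the comment above.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Bytes needed for a dense array of nsnps * nv * nv * 9 doubles.
// Hypothetical helper, not part of demuxlet. All arithmetic is done in
// 64-bit unsigned integers so the products do not wrap for realistic inputs.
std::uint64_t gpab_bytes(std::uint64_t nsnps, std::uint64_t nv) {
    const std::uint64_t per_snp = nv * nv * 9u * sizeof(double);
    // Guard against overflow of the final multiplication.
    assert(per_snp == 0 ||
           nsnps <= std::numeric_limits<std::uint64_t>::max() / per_snp);
    return nsnps * per_snp;
}

// Largest number of SNPs whose array fits in a memory budget given in bytes.
// Hypothetical helper, not part of demuxlet.
std::uint64_t max_snps_for_budget(std::uint64_t budget_bytes, std::uint64_t nv) {
    const std::uint64_t per_snp = nv * nv * 9u * sizeof(double);
    return per_snp == 0 ? 0 : budget_bytes / per_snp;
}
```

With these definitions, `gpab_bytes(5000000, 50)` gives 900,000,000,000 bytes, matching the 900 GB figure. Because the cost grows with the square of the number of samples, 8 samples cost 8 x 8 x 9 x 8 = 4,608 bytes per SNP.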

That's why I suggested Fix #59 two years ago, which does not create the array and instead computes the data on the fly without storing it in RAM. But it was never merged into the main branch.
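To illustrate the general idea of computing on the fly instead of storing everything, here is a minimal sketch. It is not the code from Fix #59. It assumes, hypothetically, that the 9 entries per sample pair are a 3 x 3 table of joint genotype probabilities obtained as the product of the two samples' per-genotype probabilities; the `GenotypeProbs` struct, its layout, and `joint_gp` are invented for this example.

```cpp
#include <array>
#include <cstddef>
#include <vector>

// Hypothetical layout (an assumption, not demuxlet's actual data structure):
// gp[(snp * nv + sample) * 3 + g] is the probability that `sample` carries
// genotype g (0, 1 or 2 alternate alleles) at `snp`.
struct GenotypeProbs {
    std::size_t nsnps;
    std::size_t nv;
    std::vector<double> gp;  // size nsnps * nv * 3

    // Pointer to the 3 genotype probabilities of one sample at one SNP.
    const double* at(std::size_t snp, std::size_t sample) const {
        return &gp[(snp * nv + sample) * 3];
    }
};

// Compute the 3x3 joint table for samples a and b at one SNP on demand,
// assuming the two genotypes are independent. Nothing is cached, so memory
// stays at nsnps * nv * 3 doubles instead of nsnps * nv * nv * 9.
std::array<double, 9> joint_gp(const GenotypeProbs& p, std::size_t snp,
                               std::size_t a, std::size_t b) {
    const double* ga = p.at(snp, a);
    const double* gb = p.at(snp, b);
    std::array<double, 9> out{};
    for (int i = 0; i < 3; ++i) {
        for (int j = 0; j < 3; ++j) {
            out[i * 3 + j] = ga[i] * gb[j];
        }
    }
    return out;
}
```

The trade-off is extra multiplications each time a pair is visited in exchange for storage that grows linearly rather than quadratically with the number of samples.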

I guess you can maybe try it (Fix #59), to see if it solves your issue.

Hope this helps.

Cheers

hyunminkang commented 1 year ago

I don't think it is a good idea to use 5M SNPs with the current implementation. I suggest using only common variants in coding regions.

Thanks, Hyun.

Hyun Min Kang, Ph.D., Professor of Biostatistics, University of Michigan, Ann Arbor
