statgen / demuxlet

Genetic multiplexing of barcoded single cell RNA-seq
Apache License 2.0

Segmentation fault - core dumped #43

Closed. jatinarora-upmc closed this issue 5 years ago.

jatinarora-upmc commented 5 years ago

Hi Hyun and Jimmie,

We have sequenced cells that were pooled from 4 individuals. Now I am trying to deconvolute them using demuxlet with 120 GB of memory. However, I get a segmentation fault at chr6.

NOTICE [2019/05/21 21:59:58] - Observed 298000 droplets with unique cell barcode
NOTICE [2019/05/21 22:00:07] - Reading 287000000 reads at chr6:16441802 and skipping 177642284
NOTICE [2019/05/21 22:00:15] - Observed 299000 droplets with unique cell barcode
NOTICE [2019/05/21 22:00:15] - Observed 299000 droplets with unique cell barcode
/opt/p6444/n048/job_scripts/21323893: line 9: 180888 Segmentation fault (core dumped) tools/demuxlet/bin/demuxlet --sam reads.bam --vcf genotype_info.vcf --sm-list sample.list --min-MQ 30 --field GT --out demux_out

As you suggested to other users, I have already tried reducing the number of individuals from 4 to 2 using --sm-list, as well as restricting to exonic SNPs, but I still get this error at chr6.

Could you please help? Thanks a lot, Jatin

hyunminkang commented 5 years ago

Use --group-list to limit the barcodes to a specific subset, if memory is the limit.
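
For example, something along these lines (barcodes.tsv is just a placeholder name for a file with one cell barcode per line; the remaining options are copied from your original command):

# restrict demuxlet to a whitelist of cell barcodes
tools/demuxlet/bin/demuxlet \
  --sam reads.bam \
  --vcf genotype_info.vcf \
  --sm-list sample.list \
  --group-list barcodes.tsv \
  --min-MQ 30 \
  --field GT \
  --out demux_out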

Hyun.

Hyun Min Kang, Ph.D. Associate Professor of Biostatistics University of Michigan, Ann Arbor Email : hmkang@umich.edu

jatinarora-upmc commented 5 years ago

Hi Hyun,

Thank you for your reply. I believe memory is not the limit: I have tried up to 320 GB, but it still gives the same error. Do you still suggest subsetting the droplets? Please let me know if there are any other parameters to play with.

Thanks, Jatin

hyunminkang commented 5 years ago

It might be that a specific read is causing the problem. If you can share a subset of the input that replicates the problem, we would be happy to look at it. Or you can run gdb on your end to locate the line that causes the segfault.
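
A rough sketch of how to do that (same command line as before, run under gdb; if the binary was built without debug info, the backtrace will lack file/line numbers and you may need to rebuild with -g):

# start demuxlet under gdb, then type "run"; after the crash, "bt" and "list" show where it died
gdb --args tools/demuxlet/bin/demuxlet --sam reads.bam --vcf genotype_info.vcf \
    --sm-list sample.list --min-MQ 30 --field GT --out demux_out
(gdb) run
(gdb) bt
(gdb) list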

Hyun.

jatinarora-upmc commented 5 years ago

I tried limiting to valid droplets by passing their barcodes to --group-list, but I still get the same core dump. When I ran gdb, the error seems to come from line 231 in cmd_cram_demuxlet.cpp. Here is the output from gdb:

(gdb) where

#0  0x0000000000430027 in main (argc=23, argv=0x0) at cmd_cram_demuxlet.cpp:231

(gdb) list +
227         gps = new double[nv*3];
228         for(int32_t i=0; i < nv*3; ++i) {
229           gps[i] = vr.get_posterior_at(i);
230         }
231         snpid = scl.add_snp( vr.cursor()->rid, vr.cursor()->pos, vr.cursor()->d.allele[0][0], vr.cursor()->d.allele[1][0], vr.get_af(1), gps);
232         snpids.push_back(snpid);
233       }
234       else {
235         //error("Cannot read new SNP");
236       }
237     }
238
239     // get barcode
240     int32_t ibcd = 0;
241     if ( tagGroup.empty() ) {
242       ibcd = scl.add_cell(".");
243     }
244     else {
245       uint8_t* bcd = (*gtag) ? (uint8_t*) bam_aux_get(sr.cursor(), gtag) : NULL;

In order to reproduce the error, I can share the VCF file with you, but the BAM is too big to share (75 GB).
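
If a smaller test case would help, one option (assuming samtools is available; the window around chr6:16441802 is just an illustrative choice) would be something like:

# pull out only the reads around the position where the crash occurs, then index the subset
# (requires reads.bam to be indexed first, e.g. with: samtools index reads.bam)
samtools view -b reads.bam chr6:16000000-17000000 > reads.chr6_subset.bam
samtools index reads.chr6_subset.bam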

hyunminkang commented 5 years ago

Does the VCF contain biallelic SNPs only? Otherwise, you will need to filter them.

Hyun.

jatinarora-upmc commented 5 years ago

The VCF also contains multi-allelic SNPs (e.g. tri-allelic), which (as per the output) are ignored by demuxlet. Shall I filter them out and run demuxlet with only biallelic SNPs?

hyunminkang commented 5 years ago

Yes, I would suggest that (without being able to reproduce the exact error). The segfault comes from reading the VCF: either there are monomorphic variants (where the ALT allele does not exist), or non-biallelic SNPs or indels are causing unexpected problems. I have not seen similar problems in the examples we are using.
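
One way to do that filtering, as a sketch with bcftools (assuming bcftools is installed; the output file name is a placeholder):

# keep only biallelic SNPs: drops indels, multi-allelic sites, and records with no ALT allele listed
bcftools view -m2 -M2 -v snps genotype_info.vcf > genotype_info.biallelic_snps.vcf

Then point --vcf at the filtered file when rerunning demuxlet.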

Hyun.

jatinarora-upmc commented 5 years ago

Hi Hyun,

Your suggestion to use only biallelic variants SOLVED the core dump problem. It also increased the number of singlets while decreasing the doublet rate. Thanks a lot for your kind assistance.

Thanks, Jatin Arora