Open jdidion opened 7 years ago
Sorry to see this problem. Since I recently made some major updates to the code, it's very likely I missed something. Did you use the BAM generated by biscuit? Meanwhile, let me double-check using our data and get back to you.
No, I previously aligned my data using bwa-meth.
I see. In theory it should be compatible. Did you see the segfault immediately, or after a while? Did you notice any memory problems?
Thanks for the feedback,
I just tried joint calling on all 64 samples on chr22. It segfaults at around 16 MB, so not right away. I gave it 32 cores and 64 GB memory and it only used a max of 14 GB.
Do you mean pooling all 64 samples in one command? Could you show me the command you used? Just to make sure.
/home/didionjp/biscuit/biscuit pileup -q 32 -r $REF_GENOME -g chr22 -o chr22.bs_snps.vcf -i <space-separated list of 64 BAMs here>
I would need to duplicate your use case (bwa-meth and many samples) to replicate the error. I just tried on 5 samples and didn't notice anything. Let me produce some bwa-meth BAMs and get back to you.
Ok, thanks. For more details, these are quite deep as far as WGBS goes (mean 40x coverage). I aligned to GRCh37 (full UCSC reference).
Now that I look at it again, this might actually be a memory issue. When I run joint SNP calling on chr22, I give it 32 cores and 128 GB memory. It dies with max memory usage of 119 GB, but the segfault might be happening when it requests the next chunk of memory. Is there a way I can estimate the max memory usage? Is it linear with number of threads? I’m running now with 16 cores and 128 GB, so we’ll see what happens.
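One way to check whether peak memory grows with the thread count is to measure it empirically on a small region. The sketch below is not part of biscuit: `run_and_measure` is a hypothetical helper that runs any command, reads the peak resident memory of child processes from Python's standard `resource` module (POSIX only), and reports whether the child was killed by SIGSEGV.

```python
import resource
import signal
import subprocess
import sys


def run_and_measure(cmd):
    """Run cmd and return (returncode, peak_rss_bytes, segfaulted).

    Hypothetical helper, not part of biscuit. For RUSAGE_CHILDREN,
    ru_maxrss is the largest peak among all children waited for so far,
    so use a fresh Python process for each measurement.
    """
    proc = subprocess.run(cmd)
    peak = resource.getrusage(resource.RUSAGE_CHILDREN).ru_maxrss
    # Linux reports kilobytes; macOS reports bytes.
    if sys.platform != "darwin":
        peak *= 1024
    # On POSIX, a negative return code means the child died from that signal.
    segfaulted = proc.returncode == -signal.SIGSEGV
    return proc.returncode, peak, segfaulted
```

Running the same pileup command with a few different thread counts, one measurement per Python process, would show whether the peak is roughly linear in the number of threads.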
Okay, this looks like a memory issue after all. With 16 cores and 128 GB it finishes, but the max memory usage was 120 GB so it was a near thing.
This raises a couple of questions: Why are the memory requirements so high? Might there be a leak somewhere? At the very least, it would be nice to have a formula for a rudimentary estimate of memory requirements. Does it scale with the number of sites called? If so, it’s probably only practical to call SNPs on at most one chromosome at a time.
I don’t mean to complain - this is looking like a really great tool; much better than the other available options. Keep up the good work!
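If memory does scale with the size of the region being called (a hypothesis raised here, not something confirmed in this thread), one workaround would be to call SNPs window by window. The sketch below only generates the region strings. It assumes that the `-g` option accepts `chrom:start-end` coordinates, which should be checked against the `biscuit pileup` help text; `region_windows` and the 5 Mb window size are illustrative choices.

```python
def region_windows(chrom, length, window):
    """Yield 1-based inclusive region strings such as 'chr22:1-5000000'."""
    if window <= 0:
        raise ValueError("window must be positive")
    for start in range(1, length + 1, window):
        end = min(start + window - 1, length)
        yield f"{chrom}:{start}-{end}"


# chr22 is 51,304,566 bp in GRCh37.
CHR22_LENGTH = 51_304_566
regions = list(region_windows("chr22", CHR22_LENGTH, 5_000_000))
```

Each string in `regions` could then be passed as the region argument of a separate pileup run, with the per-window VCFs concatenated afterwards.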
I tried calling SNPs in 64 WGBS samples. All failed with segmentation faults at various points. Can you walk me through how to provide you with the debugging information you need to fix this bug?
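A common way to collect debugging information for a segfault is to run the failing command under gdb and attach the backtrace to the issue. The sketch below is a generic approach, not a procedure given by the biscuit author. It builds a gdb batch invocation around the pileup command quoted in this thread; the reference and BAM paths are placeholders, and `gdb_backtrace_command` is a hypothetical helper.

```python
import shlex


def gdb_backtrace_command(cmd):
    """Wrap cmd so gdb runs it and prints a backtrace if it crashes."""
    return ["gdb", "--batch", "-ex", "run", "-ex", "bt", "--args", *cmd]


# Placeholder paths; substitute the real reference and BAM files.
pileup = ["biscuit", "pileup", "-q", "32", "-r", "ref.fa", "-g", "chr22",
          "-o", "chr22.bs_snps.vcf", "-i", "sample1.bam", "sample2.bam"]
print(shlex.join(gdb_backtrace_command(pileup)))
```

The backtrace is most informative when the binary was built with debug symbols; without them, gdb typically shows only addresses or bare function names.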