jbalberge opened this issue 3 years ago
This is unusual behavior, but I do have a suggestion. There was an issue in the dedupe step that another user pointed out in issue #92, which I fixed in version 1.1.3.
Is there a contact person at the Broad Institute who is in charge of maintaining the svaba docker image? You could reach out to them to update svaba to the current version here on GitHub, 1.1.3.
Thank you for the quick reply. I used the BioContainers docker image for 1.1.0 available at https://biocontainers.pro/tools/svaba. Unfortunately, upgrading the docker image to v1.1.3 didn't solve the problem. Could it be that the number of events is too high?
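For reference, swapping in a newer image generally looks something like the lines below; the tag has to be looked up on the registry, and the bind mount, file paths, and sample ID are placeholders rather than values from this report.

# pull a newer biocontainers image (pick the tag from the registry page)
docker pull quay.io/biocontainers/svaba:<newer-tag>
# basic tumor/normal invocation; adjust the mounted paths for your data
docker run --rm -v /data:/data quay.io/biocontainers/svaba:<newer-tag> \
  svaba run -t /data/tumor.bam -n /data/normal.bam -G /data/ref.fa -a sample_id -p 8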
@jbalberge we are having the same issue, stuck for 100+ hours at this step for a couple of samples. Did you manage to fix the issue?
...vcf - reading in the breakpoints file
...vcf sizeof empty VCFEntryPair 64 bytes
...read in 1,104,990 indels and 1,739,919 SVs
...vcf - deduplicating 1,739,919 events
...dedupe at 0 of 1,739,919
The SvABA version we have been using is from some time ago. We have successfully processed hundreds of samples with this version, but now a couple of samples are just stuck. We could update the version and re-run only the problem samples, though we are not sure that would fix the issue, and then the cohort would no longer be "harmonised".
Program: SvABA
FH Version: 134
Contact: Jeremiah Wala [ jwala@broadinstitute.org ]
The cohorts we've analysed have germlines sequenced at ~30x and tumors from 60x to 120x.
The two problem samples have germlines at 30x and tumors at ~70x and ~120x. Both have been stuck at the dedupe step for 100+ hours. We have given each run 200GB of memory.
@ahwanpandey This is one of the memory/runtime weaknesses of svaba that I've known about but haven't had time to fix. The issue is that svaba compiles all of the variants into an intermediate file, and this file needs to be sorted and de-duped at the end to produce the organized VCF. For most runs this is fine, but if the number of suspected variants is high (in your case it is very high), then memory usage can climb very high as it tries to read in this entire file.
The solution is really to do what samtools sort does and use a scatter-gather sort, but I haven't been able to implement that yet.
Out of curiosity, how large is the *.bps.txt.gz file for this run? That's the file that it is reading into memory.
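As a rough illustration of that scatter-gather (external merge sort) idea, a minimal sketch with GNU coreutils could look like the following; the column layout of bps.txt.gz and the plain uniq-style dedupe are assumptions for the sketch, since svaba's real dedupe logic is more involved than exact-line matching.

# scatter: split the breakpoints table into fixed-size chunks (header handling assumed)
zcat sample.bps.txt.gz | tail -n +2 | split -l 1000000 - chunk_
# sort each chunk on its own, keeping memory bounded
for f in chunk_??; do
  sort -k1,1 -k2,2n "$f" -o "$f.sorted"
done
# gather: merge the already-sorted chunks and drop exact duplicates
sort -m -k1,1 -k2,2n chunk_??.sorted | uniq > bps.sorted.dedup.txt
rm chunk_??*

GNU sort already spills to temporary files on its own when the input exceeds memory, which is essentially the same trick samtools sort uses.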
Hi @walaj, thanks so much for your response. For the two samples that are stuck, the *.bps.txt.gz files are 147M and 131M.
We have a lot of high-grade ovarian cancer WGS data, and it indeed has a lot of structural variants. Is there any chance you would be able to fix this issue for us? I can share the files if that would be useful. We have run svaba on hundreds of samples over the years, and as you can understand it would be tricky not to be able to run the tool on a couple of samples, and probably more in the future. So we would be very grateful if you could have a look when you get a chance.
The other option we are trying is to run the latest version of the tool. Do you think we will have the same problem with it?
I'm trying to install the latest version, but as you've noted I think I need to fix what CMake is doing first: https://github.com/walaj/svaba/issues/132
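For anyone following along, a from-source build would look roughly like this once the CMake issue is sorted out; these are generic out-of-source CMake steps rather than project-specific instructions, so check the README for exact requirements and variables.

# generic CMake workflow; --recursive pulls in SeqLib and the other submodules
git clone --recursive https://github.com/walaj/svaba.git
cd svaba
mkdir build && cd build
cmake ..
make -j8
# location of the built binary may vary; running it with no arguments prints the version banner
./svaba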
If I remember correctly, this happened with short inserts; hard-trimming adapters and polyG tails must have reduced the number of candidates in my case at the time.
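For reference, that kind of pre-processing could be done with a read trimmer such as fastp (the comment above does not name a tool, so this is only an illustration, and the file names and thread count are placeholders).

# adapter trimming plus polyG trimming before alignment
fastp \
  -i tumor_R1.fastq.gz -I tumor_R2.fastq.gz \
  -o tumor_R1.trimmed.fastq.gz -O tumor_R2.trimmed.fastq.gz \
  --detect_adapter_for_pe \
  --trim_poly_g \
  --thread 8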
Hmm, OK. Given that the bps.txt.gz files are not that big, I'm concerned there is a memory clash happening somewhere that's running up the memory as a bug. There was a bug that caused random memory clashes on < 5% of samples at the dedupe stage, but I fixed it a while ago. I think our best approach here is to have you try the newly built version, and you'll just have a few samples that were run with a newer version. Nothing too substantive has changed, just bug fixes and build systems, so you wouldn't have to re-run your other samples.
If you're still getting the same memory overrun issues on the latest version for these samples, I'll have to re-visit the smart sorting. But with bps files that small, I doubt that this is the issue now.
Hi @walaj, I've now re-run with the latest version and two samples are still stuck at the dedupe step :/ Would it be possible for you to look into this issue for us? I can share any files you need. We would be very grateful for your time in fixing this bug.
Stuck at the following step for two out of hundreds of WGS samples:
==> AN_T_65913_1600143_21_N_65913_GL/std_out_err_AN/WGS.SvABA.STAGE0.SvABA.AN_T_65913_1600143_21_N_65913_GL.new.17918181.papr-res-compute215.err <==
-----------------------------------------------------------
--- Running svaba SV and indel detection on 8 threads ----
--- (inspect *.log for real-time progress updates) ---
-----------------------------------------------------------
[M::bwa_idx_load_from_disk] read 0 ALT contigs
--- Loaded non-read data. Starting detection pipeline
...vcf - reading in the breakpoints file
...vcf sizeof empty VCFEntryPair 64 bytes
...read in 1,104,585 indels and 1,596,340 SVs
...vcf - deduplicating 1,596,340 events
...dedupe at 0 of 1,596,340
==> AN_T_66639_2100027_16_N_66639_GL/std_out_err_AN/WGS.SvABA.STAGE0.SvABA.AN_T_66639_2100027_16_N_66639_GL.new.17918182.papr-res-compute06.err <==
-----------------------------------------------------------
--- Running svaba SV and indel detection on 8 threads ----
--- (inspect *.log for real-time progress updates) ---
-----------------------------------------------------------
[M::bwa_idx_load_from_disk] read 0 ALT contigs
--- Loaded non-read data. Starting detection pipeline
...vcf - reading in the breakpoints file
...vcf sizeof empty VCFEntryPair 64 bytes
...read in 1,074,831 indels and 1,282,024 SVs
...vcf - deduplicating 1,282,024 events
...dedupe at 0 of 1,282,024
The output directory contents so far
Latest SvABA version where the issue persists:
------------------------------------------------------------
-------- SvABA - SV and indel detection by assembly --------
------------------------------------------------------------
Program: SvABA
Version: 1.1.3
Contact: Jeremiah Wala [ jeremiah.wala@gmail.org ]
Usage: svaba <command> [options]
Commands:
run Run SvABA SV and Indel detection on BAM(s)
refilter Refilter the SvABA breakpoints with additional/different criteria to created filtered VCF and breakpoints file.
Report bugs to jwala@broadinstitute.org
Old SvABA version where the issue was first observed:
------------------------------------------------------------
--- SvABA (sah-bah) - SV and indel detection by assembly ---
------------------------------------------------------------
Program: SvABA
FH Version: 134
Contact: Jeremiah Wala [ jwala@broadinstitute.org ]
Usage: svaba <command> [options]
Commands:
run Run SvABA SV and Indel detection on BAM(s)
refilter Refilter the SvABA breakpoints with additional/different criteria to created filtered VCF and breakpoints file.
Report bugs to jwala@broadinstitute.org
@walaj is there any chance you could have a look at this issue for us? We would be very grateful for the help. Thanks so much.
This is fixed in the latest commit (d9f37dbc40ed783b5758389405113ac2a0dfbd82).
@walaj Thanks for all the help so far.
I have now downloaded the latest commit and processed some old samples with both the old version (as mentioned in this issue) and the latest commit (fcfa17e). The results are drastically different in the number of passing somatic SVs. See the plot below (https://github.com/walaj/svaba/assets/8450532/23021ccb-b958-4286-8f64-3a0fad950bb2), summarized per chromosome across two samples (latest-commit results in orange bars).
I noticed that the new commit's log file has lots of messages saying "with limit hit of 0", whereas the old version's log rarely reports 0. Not sure if this is related. I also ran the new version with 16 threads instead of the 8 used with the old version; I'll try running with 8 threads and see if that changes anything. Do you have any ideas? Thanks again. (A quick way to compare the two logs is sketched after the excerpts below.)
OLD VERSION
]$ cat AN_T_66639_2100027_14_N_66639_GL.log | grep "with limit hit of" | head -n 40
writing contigs etc on thread 140475294115584 with limit hit of 796
writing contigs etc on thread 140475302508288 with limit hit of 474
writing contigs etc on thread 140475260544768 with limit hit of 1353
writing contigs etc on thread 140475285722880 with limit hit of 2536
writing contigs etc on thread 140475277330176 with limit hit of 2743
writing contigs etc on thread 140469615314688 with limit hit of 3811
writing contigs etc on thread 140475294115584 with limit hit of 336
writing contigs etc on thread 140475268937472 with limit hit of 1780
writing contigs etc on thread 140475302508288 with limit hit of 307
writing contigs etc on thread 140475310900992 with limit hit of 1795
writing contigs etc on thread 140475285722880 with limit hit of 552
writing contigs etc on thread 140475277330176 with limit hit of 916
writing contigs etc on thread 140475302508288 with limit hit of 574
writing contigs etc on thread 140475310900992 with limit hit of 437
writing contigs etc on thread 140475260544768 with limit hit of 1059
writing contigs etc on thread 140475285722880 with limit hit of 1293
writing contigs etc on thread 140475268937472 with limit hit of 2951
writing contigs etc on thread 140475294115584 with limit hit of 4241
writing contigs etc on thread 140469615314688 with limit hit of 5049
writing contigs etc on thread 140475302508288 with limit hit of 8076
writing contigs etc on thread 140475310900992 with limit hit of 4492
writing contigs etc on thread 140475277330176 with limit hit of 5499
writing contigs etc on thread 140475294115584 with limit hit of 6412
writing contigs etc on thread 140475268937472 with limit hit of 5956
writing contigs etc on thread 140475285722880 with limit hit of 16232
writing contigs etc on thread 140475260544768 with limit hit of 15423
writing contigs etc on thread 140469615314688 with limit hit of 7244
writing contigs etc on thread 140475302508288 with limit hit of 6837
writing contigs etc on thread 140475310900992 with limit hit of 8440
writing contigs etc on thread 140475268937472 with limit hit of 8838
writing contigs etc on thread 140475260544768 with limit hit of 7990
writing contigs etc on thread 140475260544768 with limit hit of 7990
writing contigs etc on thread 140475260544768 with limit hit of 7990
writing contigs etc on thread 140475260544768 with limit hit of 7990
writing contigs etc on thread 140475294115584 with limit hit of 13428
writing contigs etc on thread 140475285722880 with limit hit of 8048
writing contigs etc on thread 140475277330176 with limit hit of 11336
writing contigs etc on thread 140469615314688 with limit hit of 7874
writing contigs etc on thread 140475310900992 with limit hit of 8119
writing contigs etc on thread 140475302508288 with limit hit of 8213
LATEST COMMIT
]$ cat AN_T_66639_2100027_14_N_66639_GL.log | grep "with limit hit of" | head -n 40
writing contigs etc on thread 139868850484992 with limit hit of 0
writing contigs etc on thread 139868858877696 with limit hit of 0
writing contigs etc on thread 139868884055808 with limit hit of 0
writing contigs etc on thread 139868833699584 with limit hit of 0
writing contigs etc on thread 139868825306880 with limit hit of 0
writing contigs etc on thread 139874457597696 with limit hit of 0
writing contigs etc on thread 139874564609792 with limit hit of 0
writing contigs etc on thread 139874581395200 with limit hit of 0
writing contigs etc on thread 139874491168512 with limit hit of 0
writing contigs etc on thread 139874465990400 with limit hit of 0
writing contigs etc on thread 139874573002496 with limit hit of 0
writing contigs etc on thread 139868867270400 with limit hit of 0
writing contigs etc on thread 139868858877696 with limit hit of 0
writing contigs etc on thread 139868875663104 with limit hit of 0
writing contigs etc on thread 139874482775808 with limit hit of 0
writing contigs etc on thread 139868842092288 with limit hit of 0
writing contigs etc on thread 139868833699584 with limit hit of 0
writing contigs etc on thread 139868884055808 with limit hit of 0
writing contigs etc on thread 139874474383104 with limit hit of 0
writing contigs etc on thread 139874564609792 with limit hit of 0
writing contigs etc on thread 139868825306880 with limit hit of 0
writing contigs etc on thread 139874457597696 with limit hit of 0
writing contigs etc on thread 139874491168512 with limit hit of 0
writing contigs etc on thread 139874581395200 with limit hit of 0
writing contigs etc on thread 139868850484992 with limit hit of 0
writing contigs etc on thread 139874573002496 with limit hit of 0
writing contigs etc on thread 139874465990400 with limit hit of 0
writing contigs etc on thread 139868884055808 with limit hit of 0
writing contigs etc on thread 139868858877696 with limit hit of 0
writing contigs etc on thread 139868867270400 with limit hit of 0
writing contigs etc on thread 139868825306880 with limit hit of 0
writing contigs etc on thread 139868842092288 with limit hit of 0
writing contigs etc on thread 139874482775808 with limit hit of 0
writing contigs etc on thread 139874474383104 with limit hit of 0
writing contigs etc on thread 139868875663104 with limit hit of 0
writing contigs etc on thread 139868833699584 with limit hit of 0
writing contigs etc on thread 139874457597696 with limit hit of 0
writing contigs etc on thread 139874491168512 with limit hit of 0
writing contigs etc on thread 139868850484992 with limit hit of 0
writing contigs etc on thread 139874564609792 with limit hit of 0
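To compare the two runs without eyeballing the logs, something like the following can summarize the "limit hit" values and tally the passing somatic SVs per chromosome; the log paths and the VCF file name are placeholders, not the exact outputs from these runs.

# summarize the "limit hit" values reported by each run
for log in old_run/AN_T_66639_2100027_14_N_66639_GL.log new_run/AN_T_66639_2100027_14_N_66639_GL.log; do
  echo "== $log =="
  grep -o 'with limit hit of [0-9]*' "$log" \
    | awk '{ n++; s += $NF; if ($NF == 0) z++ } END { printf "messages=%d zeros=%d mean=%.1f\n", n, z, (n ? s/n : 0) }'
done
# count PASS somatic SVs per chromosome (adjust the file name for your run)
awk '$1 !~ /^#/ && $7 == "PASS" { print $1 }' sample.svaba.somatic.sv.vcf | sort | uniq -c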
Thank you for reporting, this is now fixed. Rolling BWA forward 8 years as part of this latest round of updates introduced some nasty bugs on my part, and this one ended up being simple to fix once I found it. The latest svaba (and the latest SeqLib it points to) should address this.
Hi @walaj
Thanks for fixing this. I think everything looks good now with the latest commit (63ffa29)!
Thanks again for all the help.
Best, Ahwan
Running svaba on Terra/FireCloud, we have trouble at this step of svaba run for >30X tumor/normal WGS. From the logs, the actual run time is a couple of hours for variant calling, then it gets stuck at the dedupe step for 100+ hours and counting. Have you seen that before? Is there anything we can do to debug this situation?
We tried VMs with up to 128GB of memory, 16 CPUs, and 1000GB HDD, using the svaba 1.1.0 quay docker image.
Thanks for your help!
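For anyone stuck at the same point, a few generic checks can show whether the dedupe step is still making progress or is thrashing memory; the pgrep pattern and file globs below are placeholders, not values from this report.

# is the svaba process still using CPU, or is it swapping?
pid=$(pgrep -f 'svaba run' | head -n 1)
top -b -n 1 -p "$pid"
# resident vs swapped memory of the process
grep -E 'VmRSS|VmSwap' /proc/"$pid"/status
# size of the intermediate breakpoints file that the dedupe step reads back in
ls -lh *.bps.txt.gz
# the *.log file carries the real-time progress messages quoted in this thread
tail -f *.log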