wwylab / MuSE

Somatic point mutation caller
GNU General Public License v2.0

MuSEv2 hanging on chr1 when using DRAGEN-produced bamfiles #16

Closed: scdunatun closed this issue 7 months ago

scdunatun commented 8 months ago

Description of Bug

MuSE v2.0.3 hangs at a position in chr1 while processing DRAGEN-aligned bamfiles when running on DNAnexus (within a WDL applet). The chr1 position at which it hangs differs between repeated runs on the same bamfile.

To Reproduce

Steps to reproduce the behavior:

  1. Run the DNAnexus custom applet for MuSE with the input bamfiles. The applet:
     a. Indexes the bamfiles using samtools.
     b. Runs MuSE, e.g. MuSE call -O calls -f hg19.fa -n 72 bam1.bam bam2.bam.
  2. Open DNAnexus logging / watch the DNAnexus job.
  3. Wait ~2 minutes after MuSE call has started and review the repeated log lines for a position in chr1.
  4. Repeat and identify that MuSE hangs at a different chr1 position.
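
The two applet steps above can be sketched outside DNAnexus as follows. This is a minimal sketch, not the applet itself: the function names `build_repro_commands`, `as_shell`, and `run_with_timeout` are hypothetical, and the only flags used are the ones in the example command from step 1. The timeout is an addition for convenience, so that a hung run returns control instead of spinning indefinitely.

```python
import shlex
import subprocess


def build_repro_commands(bam1, bam2, reference="hg19.fa",
                         out_prefix="calls", threads=72):
    """Return the index and call commands from step 1 as argument lists.

    Defaults mirror the example command in the report; adjust as needed.
    """
    return [
        ["samtools", "index", bam1],
        ["samtools", "index", bam2],
        ["MuSE", "call", "-O", out_prefix, "-f", reference,
         "-n", str(threads), bam1, bam2],
    ]


def as_shell(commands):
    """Render the argument lists as copy-pasteable shell lines."""
    return "\n".join(shlex.join(cmd) for cmd in commands)


def run_with_timeout(commands, timeout_s):
    """Run each command in order.

    subprocess.run kills the child and raises subprocess.TimeoutExpired
    if a command runs longer than timeout_s seconds.
    """
    for cmd in commands:
        subprocess.run(cmd, check=True, timeout=timeout_s)
```

Calling `print(as_shell(build_repro_commands("bam1.bam", "bam2.bam")))` shows the exact commands before anything is executed; `run_with_timeout` requires samtools and MuSE to be on the PATH.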

Expected behavior

MuSE is able to complete calling for the input bam.

Screenshots

Position being processed remains the same while the writeQSize increases:

BamRead crawls up to 20000, writeQSize hits 10000, and after that no more changes are seen in the loglines, though the CPU usage remains at 100%:

MuSEv2_DRAGENbam.txt (Log lines that look like muse METRICS CPU usr/sys/idl/wai: 100/0/0/0% (72 cores) * Memory: 28974/140732MB * Storage: 300/1680GB * Net: 0↓/0↑MBps * Disk: r/w 0/0 MBps iops r/w 0/7 are instance usage information.)

Additional context

We are specifically using bamfiles produced by Illumina's DRAGEN alignment on ICA after sequencing, without any additional processing steps. This same sample was previously sequenced on a different sequencer and processed through MuSEv1 after using a custom pipeline for demux and dedupe, and did not run into this issue. Other samples processed with our custom pipeline were able to run through MuSEv2.

I haven't managed to reproduce this issue consistently when running MuSEv2 directly from the command line. The two times that I did reproduce the hang there, the calls file was no longer being written to once the four logged values had stabilised (writing may have stopped slightly earlier; I am uncertain, as this is based on observation only).

Information about what those loglines are actually tracking would be helpful, since that might help in targeting our investigation better.

Additional Debugging Attempts

jiyunmaths commented 8 months ago

Dear @scdunatun, thanks for reporting this MuSE v2 bug and for the detailed information. In order for us to fix the issue, could you share the bam files (tumor + normal) restricted to chr1 only? We are happy to debug it and update the software.
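
One possible way to produce chr1-only bam files is a region query with samtools, sketched here as a small command builder. The helper name `chr1_subset_commands` and the output file naming are hypothetical; the sketch assumes the input is coordinate-sorted and that the contig is named `chr1`, as in the log output from this report.

```python
def chr1_subset_commands(bam, out_bam, region="chr1"):
    """Return samtools commands that extract one contig from a bam file.

    The region query needs an index on the input, so the input is indexed
    first; the subset is indexed afterwards so it can be used directly.
    """
    return [
        ["samtools", "index", bam],
        # -b writes BAM output, -o names the output file
        ["samtools", "view", "-b", "-o", out_bam, bam, region],
        ["samtools", "index", out_bam],
    ]
```

The returned argument lists can be passed one by one to `subprocess.run(cmd, check=True)`, once for the tumor bam and once for the normal bam.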

scdunatun commented 8 months ago

@jiyunmaths Would there be a way for me to securely contact you with a download link to the chr1 bamfiles? They are quite large (5-10GB) and are currently stored on DNAnexus. I can provide a download URL, but would prefer to ensure that only those who need it for debugging purposes have access to it.

jiyunmaths commented 8 months ago

Thank you @scdunatun . I just updated the repo, but have not tagged a new version. Please can you test the updated code and see if the issue is fixed on your data?

scdunatun commented 8 months ago

I will try this and report back. Thank you!

scdunatun commented 8 months ago

@jiyunmaths I tested the updated code on the chr1 bamfiles and unfortunately encountered the issue again, starting at logline

[22:16:29]      chr1:208184000
BamRead 200000 processQSize 10000 writeQSize 10000 readPool 72095

I let it run for about 5 minutes before I force-killed the execution since the loglines were not changing and the calls file did not have any updates during that time --- the last call written out was for chr1, pos 208182393. Are there any additional debugging options I can turn on or a suggested profiler that would benefit your investigation?
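
In case it helps with automating the check, here is a minimal sketch that flags a stall from a captured log. It assumes the two line formats quoted above (a timestamped `chr:pos` line followed by a `BamRead ... readPool ...` counters line) and that MuSE keeps printing these lines with unchanged values while it is stuck. The names `parse_snapshots` and `find_stall` and the threshold `min_repeats` are hypothetical, and what the counters actually measure is not known to me.

```python
import re

# Patterns inferred from the two log lines quoted in this thread.
POSITION_RE = re.compile(r"^\[\d{2}:\d{2}:\d{2}\]\s+(\S+):(\d+)$")
COUNTERS_RE = re.compile(
    r"^BamRead (\d+) processQSize (\d+) writeQSize (\d+) readPool (\d+)$"
)


def parse_snapshots(lines):
    """Pair each position line with the counters line that follows it.

    Returns a list of ((contig, position), (counters...)) tuples.
    Unrelated lines, such as instance METRICS lines, are ignored.
    """
    snapshots = []
    pending = None
    for raw in lines:
        line = raw.strip()
        match = POSITION_RE.match(line)
        if match:
            pending = (match.group(1), int(match.group(2)))
            continue
        match = COUNTERS_RE.match(line)
        if match and pending is not None:
            counters = tuple(int(value) for value in match.groups())
            snapshots.append((pending, counters))
            pending = None
    return snapshots


def find_stall(lines, min_repeats=5):
    """Return (contig, position) if the last min_repeats snapshots are
    identical in position and in all four counters, else None."""
    snapshots = parse_snapshots(lines)
    if len(snapshots) < min_repeats:
        return None
    tail = snapshots[-min_repeats:]
    if all(snapshot == tail[0] for snapshot in tail):
        return tail[0][0]
    return None
```

Timestamps are deliberately ignored, so only the position and the four counters decide whether the run is considered stalled.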

jiyunmaths commented 8 months ago

Hi @scdunatun , sorry to hear that the issue is not fixed. Please can you share the bam files of tumor and normal from chr1 only through Google Drive? My gmail account is jiyunmathbhamATgmail.com. We will look into the code and fix the issue as soon as possible. Thank you.

scdunatun commented 8 months ago

@jiyunmaths I am sending you an email shortly to hash out the data sharing --- thank you so much for your assistance with this! I appreciate it.

jiyunmaths commented 7 months ago

@scdunatun, I just updated the code. It now works on the data you shared. Please test it again on your side and let me know if it works. Thank you.

scdunatun commented 7 months ago

I will run some tests with subsets of the full data (and if those work, the full data) and report back on how it did. Thank you!

scdunatun commented 7 months ago

@jiyunmaths I ran this through the all-chr bamfile for this patient and it succeeded! Thank you so much. I really appreciate your support and help with getting MuSE working with these bamfiles.

Would you be able to create a tagged version/release for this update?

jiyunmaths commented 7 months ago

Hi @scdunatun, thank you for letting me know your testing result! I just tagged the latest code as v2.0.4. Please take a look.

jiyunmaths commented 7 months ago

This issue is fixed.