uclahs-cds / project-method-AlgorithmEvaluation-BNCH-000082-SRCRNDSeed

GNU General Public License v2.0

Mutect2 + Battenberg + PhyloWGS sr mode Error #94

Closed by philsteinberg 1 year ago

philsteinberg commented 1 year ago

@lydiayliu Some of the runs have completed and others are currently running successfully. However, overnight a bunch of them failed immediately.

Example failed sample: ILHNLNEV000014-T001-P01-F_628019_Mutect2-Battenberg-PhyloWGS Failed

Example .error (.log is empty): /hot/project/method/AlgorithmEvaluation/BNCH-000082-SRCRNDSeed/pipeline-call-src/run-mutect2-battenberg-phylowgs/logs/ILHNLNEV000014-T001-P01-F_628019_Mutect2-Battenberg-PhyloWGS.error

Error message: mktemp: failed to create directory via template ‘/scratch/XXXXXXX’: Permission denied

This is strange because I changed the work_dir in the config from /scratch to: /hot/project/method/AlgorithmEvaluation/BNCH-000082-SRCRNDSeed/pipeline-call-src/run-mutect2-battenberg-phylowgs/debug

I am also confused about why this is an error for some runs but not others.

lydiayliu commented 1 year ago

My best suggestion right now is to just resubmit and see if the error goes away. The cluster has been slightly unstable lately. It seems like your work_dir change in the config was successful because there is stuff written there.

Maybe @yashpatel6 has seen this error elsewhere?

philsteinberg commented 1 year ago

Update: I re-submitted a bunch and they appear to be running, rather than failing immediately with exit code 1.

yashpatel6 commented 1 year ago

That error is usually caused by the /scratch disk not connecting properly to the node; there's not much that can be done other than re-submitting.
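For anyone hitting the same `mktemp` failure, a pre-flight check along these lines could confirm whether the scratch disk is usable on a node before the pipeline starts. This is only a sketch: the function name `scratch_is_writable` is hypothetical, and the default path `/scratch` is taken from the error message above rather than from any pipeline config.

```python
import os
import tempfile


def scratch_is_writable(scratch_dir="/scratch"):
    """Return True if a temporary directory can be created under scratch_dir.

    Hypothetical helper: it attempts the same kind of operation that
    `mktemp -d /scratch/XXXXXXX` performs, then cleans up after itself.
    """
    try:
        probe = tempfile.mkdtemp(dir=scratch_dir)
    except OSError:
        # Covers permission errors and a missing (unmounted) directory.
        return False
    os.rmdir(probe)  # remove the empty probe directory
    return True
```

A submission wrapper could call this first and skip or requeue the job when it returns `False`, instead of letting the run fail immediately.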

yashpatel6 commented 1 year ago

@philsteinberg do you happen to have the job ID for the one that failed?

philsteinberg commented 1 year ago

@yashpatel6 pretty much most of the jobs between 13795 and 13831 on Sat between 10:35am and 12:45pm, and before that 13576 to 13707 on Fri between 1am and 5am.

philsteinberg commented 1 year ago

Several re-runs are completing.

New Errors for samples:

Error log: /hot/project/method/AlgorithmEvaluation/BNCH-000082-SRCRNDSeed/pipeline-call-src/run-mutect2-battenberg-phylowgs/logs/ILHNLNEV000004-T001-P01-F_13142_Mutect2-Battenberg-PhyloWGS.log

Error message:

Error executing process > 'workflow_phylowgs:write_results_PhyloWGS (1)'

Caused by:
  Process `workflow_phylowgs:write_results_PhyloWGS (1)` terminated with an error exit status (1)

Command executed:

  set -euo pipefail

  python2 /phylowgs/write_results.py         --include-ssm-names         ILHNLNEV000004-T001-P01-F         trees.zip         PhyloWGS-2205be1_13142_ILHNLNEV000004-T001-P01-F_Mutect2-Battenberg-summ.json.gz         PhyloWGS-2205be1_13142_ILHNLNEV000004-T001-P01-F_Mutect2-Battenberg-muts.json.gz         PhyloWGS-2205be1_13142_ILHNLNEV000004-T001-P01-F_Mutect2-Battenberg-mutass.json.gz

Command exit status:
  1

Command output:
  (empty)

Command error:
  Traceback (most recent call last):
    File "/phylowgs/write_results.py", line 58, in <module>
      main()
    File "/phylowgs/write_results.py", line 50, in main
      munger.remove_multiprimary_trees(args.max_multiprimary)
    File "/phylowgs/pwgsresults/result_munger.py", line 96, in remove_multiprimary_trees
      len(self._tree_summaries)
  Exception: 100% of trees are multiprimary (2500 of 2500), so not enough to report good posterior.

Work dir:
  /hot/project/method/AlgorithmEvaluation/BNCH-000082-SRCRNDSeed/pipeline-call-src/run-mutect2-battenberg-phylowgs/debug/c6/5c2e3d30916d668ed8f3fcee3fb4f3

Tip: when you have fixed the problem you can continue the execution adding the option `-resume` to the run command line

executor >  local (6)
[a6/90edff] process > run_validate_PipeVal (2)       [100%] 3 of 3 ✔
[0b/0f0d63] process > create_inputs_SRCutil (1)      [100%] 1 of 1 ✔
[-        ] process > workflow_pyclonevi:fit_mode... -
[-        ] process > workflow_pyclonevi:write_re... -
[72/f8a051] process > workflow_phylowgs:call_mult... [100%] 1 of 1 ✔
[c6/5c2e3d] process > workflow_phylowgs:write_res... [100%] 1 of 1, failed: 1 ✘
[-        ] process > workflow_phylowgs:index_dat... -
[-        ] process > workflow_dpclust:generate_i... -
[-        ] process > workflow_dpclust:call_RunDP... -
[-        ] process > workflow_pyclone:run_analys... -
[-        ] process > workflow_fastclone:run_solv... -

Since the quality of the posterior distribution depends on the number of MCMC chains, I am guessing that increasing --num-chains from 1 would probably fix this issue. However, this does not make sense in our case because it requires inputting more seeds. Would increasing --mcmc-samples from 2500 help?

I also found a post with a similar issue where the suggestion was to add --allow-polyclonal --include-polyclonal, but I cannot find these flags in the PhyloWGS documentation.

lydiayliu commented 1 year ago

Exception: 100% of trees are multiprimary (2500 of 2500), so not enough to report good posterior.

You are right, the crux of the issue is here. Basically, PhyloWGS is finding multiple-primary solutions (independent tumours arising in the same organ that do not share an ancestor clone). When we initially tested PhyloWGS, we found this solution highly unlikely at the rate that PhyloWGS was calling it, and they decided to ban it. Essentially, this just means that PhyloWGS failed for this sample + seed.

We can just make a note that it failed, or if you want to be thorough, you can try digging for the appropriate flag in their code and adding it, just so that we save the multiple-primary solution (they call it polyclonal; when we say polyclonal, we mean something else).
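If the choice is to just note which sample + seed combinations failed, the two failure modes seen in this thread can be told apart by scanning each `.log` or `.error` file. The sketch below is an assumption-laden helper, not part of the pipeline: `classify_failure` and its return keys are hypothetical, and the patterns are copied from the two error messages quoted above, so other PhyloWGS or Nextflow versions may word them differently.

```python
import re

# Patterns based on the error messages quoted in this thread.
MULTIPRIMARY_RE = re.compile(
    r"Exception: (\d+(?:\.\d+)?)% of trees are multiprimary \((\d+) of (\d+)\)"
)
SCRATCH_RE = re.compile(r"mktemp: failed to create directory via template")


def classify_failure(log_text):
    """Classify a failed run from the text of its log (hypothetical helper)."""
    match = MULTIPRIMARY_RE.search(log_text)
    if match:
        # PhyloWGS rejected the result: too many multiprimary trees.
        return {
            "reason": "multiprimary",
            "multiprimary_trees": int(match.group(2)),
            "total_trees": int(match.group(3)),
        }
    if SCRATCH_RE.search(log_text):
        # Node-level problem; re-submitting is the suggested remedy.
        return {"reason": "scratch_unavailable"}
    return {"reason": "unknown"}
```

Runs classified as `multiprimary` are genuine PhyloWGS failures for that seed, while `scratch_unavailable` runs are candidates for re-submission. Separately, the traceback shows the check receives `args.max_multiprimary`, which suggests `write_results.py` may expose a matching option; that is an inference from the traceback and should be confirmed in the PhyloWGS source.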

philsteinberg commented 1 year ago

Slurm Job_id=17380 Name=ILHNLNEV000011-T001-P01-F_659767_Mutect2-Battenberg-PhyloWGS Failed, Run time 19:59:21, NODE_FAIL, ExitCode 0 - Wed 21/12/2022 @yashpatel6 in case you wanted the data on recent node-fails.

yashpatel6 commented 1 year ago

Got it, thanks!