uclahs-cds / project-method-AlgorithmEvaluation-BNCH-000082-SRCRNDSeed

GNU General Public License v2.0

Strelka2-Battenberg-DPClust multi-region mode ERROR #84

Closed philsteinberg closed 1 year ago

philsteinberg commented 1 year ago

Running Strelka2-Battenberg-DPClust in single-region mode worked fine. The same issue occurs for SomaticSniper/Mutect2-Battenberg-DPClust-mr. Could this be due to the naming conventions in multi-region mode?

Example error log: /hot/project/method/AlgorithmEvaluation/BNCH-000082-SRCRNDSeed/pipeline-call-src/run-strelka2-battenberg-dpclust/logs/ILHNLNEV000001_51404_Strelka2-Battenberg-DPClust-mr.log

error message:

Error executing process > 'workflow_dpclust:call_RunDP_DPClust'

Caused by:
  Process `workflow_dpclust:call_RunDP_DPClust` terminated with an error exit status (1)

Command executed:

  set -euo pipefail

      printf "ILHNLNEV000001    ILHNLNEV000001-T002-P02-F   ILHNLNEV000001-T002-P02-F_allDirichletProcessInfo.txt   0.534937500000003   male    NA  NA
  ILHNLNEV000001    ILHNLNEV000001-T001-P01-F   ILHNLNEV000001-T001-P01-F_allDirichletProcessInfo.txt   0.562956521739133   male    NA  NA
  ILHNLNEV000001    ILHNLNEV000001-T004-L02-F   ILHNLNEV000001-T004-L02-F_allDirichletProcessInfo.txt   0.56375 male    NA  NA
  ILHNLNEV000001    ILHNLNEV000001-T003-L01-F   ILHNLNEV000001-T003-L01-F_allDirichletProcessInfo.txt   0.635411764705883   male    NA  NA\n" > ILHNLNEV000001_project_file_intermediate.txt

      printf "sample\tsubsample\tdatafile\tcellularity\tsex\tcnadatafile\tindeldatafile\n" > ILHNLNEV000001_project_file.txt
      while IFS=$'  ' read -r -a arr
      do
          printf "${arr[0]} ${arr[1]}   `realpath ${arr[2]}`    ${arr[3]}   ${arr[4]}   ${arr[5]}   ${arr[6]}
  " >> ILHNLNEV000001_project_file.txt
      done < ILHNLNEV000001_project_file_intermediate.txt

      Rscript /hot/user/yashpatel/pipeline-call-SRC/pipeline-call-SRC/./module/dpclust_pipeline.R         -d /         -o ./         -r 1         -i ILHNLNEV000001_project_file.txt         -k         --seed 51404 --min_frac_muts_cluster -1

Command exit status:
  1

Command output:
  [1] ""
  [1] "Running: ILHNLNEV000001"
  [1] "Working dir: ./"
  [1] "Analysis type: nd_dp"
  [1] "Datafiles:"
  [1] "/scratch/60/ae96b64c3ad517a8160f8da40571eb/ILHNLNEV000001-T002-P02-F_allDirichletProcessInfo.txt"
  [2] "/scratch/ae/497015cb3bb290690a8ba9f817c6ff/ILHNLNEV000001-T001-P01-F_allDirichletProcessInfo.txt"
  [3] "/scratch/b9/21e8fe7d4f465ba1104c559bece089/ILHNLNEV000001-T004-L02-F_allDirichletProcessInfo.txt"
  [4] "/scratch/29/3dbf2e3201fc53fe9510ac109a1137/ILHNLNEV000001-T003-L01-F_allDirichletProcessInfo.txt"
  [1] ""
  [1] "Loading data..."

Command error:
  acf214f6fcba: Already exists
  5f952968c778: Already exists
  313a98bcd182: Already exists
  0fa8b30a6199: Already exists
  c082b8b01591: Already exists
  177ca9de5b2d: Already exists
  5a3b71fb959c: Already exists
  a7c3f2278282: Already exists
  874ae8d94f7e: Already exists
  11a68d1beffd: Already exists
  e9ad119fed45: Already exists
  0deb46237183: Pulling fs layer
  0deb46237183: Verifying Checksum
  0deb46237183: Download complete
  0deb46237183: Pull complete
  Digest: sha256:586905a6f63308c3691632f5a9b115989be3a06924907c0aa7588f2dc1624423
  Status: Downloaded newer image for ghcr.io/uclahs-cds/docker-dpclust:75f5d7e
  Bioconductor version 3.12 (BiocManager 1.30.10), ?BiocManager::install for help
  Bioconductor version '3.12' is out-of-date; the current release version '3.16'
    is available with R version '4.2'; see https://bioconductor.org/install
  [1] ""
  [1] "Running: ILHNLNEV000001"
  [1] "Working dir: ./"
  [1] "Analysis type: nd_dp"
  [1] "Datafiles:"
  [1] "/scratch/60/ae96b64c3ad517a8160f8da40571eb/ILHNLNEV000001-T002-P02-F_allDirichletProcessInfo.txt"
  [2] "/scratch/ae/497015cb3bb290690a8ba9f817c6ff/ILHNLNEV000001-T001-P01-F_allDirichletProcessInfo.txt"
  [3] "/scratch/b9/21e8fe7d4f465ba1104c559bece089/ILHNLNEV000001-T004-L02-F_allDirichletProcessInfo.txt"
  [4] "/scratch/29/3dbf2e3201fc53fe9510ac109a1137/ILHNLNEV000001-T003-L01-F_allDirichletProcessInfo.txt"
  [1] ""
  The following objects are masked _by_ .GlobalEnv:

      assign_sampled_muts, generate_cluster_ordering, keep_temp_files,
      min_frac_muts_cluster, min_muts_cluster, mut.assignment.type,
      no.iters, no.iters.burn.in, num_muts_sample, species

  The following objects are masked _by_ .GlobalEnv:

      cellularity, datafiles, is.male, mutphasingfiles, samplename,
      subsamples

  The following object is masked _by_ .GlobalEnv:

      seed

  [1] "Loading data..."
  Error in chromosome[, s] <- list_of_tables[[s]][, Chromosome] : 
    number of items to replace is not a multiple of replacement length
  Calls: RunDP -> load.data -> load.data.inner
  Execution halted

Work dir:
  /scratch/b9/0f8b49625c673d4c438927be08cc66

Tip: when you have fixed the problem you can continue the execution adding the option `-resume` to the run command line

executor >  local (18)
[2d/6b22e5] process > run_validate_PipeVal (8)       [100%] 12 of 12 ✔
[3e/e14f5e] process > create_inputs_SRCutil (1)      [100%] 1 of 1 ✔
[-        ] process > workflow_pyclonevi:fit_mode... -
[-        ] process > workflow_pyclonevi:write_re... -
[-        ] process > workflow_phylowgs:call_mult... -
[-        ] process > workflow_phylowgs:write_res... -
[-        ] process > workflow_phylowgs:index_dat... -
[29/3dbf2e] process > workflow_dpclust:generate_i... [100%] 4 of 4 ✔
[b9/0f8b49] process > workflow_dpclust:call_RunDP... [100%] 1 of 1, failed: 1 ✘
[-        ] process > workflow_pyclone:run_analys... -
[-        ] process > workflow_fastclone:run_solv... -
lydiayliu commented 1 year ago

OK I'm just going to pick 1 issue to reply to since all the DPClust multi-region errors are the same. I've seen this before but I don't remember what I did...

Have you tried running multi-region DPClust on the test samples in pipeline-call-SRC?

Could you try using the tip from Yash here: https://github.com/uclahs-cds/tool-SRC-util/issues/39 and set the work_dir to somewhere in /hot/? That way we can look at the allDirichletProcessInfo.txt files and do some debugging from there. It seems like the pipeline dies at the running DPClust step

philsteinberg commented 1 year ago

@lydiayliu The pipeline-call-SRC README paths to the test input are not linked/valid anymore. I only found these test files from #3

ILHNLNEV000001-N002-A01-F Config: /hot/software/pipeline/pipeline-call-SRC/Nextflow/development/unreleased/yashpatel-add-phylowgs/ILHNLNEV000001-N002-A01-F.config

YAML: /hot/software/pipeline/pipeline-call-SRC/Nextflow/development/unreleased/yashpatel-add-phylowgs/ILHNLNEV000001-N002-A01-F.yaml

However, the yaml directory only contains the ILHNLNEV000001-N002-A01-F of the patient, so I cannot use this for multi-region mode.

I changed my seed_51404.config to include a new working directory path:

algorithm: "DPClust"
            options: "--seed 51404 --min_frac_muts_cluster -1" 

dpclust_multisample = true

work_dir = "/hot/project/method/AlgorithmEvaluation/BNCH-000082-SRCRNDSeed/pipeline-call-src/run-strelka2-battenberg-dpclust/debug/"

Error log pretty much the same: /hot/project/method/AlgorithmEvaluation/BNCH-000082-SRCRNDSeed/pipeline-call-src/run-strelka2-battenberg-dpclust/logs/ILHNLNEV000001_51404_Strelka2-Battenberg-DPClust-mr.log


Error executing process > 'workflow_dpclust:call_RunDP_DPClust'

Caused by:
  Process `workflow_dpclust:call_RunDP_DPClust` terminated with an error exit status (1)

Command executed:

  set -euo pipefail

      printf "ILHNLNEV000001    ILHNLNEV000001-T002-P02-F   ILHNLNEV000001-T002-P02-F_allDirichletProcessInfo.txt   0.534937500000003   male    NA  NA
  ILHNLNEV000001    ILHNLNEV000001-T001-P01-F   ILHNLNEV000001-T001-P01-F_allDirichletProcessInfo.txt   0.562956521739133   male    NA  NA
  ILHNLNEV000001    ILHNLNEV000001-T004-L02-F   ILHNLNEV000001-T004-L02-F_allDirichletProcessInfo.txt   0.56375 male    NA  NA
  ILHNLNEV000001    ILHNLNEV000001-T003-L01-F   ILHNLNEV000001-T003-L01-F_allDirichletProcessInfo.txt   0.635411764705883   male    NA  NA\n" > ILHNLNEV000001_project_file_intermediate.txt

      printf "sample\tsubsample\tdatafile\tcellularity\tsex\tcnadatafile\tindeldatafile\n" > ILHNLNEV000001_project_file.txt
      while IFS=$'  ' read -r -a arr
      do
          printf "${arr[0]} ${arr[1]}   `realpath ${arr[2]}`    ${arr[3]}   ${arr[4]}   ${arr[5]}   ${arr[6]}
  " >> ILHNLNEV000001_project_file.txt
      done < ILHNLNEV000001_project_file_intermediate.txt

      Rscript /hot/user/yashpatel/pipeline-call-SRC/pipeline-call-SRC/./module/dpclust_pipeline.R         -d /         -o ./         -r 1         -i ILHNLNEV000001_project_file.txt         -k         --seed 51404 --min_frac_muts_cluster -1

Command exit status:
  1

Command output:
  [1] ""
  [1] "Running: ILHNLNEV000001"
  [1] "Working dir: ./"
  [1] "Analysis type: nd_dp"
  [1] "Datafiles:"
  [1] "/hot/project/method/AlgorithmEvaluation/BNCH-000082-SRCRNDSeed/pipeline-call-src/run-strelka2-battenberg-dpclust/debug/43/5859671f3ac6a40f7fe87518ec839a/ILHNLNEV000001-T002-P02-F_allDirichletProcessInfo.txt"
  [2] "/hot/project/method/AlgorithmEvaluation/BNCH-000082-SRCRNDSeed/pipeline-call-src/run-strelka2-battenberg-dpclust/debug/a1/5a8ce60fb4abb078ed083176db72a7/ILHNLNEV000001-T001-P01-F_allDirichletProcessInfo.txt"
  [3] "/hot/project/method/AlgorithmEvaluation/BNCH-000082-SRCRNDSeed/pipeline-call-src/run-strelka2-battenberg-dpclust/debug/d5/a5b6e28c9820c9366191e33d05000f/ILHNLNEV000001-T004-L02-F_allDirichletProcessInfo.txt"
  [4] "/hot/project/method/AlgorithmEvaluation/BNCH-000082-SRCRNDSeed/pipeline-call-src/run-strelka2-battenberg-dpclust/debug/96/215c615cd3a8fbcc2de4269dbfcf20/ILHNLNEV000001-T003-L01-F_allDirichletProcessInfo.txt"
  [1] ""
  [1] "Loading data..."

Command error:
  acf214f6fcba: Already exists
  5f952968c778: Already exists
  313a98bcd182: Already exists
  0fa8b30a6199: Already exists
  c082b8b01591: Already exists
  177ca9de5b2d: Already exists
  5a3b71fb959c: Already exists
  a7c3f2278282: Already exists
  874ae8d94f7e: Already exists
  11a68d1beffd: Already exists
  e9ad119fed45: Already exists
  0deb46237183: Pulling fs layer
  0deb46237183: Verifying Checksum
  0deb46237183: Download complete
  0deb46237183: Pull complete
  Digest: sha256:586905a6f63308c3691632f5a9b115989be3a06924907c0aa7588f2dc1624423
  Status: Downloaded newer image for ghcr.io/uclahs-cds/docker-dpclust:75f5d7e
  Bioconductor version 3.12 (BiocManager 1.30.10), ?BiocManager::install for help
  Bioconductor version '3.12' is out-of-date; the current release version '3.16'
    is available with R version '4.2'; see https://bioconductor.org/install
  [1] ""
  [1] "Running: ILHNLNEV000001"
  [1] "Working dir: ./"
  [1] "Analysis type: nd_dp"
  [1] "Datafiles:"
  [1] "/hot/project/method/AlgorithmEvaluation/BNCH-000082-SRCRNDSeed/pipeline-call-src/run-strelka2-battenberg-dpclust/debug/43/5859671f3ac6a40f7fe87518ec839a/ILHNLNEV000001-T002-P02-F_allDirichletProcessInfo.txt"
  [2] "/hot/project/method/AlgorithmEvaluation/BNCH-000082-SRCRNDSeed/pipeline-call-src/run-strelka2-battenberg-dpclust/debug/a1/5a8ce60fb4abb078ed083176db72a7/ILHNLNEV000001-T001-P01-F_allDirichletProcessInfo.txt"
  [3] "/hot/project/method/AlgorithmEvaluation/BNCH-000082-SRCRNDSeed/pipeline-call-src/run-strelka2-battenberg-dpclust/debug/d5/a5b6e28c9820c9366191e33d05000f/ILHNLNEV000001-T004-L02-F_allDirichletProcessInfo.txt"
  [4] "/hot/project/method/AlgorithmEvaluation/BNCH-000082-SRCRNDSeed/pipeline-call-src/run-strelka2-battenberg-dpclust/debug/96/215c615cd3a8fbcc2de4269dbfcf20/ILHNLNEV000001-T003-L01-F_allDirichletProcessInfo.txt"
  [1] ""
  The following objects are masked _by_ .GlobalEnv:

      assign_sampled_muts, generate_cluster_ordering, keep_temp_files,
      min_frac_muts_cluster, min_muts_cluster, mut.assignment.type,
      no.iters, no.iters.burn.in, num_muts_sample, species

  The following objects are masked _by_ .GlobalEnv:

      cellularity, datafiles, is.male, mutphasingfiles, samplename,
      subsamples

  The following object is masked _by_ .GlobalEnv:

      seed

  [1] "Loading data..."
  Error in chromosome[, s] <- list_of_tables[[s]][, Chromosome] : 
    number of items to replace is not a multiple of replacement length
  Calls: RunDP -> load.data -> load.data.inner
  Execution halted

Work dir:
  /hot/project/method/AlgorithmEvaluation/BNCH-000082-SRCRNDSeed/pipeline-call-src/run-strelka2-battenberg-dpclust/debug/97/7f760b3dd430526313c5b5c645fcd4

Tip: you can replicate the issue by changing to the process work dir and entering the command `bash .command.run`

executor >  local (18)
[b1/c52933] process > run_validate_PipeVal (11)      [100%] 12 of 12 ✔
[0d/5e58ac] process > create_inputs_SRCutil (1)      [100%] 1 of 1 ✔
[-        ] process > workflow_pyclonevi:fit_mode... -
[-        ] process > workflow_pyclonevi:write_re... -
[-        ] process > workflow_phylowgs:call_mult... -
[-        ] process > workflow_phylowgs:write_res... -
[-        ] process > workflow_phylowgs:index_dat... -
[96/215c61] process > workflow_dpclust:generate_i... [100%] 4 of 4 ✔
[97/7f760b] process > workflow_dpclust:call_RunDP... [100%] 1 of 1, failed: 1 ✘
[-        ] process > workflow_pyclone:run_analys... -
[-        ] process > workflow_fastclone:run_solv... -
lydiayliu commented 1 year ago

Try the test sample here: https://github.com/uclahs-cds/pipeline-call-SRC/pull/11, it seemed to have worked with HATCHET and ms MuTect2.

Right, so now you can see in your error log that the "Work dir" is the debug directory you set (in your old logs it was something in /scratch/):

Work dir:
  /hot/project/method/AlgorithmEvaluation/BNCH-000082-SRCRNDSeed/pipeline-call-src/run-strelka2-battenberg-dpclust/debug/97/7f760b3dd430526313c5b5c645fcd4

Now you can go to the directory and see the temporary files

yiyangliu@ip-0A12520B:/hot/project/method/AlgorithmEvaluation/BNCH-000082-SRCRNDSeed/pipeline-call-src/run-strelka2-battenberg-dpclust/debug/97/7f760b3dd430526313c5b5c645fcd4$ ls
ILHNLNEV000001_DPoutput_2000iters_1000burnin_seed51404  ILHNLNEV000001-T002-P02-F_allDirichletProcessInfo.txt
ILHNLNEV000001_project_file_intermediate.txt            ILHNLNEV000001-T003-L01-F_allDirichletProcessInfo.txt
ILHNLNEV000001_project_file.txt                         ILHNLNEV000001-T004-L02-F_allDirichletProcessInfo.txt
ILHNLNEV000001-T001-P01-F_allDirichletProcessInfo.txt

If you do ls -a, you can see the hidden log files as well; the one you want is .command.log. This is handy for meta-pipelines where the first layer of errors is hidden. It's not so useful now, since what we want are the intermediate allDirichletProcessInfo.txt files that are causing the issue, but it's good to know.
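The hidden files mentioned above can also be collected with a short script. This is only a minimal Python sketch, assuming the task work dir contains the hidden files Nextflow normally writes (.command.sh, .command.run, .command.log, .command.out, .command.err, .exitcode). The helper names `task_files` and `tail` are hypothetical and not part of the pipeline:

```python
from pathlib import Path

# Hidden files Nextflow normally writes into each task work dir.
NEXTFLOW_TASK_FILES = (
    ".command.sh",
    ".command.run",
    ".command.log",
    ".command.out",
    ".command.err",
    ".exitcode",
)


def task_files(work_dir):
    """Return {file name: path} for the hidden task files that exist in work_dir."""
    root = Path(work_dir)
    return {
        name: root / name
        for name in NEXTFLOW_TASK_FILES
        if (root / name).is_file()
    }


def tail(path, n=20):
    """Return the last n lines of a text file as a list of strings."""
    if n <= 0:
        return []
    lines = Path(path).read_text(errors="replace").splitlines()
    return lines[-n:]
```

With `work_dir` set to the path printed under "Work dir:", `tail(task_files(work_dir)[".command.log"])` would show the end of the task log, provided that file exists.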

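Ok so I think the problem is pretty clear: a quick wc -l reveals that the allDirichletProcessInfo.txt files have differing numbers of lines. This doesn't work because, in multi-region SRC, DPClust requires that all samples have the SAME set of SNVs. That is consistent with the "number of items to replace is not a multiple of replacement length" error in load.data.inner, where each sample's Chromosome column is assigned into one shared table.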

yiyangliu@ip-0A12520B:/hot/project/method/AlgorithmEvaluation/BNCH-000082-SRCRNDSeed/pipeline-call-src/run-strelka2-battenberg-dpclust/debug/97/7f760b3dd430526313c5b5c645fcd4$ wc -l *Info.txt
   4999 ILHNLNEV000001-T001-P01-F_allDirichletProcessInfo.txt
   4886 ILHNLNEV000001-T002-P02-F_allDirichletProcessInfo.txt
   4896 ILHNLNEV000001-T003-L01-F_allDirichletProcessInfo.txt
   4768 ILHNLNEV000001-T004-L02-F_allDirichletProcessInfo.txt
  19549 total
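The same check could be scripted before DPClust is launched. This is a minimal Python sketch, assuming each allDirichletProcessInfo.txt file starts with a single header line (an assumption, not confirmed in this thread). The function names are hypothetical. Equal row counts are necessary but not sufficient: the files could still contain different SNVs.

```python
from pathlib import Path


def count_data_rows(path, has_header=True):
    """Count non-empty lines in a text file, excluding one header line if present."""
    with open(path) as handle:
        n_lines = sum(1 for line in handle if line.strip())
    if has_header and n_lines > 0:
        return n_lines - 1
    return n_lines


def compare_row_counts(paths, has_header=True):
    """Return (row count per file name, True if every file has the same count)."""
    counts = {Path(p).name: count_data_rows(p, has_header) for p in paths}
    consistent = len(set(counts.values())) <= 1
    return counts, consistent
```

Applied to the four files listed above, this would report an inconsistency, since their line counts (4999, 4886, 4896, 4768) all differ.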

The allDirichletProcessInfo.txt is not something that tool-SRC-util produces but is the direct product of the loci.txt and alleleCounts.txt files that tool-SRC-util produces. For multi-region DPClust SNV pre-processing, for now only SNVs that are called in all samples can be used. I'm surprised that I never noticed this... (since PyClone-VI also requires SNVs to be shared between all samples, did we not just use the same system for DPClust?) Please open an issue in that repo and fix this for DPClust (and ideally double check for PyClone-VI). I think you can handle it from here!
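One way to restrict each sample to the shared SNVs is to intersect the loci across samples and filter every file against that intersection. The Python sketch below is illustrative only and is not the tool-SRC-util implementation. It assumes each loci.txt is tab-separated, has no header, and carries chromosome and position in its first two columns. The function names `read_loci`, `shared_loci` and `write_shared` are hypothetical.

```python
import csv


def read_loci(path):
    """Return the set of (chromosome, position) keys found in one loci file."""
    loci = set()
    with open(path, newline="") as handle:
        for row in csv.reader(handle, delimiter="\t"):
            if len(row) >= 2:  # skip blank or malformed lines
                loci.add((row[0], row[1]))
    return loci


def shared_loci(paths):
    """Return the loci present in every file in paths."""
    sets = [read_loci(p) for p in paths]
    return set.intersection(*sets) if sets else set()


def write_shared(src, dst, keep):
    """Copy the rows of src whose locus is in keep to dst, preserving order.

    Returns the number of rows written.
    """
    written = 0
    with open(src, newline="") as fin, open(dst, "w", newline="") as fout:
        writer = csv.writer(fout, delimiter="\t", lineterminator="\n")
        for row in csv.reader(fin, delimiter="\t"):
            if len(row) >= 2 and (row[0], row[1]) in keep:
                writer.writerow(row)
                written += 1
    return written
```

After filtering, every sample would have the same number of loci in the same relative order. The alleleCounts.txt files would need the same treatment so that the downstream allDirichletProcessInfo.txt files line up.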

Btw you can also dig around your debug folder for more fun stuff, since all the intermediate files from the run can be found there. For example, you can clearly see that the intermediate loci.txt files produced have differing numbers of lines between the samples:

yiyangliu@ip-0A12520B:/hot/project/method/AlgorithmEvaluation/BNCH-000082-SRCRNDSeed/pipeline-call-src/run-strelka2-battenberg-dpclust/debug$ wc -l  */*/*loci.txt
  4885 43/5859671f3ac6a40f7fe87518ec839a/ILHNLNEV000001-T002-P02-F-loci.txt
  4895 96/215c615cd3a8fbcc2de4269dbfcf20/ILHNLNEV000001-T003-L01-F-loci.txt
  4998 a1/5a8ce60fb4abb078ed083176db72a7/ILHNLNEV000001-T001-P01-F-loci.txt
  4767 d5/a5b6e28c9820c9366191e33d05000f/ILHNLNEV000001-T004-L02-F-loci.txt
 19545 total