uclahs-cds / metapipeline-DNA

Nextflow pipeline to convert BAM to FASTQ, align, perform QC, assess targeted coverage, call gSNP, call sSNV, call mtSNV, call SVs, call sCNA, and perform subclonal reconstruction
GNU General Public License v2.0

A USER ERROR has occurred: Failed to read bam header from WGS\:WTSI\:30177-1.sorted.bam #43

Open Alfredo-Enrique opened 2 years ago

Alfredo-Enrique commented 2 years ago

Describe the issue align-DNA error when running metapipeline-DNA with a WGS BAM file. The error seems to occur during the Mark Duplicates step, based only on the error message, which refers to a sorted BAM. The full error is at the end of this report; an excerpt is below.

    A USER ERROR has occurred: Failed to read bam header from WGS\:WTSI\:30177-1.sorted.bam
     Caused by:java.net.URISyntaxException: Relative path in absolute URI: WGS\:WTSI%5C:30177-1.sorted.bam
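
The ':' characters in the filename are what trip up URI parsing: anything before an early ':' is read as a URI scheme, so the bare filename is treated as an absolute URI with only a relative path. A minimal Python illustration of the same ambiguity (not the actual Java/Spark code path):

```python
from urllib.parse import urlparse

# A ':' in a bare filename makes generic URI parsers read the text
# before it as a URI scheme, so the name no longer parses as a plain
# relative path.
parts = urlparse('WGS:WTSI:30177-1.sorted.bam')
print(parts.scheme)  # 'wgs' -- the filename prefix is misread as a scheme
print(parts.path)    # 'WTSI:30177-1.sorted.bam'
```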

To Reproduce Steps to reproduce the behavior:

  1. Use the specific repo referenced above; either copy it or cd into it.
  2. Launch from there or from a different location. I only modified the lead config to save output to a different location for better record keeping and debugging.

Expected behavior The run works through bam2fastq without issue but fails at the align-DNA step after sorting the BAM.


Excerpt of .command.log:

N E X T F L O W  ~  version 21.10.5
Launching `/hot/user/alfgonzalez/pipeline/metapipeline-DNA/agonz_update_bam2fastq_to_d4e8a73/metapipeline-DNA/modules/metapipeline_DNA.nf` [mad_laplace] - revision: 320992d8e9
[-        ] process > convert_BAM2FASTQ:extract_r... -
[-        ] process > convert_BAM2FASTQ:create_in... -
[-        ] process > convert_BAM2FASTQ:call_conv... -
[-        ] process > convert_BAM2FASTQ:create_in... -
[-        ] process > align_DNA:call_align_DNA       -
[-        ] process > call_gSNP:create_normal_tum... -
[-        ] process > call_gSNP:create_input_csv_... -
[-        ] process > call_gSNP:call_call_gSNP       -
[-        ] process > call_sSNV                      -
[-        ] process > call_mtSNV:create_input_csv    -
[-        ] process > call_mtSNV:call_call_mtSNV     -

[-        ] process > convert_BAM2FASTQ:extract_r... [  0%] 0 of 2
[-        ] process > convert_BAM2FASTQ:create_in... [  0%] 0 of 2
[-        ] process > convert_BAM2FASTQ:call_conv... -
[-        ] process > convert_BAM2FASTQ:create_in... -
[-        ] process > align_DNA:call_align_DNA       -
[-        ] process > call_gSNP:create_normal_tum... -
[-        ] process > call_gSNP:create_input_csv_... -
[-        ] process > call_gSNP:call_call_gSNP       -
[-        ] process > call_sSNV                      -
[-        ] process > call_mtSNV:create_input_csv    -
[-        ] process > call_mtSNV:call_call_mtSNV     -

executor >  local (6)
[b8/1acd6d] process > convert_BAM2FASTQ:extract_r... [  0%] 0 of 2
[1c/c877cf] process > convert_BAM2FASTQ:create_in... [100%] 2 of 2 ✔
[09/1be5bc] process > convert_BAM2FASTQ:call_conv... [  0%] 0 of 2
[-        ] process > convert_BAM2FASTQ:create_in... -
[-        ] process > align_DNA:call_align_DNA       -
[-        ] process > call_gSNP:create_normal_tum... -
[-        ] process > call_gSNP:create_input_csv_... -
[-        ] process > call_gSNP:call_call_gSNP       -
[-        ] process > call_sSNV                      -
[-        ] process > call_mtSNV:create_input_csv    -
[-        ] process > call_mtSNV:call_call_mtSNV     -

executor >  local (6)
[65/b2fada] process > convert_BAM2FASTQ:extract_r... [ 50%] 1 of 2
[1c/c877cf] process > convert_BAM2FASTQ:create_in... [100%] 2 of 2 ✔
[09/1be5bc] process > convert_BAM2FASTQ:call_conv... [  0%] 0 of 2
[-        ] process > convert_BAM2FASTQ:create_in... -
[-        ] process > align_DNA:call_align_DNA       -
[-        ] process > call_gSNP:create_normal_tum... -
[-        ] process > call_gSNP:create_input_csv_... -
[-        ] process > call_gSNP:call_call_gSNP       -
[-        ] process > call_sSNV                      -
[-        ] process > call_mtSNV:create_input_csv    -
[-        ] process > call_mtSNV:call_call_mtSNV     -

executor >  local (6)
[b8/1acd6d] process > convert_BAM2FASTQ:extract_r... [100%] 2 of 2 ✔
[1c/c877cf] process > convert_BAM2FASTQ:create_in... [100%] 2 of 2 ✔
[09/1be5bc] process > convert_BAM2FASTQ:call_conv... [  0%] 0 of 2
[-        ] process > convert_BAM2FASTQ:create_in... -
[-        ] process > align_DNA:call_align_DNA       -
[-        ] process > call_gSNP:create_normal_tum... -
[-        ] process > call_gSNP:create_input_csv_... -
[-        ] process > call_gSNP:call_call_gSNP       -
[-        ] process > call_sSNV                      -
[-        ] process > call_mtSNV:create_input_csv    -
[-        ] process > call_mtSNV:call_call_mtSNV     -

executor >  local (7)
[b8/1acd6d] process > convert_BAM2FASTQ:extract_r... [100%] 2 of 2 ✔
[1c/c877cf] process > convert_BAM2FASTQ:create_in... [100%] 2 of 2 ✔
[39/35e657] process > convert_BAM2FASTQ:call_conv... [ 50%] 1 of 2
[f2/266606] process > convert_BAM2FASTQ:create_in... [  0%] 0 of 1
[-        ] process > align_DNA:call_align_DNA       -
[-        ] process > call_gSNP:create_normal_tum... -
[-        ] process > call_gSNP:create_input_csv_... -
[-        ] process > call_gSNP:call_call_gSNP       -
[-        ] process > call_sSNV                      -
[-        ] process > call_mtSNV:create_input_csv    -
[-        ] process > call_mtSNV:call_call_mtSNV     -

executor >  local (7)
[b8/1acd6d] process > convert_BAM2FASTQ:extract_r... [100%] 2 of 2 ✔
[1c/c877cf] process > convert_BAM2FASTQ:create_in... [100%] 2 of 2 ✔
[39/35e657] process > convert_BAM2FASTQ:call_conv... [ 50%] 1 of 2
[f2/266606] process > convert_BAM2FASTQ:create_in... [100%] 1 of 1
[-        ] process > align_DNA:call_align_DNA       [  0%] 0 of 1
[-        ] process > call_gSNP:create_normal_tum... -
[-        ] process > call_gSNP:create_input_csv_... -
[-        ] process > call_gSNP:call_call_gSNP       -
[-        ] process > call_sSNV                      -
[-        ] process > call_mtSNV:create_input_csv    -
[-        ] process > call_mtSNV:call_call_mtSNV     -

executor >  local (8)
[b8/1acd6d] process > convert_BAM2FASTQ:extract_r... [100%] 2 of 2 ✔
[1c/c877cf] process > convert_BAM2FASTQ:create_in... [100%] 2 of 2 ✔
[09/1be5bc] process > convert_BAM2FASTQ:call_conv... [100%] 2 of 2 ✔
[f2/266606] process > convert_BAM2FASTQ:create_in... [ 50%] 1 of 2
[0f/fd4e79] process > align_DNA:call_align_DNA (1)   [  0%] 0 of 1
[-        ] process > call_gSNP:create_normal_tum... -
[-        ] process > call_gSNP:create_input_csv_... -
[-        ] process > call_gSNP:call_call_gSNP       -
[-        ] process > call_sSNV                      -
[-        ] process > call_mtSNV:create_input_csv    -
[-        ] process > call_mtSNV:call_call_mtSNV     -

executor >  local (9)
[b8/1acd6d] process > convert_BAM2FASTQ:extract_r... [100%] 2 of 2 ✔
[1c/c877cf] process > convert_BAM2FASTQ:create_in... [100%] 2 of 2 ✔
[09/1be5bc] process > convert_BAM2FASTQ:call_conv... [100%] 2 of 2 ✔
[55/4fc6ca] process > convert_BAM2FASTQ:create_in... [ 50%] 1 of 2
[0f/fd4e79] process > align_DNA:call_align_DNA (1)   [  0%] 0 of 1
[-        ] process > call_gSNP:create_normal_tum... -
[-        ] process > call_gSNP:create_input_csv_... -
[-        ] process > call_gSNP:call_call_gSNP       -
[-        ] process > call_sSNV                      -
[-        ] process > call_mtSNV:create_input_csv    -
[-        ] process > call_mtSNV:call_call_mtSNV     -
Error executing process > 'align_DNA:call_align_DNA (1)'

Caused by:
  Process `align_DNA:call_align_DNA (1)` terminated with an error exit status (1)

Command executed:

  nextflow run         /hot/user/alfgonzalez/pipeline/metapipeline-DNA/agonz_update_bam2fastq_to_d4e8a73/metapipeline-DNA/modules/align_DNA/../../external/pipeline-align-DNA/pipeline/align-DNA.nf         --sample_name SP116384         --aligner BWA-MEM2          --enable_spark true --mark_duplicates true --reference_fasta_bwa /hot/ref/tool-specific-input/BWA-MEM2-2.2.1/GRCh38-BI-20160721/index/genome.fa         --output_dir $(pwd)         --work_dir /scratch         --input_csv align_DNA_input.csv         -c /hot/user/alfgonzalez/pipeline/metapipeline-DNA/agonz_update_bam2fastq_to_d4e8a73/metapipeline-DNA/modules/align_DNA/default.config

Command exit status:
  1

Command output:
    22/07/25 03:47:38 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users  with view permissions: Set(nobody); groups with view permissions: Set(); users  with modify permissions: Set(nobody); groups with modify permissions: Set()
    22/07/25 03:47:38 INFO Utils: Successfully started service 'sparkDriver' on port 45754.
    22/07/25 03:47:38 INFO SparkEnv: Registering MapOutputTracker
    22/07/25 03:47:38 INFO SparkEnv: Registering BlockManagerMaster
    22/07/25 03:47:38 INFO BlockManagerMasterEndpoint: Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information
    22/07/25 03:47:38 INFO BlockManagerMasterEndpoint: BlockManagerMasterEndpoint up
    22/07/25 03:47:38 INFO DiskBlockManager: Created local directory at /spark_temp_dir/blockmgr-af8dfba1-2bff-4780-9075-da3b96f85309
    22/07/25 03:47:38 INFO MemoryStore: MemoryStore started with capacity 4.5 GB
    22/07/25 03:47:38 INFO SparkEnv: Registering OutputCommitCoordinator
    22/07/25 03:47:38 INFO Utils: Successfully started service 'SparkUI' on port 4040.
    22/07/25 03:47:38 INFO SparkUI: Bound SparkUI to 0.0.0.0, and started at http://6f8b5512d6d7:4040
    22/07/25 03:47:38 INFO Executor: Starting executor ID driver on host localhost
    22/07/25 03:47:38 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 44650.
    22/07/25 03:47:38 INFO NettyBlockTransferService: Server created on 6f8b5512d6d7:44650
    22/07/25 03:47:38 INFO BlockManager: Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy
    22/07/25 03:47:38 INFO BlockManagerMaster: Registering BlockManager BlockManagerId(driver, 6f8b5512d6d7, 44650, None)
    22/07/25 03:47:38 INFO BlockManagerMasterEndpoint: Registering block manager 6f8b5512d6d7:44650 with 4.5 GB RAM, BlockManagerId(driver, 6f8b5512d6d7, 44650, None)
    22/07/25 03:47:38 INFO BlockManagerMaster: Registered BlockManager BlockManagerId(driver, 6f8b5512d6d7, 44650, None)
    22/07/25 03:47:38 INFO BlockManager: Initialized BlockManager: BlockManagerId(driver, 6f8b5512d6d7, 44650, None)
    03:47:39.174 INFO  MarkDuplicatesSpark - Spark verbosity set to INFO (see --spark-verbosity argument)
    22/07/25 03:47:39 INFO SparkUI: Stopped Spark web UI at http://6f8b5512d6d7:4040
    22/07/25 03:47:39 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
    22/07/25 03:47:39 INFO MemoryStore: MemoryStore cleared
    22/07/25 03:47:39 INFO BlockManager: BlockManager stopped
    22/07/25 03:47:39 INFO BlockManagerMaster: BlockManagerMaster stopped
    22/07/25 03:47:39 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
    22/07/25 03:47:39 INFO SparkContext: Successfully stopped SparkContext
    03:47:39.203 INFO  MarkDuplicatesSpark - Shutting down engine
    [July 25, 2022 3:47:39 AM GMT] org.broadinstitute.hellbender.tools.spark.transforms.markduplicates.MarkDuplicatesSpark done. Elapsed time: 0.02 minutes.
    Runtime.totalMemory()=751828992
    ***********************************************************************

    A USER ERROR has occurred: Failed to read bam header from WGS\:WTSI\:30177-1.sorted.bam
     Caused by:java.net.URISyntaxException: Relative path in absolute URI: WGS\:WTSI%5C:30177-1.sorted.bam

    ***********************************************************************
    Set the system property GATK_STACKTRACE_ON_USER_EXCEPTION (--java-options '-DGATK_STACKTRACE_ON_USER_EXCEPTION=true') to print the stack trace.
    22/07/25 03:47:39 INFO ShutdownHookManager: Shutdown hook called
    22/07/25 03:47:39 INFO ShutdownHookManager: Deleting directory /spark_temp_dir/spark-e8f34259-40b1-4593-a29b-c93dcc1d182e
    Using GATK jar /gatk/gatk-package-4.2.4.1-local.jar
    Running:
        java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -Djava.io.tmpdir=/temp_dir -jar /gatk/gatk-package-4.2.4.1-local.jar MarkDuplicatesSpark --read-validation-stringency LENIENT --input WGS\:WTSI\:30177-1.sorted.bam --input WGS\:WTSI\:30177-2.sorted.bam --input WGS\:WTSI\:30177-0.sorted.bam --output SP116384.bam --metrics-file SP116384.mark_dup.metrics --program-name MarkDuplicatesSpark --create-output-bam-index --conf spark.executor.cores=${task.cpus} --conf spark.local.dir=/spark_temp_dir --tmp-dir /temp_dir

  Work dir:
    /scratch/2e/7444f2e999df74e9752d9ce3f5d6a8

  Tip: view the complete command output by changing to the process work dir and entering the command `cat .command.out`

Command wrapper:
    (identical to the Command output above; omitted here as a verbatim duplicate)

Work dir:
  /scratch/0f/fd4e7956981966a9fd529c46e159a5

Tip: view the complete command output by changing to the process work dir and entering the command `cat .command.out`

WARN: Killing pending tasks (1)

executor >  local (9)
[b8/1acd6d] process > convert_BAM2FASTQ:extract_r... [100%] 2 of 2 ✔
[1c/c877cf] process > convert_BAM2FASTQ:create_in... [100%] 2 of 2 ✔
[09/1be5bc] process > convert_BAM2FASTQ:call_conv... [100%] 2 of 2 ✔
[55/4fc6ca] process > convert_BAM2FASTQ:create_in... [100%] 1 of 1
[0f/fd4e79] process > align_DNA:call_align_DNA (1)   [100%] 1 of 1, failed: 1
[-        ] process > call_gSNP:create_normal_tum... -
[-        ] process > call_gSNP:create_input_csv_... -
[-        ] process > call_gSNP:call_call_gSNP       -
[-        ] process > call_sSNV                      -
[-        ] process > call_mtSNV:create_input_csv    -
[-        ] process > call_mtSNV:call_call_mtSNV     -
Error executing process > 'align_DNA:call_align_DNA (1)'

Caused by:
  Process `align_DNA:call_align_DNA (1)` terminated with an error exit status (1)

Command executed:

  nextflow run         /hot/user/alfgonzalez/pipeline/metapipeline-DNA/agonz_update_bam2fastq_to_d4e8a73/metapipeline-DNA/modules/align_DNA/../../external/pipeline-align-DNA/pipeline/align-DNA.nf         --sample_name SP116384         --aligner BWA-MEM2          --enable_spark true --mark_duplicates true --reference_fasta_bwa /hot/ref/tool-specific-input/BWA-MEM2-2.2.1/GRCh38-BI-20160721/index/genome.fa         --output_dir $(pwd)         --work_dir /scratch         --input_csv align_DNA_input.csv         -c /hot/user/alfgonzalez/pipeline/metapipeline-DNA/agonz_update_bam2fastq_to_d4e8a73/metapipeline-DNA/modules/align_DNA/default.config

Command exit status:
  1
yashpatel6 commented 2 years ago

This is caused by the filenames containing : characters, which are ambiguous to interpret with the library used by Spark. They originate from the LB tag in the BAM read group header. This is something we'll want to fix in align-DNA + all other pipelines with either standardized filenames and/or a function to clean up special characters from filenames.
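
A sanitization function of the kind described might look like this sketch (Python for illustration only; the actual module discussed below is a Nextflow module, and its exact replacement rules are not shown here):

```python
import re

def sanitize_filename(name: str) -> str:
    """Replace URI-ambiguous or escaped characters (':', '\\', whitespace)
    with '-' so the name is safe to use as a bare file path.

    Illustrative sketch only; not the actual sanitization module.
    """
    return re.sub(r'[:\\\s]+', '-', name)

print(sanitize_filename('WGS:WTSI:30177-1.sorted.bam'))
# WGS-WTSI-30177-1.sorted.bam
```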

Alfredo-Enrique commented 2 years ago

> This is caused by the filenames containing : characters, which are ambiguous to interpret with the library used by Spark. They originate from the LB tag in the BAM read group header. This is something we'll want to fix in align-DNA + all other pipelines with either standardized filenames and/or a function to clean up special characters from filenames.

Good catch @yashpatel6! I'm guessing this will be more of an issue with files generated externally, outside of UCLA, than with internal files. Funnily enough, @tyamaguchi-ucla was just mentioning that the file names in the command looked weird. Thanks for figuring out where they were coming from.

yashpatel6 commented 2 years ago

Correct, I actually opened a PR for a Nextflow module for sanitizing strings, especially filenames. This will need to be rolled out to the individual pipelines, so I'm planning for the fix to land passively in the v2.0.0 release of the metapipeline as the individual pipelines are updated.

Alfredo-Enrique commented 2 years ago

> Correct, I actually opened a PR for a Nextflow module for sanitizing strings, especially filenames. This will need to be rolled out to the individual pipelines, so I'm planning for the fix to land passively in the v2.0.0 release of the metapipeline as the individual pipelines are updated.

Hmm... can I just go ahead and do a janky, imperfect version for this specific use case? 'Cause I don't think I can move on with my project otherwise.

yashpatel6 commented 2 years ago

> Hmm... can I just go ahead and do a janky, imperfect version for this specific use case? 'Cause I don't think I can move on with my project otherwise.

I have a few suggestions if you need to process the samples ASAP:

  1. You could update the BAM headers to replace the problematic : characters as a quick fix (ex. @RG ... LB:WGS:WTSI:30177 ... -> @RG ... LB:WGS-WTSI-30177 ...).
  2. Another option is to run BAM2FASTQ separately and then run the metapipeline with FASTQ input with the library identifier appropriately set in the input CSV/YAML.
  3. Create a branch of the metapipeline and patch align-DNA on that branch.

I would lean towards the third option since it's less manual work and doesn't require modifying the raw input files.
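
The header rewrite in option 1 can be sketched with sed; the @RG line below is an assumed example (only the LB value comes from the error above), so it is an illustration rather than the actual header:

```shell
# Assumed @RG line (fields are tab-separated in a real BAM header).
rg_line=$(printf '@RG\tID:1\tSM:SP116384\tLB:WGS:WTSI:30177')

# Replace the ':' separators inside the LB value with '-'.
printf '%s\n' "$rg_line" | sed 's/LB:WGS:WTSI:30177/LB:WGS-WTSI-30177/'
```

In practice the whole header would be rewritten in place with the standard idiom `samtools view -H in.bam | sed 's/LB:WGS:WTSI:30177/LB:WGS-WTSI-30177/' | samtools reheader - in.bam > fixed.bam`, followed by re-indexing the output.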

Alfredo-Enrique commented 2 years ago

Thank you for the wonderful suggestions @yashpatel6. I'll do either 2 or 3. I have to pivot to a different task for the next two days but will revisit this at the end of the week. Thanks again for the help and suggestions!