Context and Motivation
These changes address a bug in bazam, the tool we previously used to convert CRAM files to FASTQ for re-alignment. For correctly oriented read pairs on the reverse strand, bazam overwrites the SAM flag that records read pair orientation, so re-aligned CRAM files misrepresent the orientation of those pairs. The individual read strand flag is not affected and remains correctly preserved.
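To make the distinction between pair-level and read-level flags concrete, here is a minimal sketch that decodes a SAM flag into named bits and reports which bits differ between two values. The bit values come from the SAM specification; the helper names are hypothetical, and the choice of the 0x2 bit in the usage note is for illustration only, since this description does not state which bit bazam overwrites.

```python
from __future__ import annotations

# Flag bits as defined in the SAM specification (subset relevant to pairing).
SAM_FLAGS = {
    0x1: "paired",
    0x2: "proper_pair",
    0x10: "read_reverse_strand",
    0x20: "mate_reverse_strand",
    0x40: "first_in_pair",
    0x80: "second_in_pair",
}


def decode_flag(flag: int) -> set[str]:
    """Return the names of the known bits that are set in a SAM flag."""
    return {name for bit, name in SAM_FLAGS.items() if flag & bit}


def changed_bits(before: int, after: int) -> set[str]:
    """Return the names of the known bits that differ between two flags."""
    return decode_flag(before ^ after)
```

For example, `changed_bits(83, 81)` reports only `proper_pair`, while `read_reverse_strand` is set in both values. A comparison like this, run on the same read before and after re-alignment, is one way to check that only the intended bits change.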
Changes Introduced
Removal of bazam for CRAM to FASTQ Conversion:
Replaced bazam with direct extraction of FASTQ files using samtools, which preserves the integrity of the SAM flags.
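As a rough sketch of what direct extraction with samtools can look like, the helper below builds a `samtools collate | samtools fastq` pipeline as a shell string. The function name, its parameters, and the output file layout are assumptions for illustration; the exact command used in this PR may differ.

```python
import shlex


def cram_to_fastq_cmd(cram, reference, fastq_r1, fastq_r2, threads=4):
    """Return a shell pipeline that extracts paired FASTQ files from a CRAM.

    Hypothetical helper: the name and arguments are not taken from the PR.
    """
    # Group reads by name so that mates are adjacent. -u skips compression
    # and -O streams the result to stdout.
    collate = [
        "samtools", "collate", "-u", "-O",
        "-@", str(threads),
        "--reference", reference,
        cram,
    ]
    # Write read 1 and read 2 to separate files, discard singletons (-s) and
    # reads with both or neither of the READ1/READ2 flags (-0), and leave
    # read names unchanged (-n).
    fastq = [
        "samtools", "fastq",
        "-@", str(threads),
        "-1", fastq_r1,
        "-2", fastq_r2,
        "-0", "/dev/null",
        "-s", "/dev/null",
        "-n",
        "-",
    ]
    return f"{shlex.join(collate)} | {shlex.join(fastq)}"
```

Building each stage as an argument list and joining with `shlex.join` keeps paths with spaces or shell metacharacters safely quoted.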
Simplified Alignment Job Workflow:
Refactored _get_alignment_input to handle CRAM input directly when specified in the configuration.
Consolidated and streamlined the logic for sharding and aligning jobs, removing the complexity associated with bazam.
Adjusted the _align_one function to handle direct FASTQ extraction and alignment using the selected aligner.
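The input selection described above can be sketched as follows. This is not the actual `_get_alignment_input` implementation: the `SequencingInputs` class, the `realign_from_cram` switch, and the return shape are all hypothetical stand-ins for whatever the configuration and data model really provide.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple


@dataclass
class SequencingInputs:
    """Hypothetical container for the inputs available for one sample."""
    cram: Optional[str] = None
    fastq_pairs: Sequence[Tuple[str, str]] = ()


def choose_alignment_input(inputs: SequencingInputs, realign_from_cram: bool) -> Tuple[str, List]:
    """Pick the alignment input, preferring CRAM only when configured to."""
    if realign_from_cram:
        # The configuration asks for re-alignment from CRAM, so a CRAM must exist.
        if inputs.cram is None:
            raise ValueError("realign_from_cram is set but no CRAM is available")
        return "cram", [inputs.cram]
    if inputs.fastq_pairs:
        return "fastq", list(inputs.fastq_pairs)
    raise ValueError("no FASTQ input is available for alignment")
```

Returning a tag alongside the paths lets the caller decide in one place whether a FASTQ extraction step is needed before the aligner runs.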
Improved Read Group Handling:
Updated read group handling and commands to ensure consistency and correctness in the aligned outputs.
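One way to keep read groups consistent is to build the `@RG` header line in a single helper and pass it to the aligner. The sketch below is an assumption about how that might look, not the code in this PR: the function name, the chosen fields (`ID`, `SM`, `PL`), and the defaults are illustrative. The output uses a literal backslash-t between fields, which is the form that `bwa mem -R` accepts.

```python
def read_group_header(sample_id, rg_id=None, platform="ILLUMINA"):
    """Build an @RG header string with literal '\\t' separators.

    Hypothetical helper: the fields chosen here are illustrative only.
    """
    # Fall back to the sample ID when no explicit read group ID is given.
    rg_id = rg_id or sample_id
    fields = ["@RG", f"ID:{rg_id}", f"SM:{sample_id}", f"PL:{platform}"]
    # Join with a two-character backslash-t sequence, not a real tab, so the
    # string can be passed on a command line and expanded by the aligner.
    return r"\t".join(fields)
```

Generating the header in one place means every code path, whether the input came from CRAM or from FASTQ, produces the same read group fields.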
Enhanced Logging and Error Handling:
Added informative logging to track the progress and actions taken during the alignment process.
Improved error handling to provide clearer messages when alignment inputs are missing or incorrect.
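A minimal sketch of the kind of logging and input validation described above is shown here. The logger name, the `AlignmentInputError` class, the accepted file suffixes, and the message wording are all assumptions for illustration and are not taken from the PR.

```python
import logging

logger = logging.getLogger("align")  # hypothetical logger name

FASTQ_SUFFIXES = (".fq", ".fq.gz", ".fastq", ".fastq.gz")


class AlignmentInputError(ValueError):
    """Raised when alignment inputs are missing or of the wrong type."""


def check_alignment_input(sample_id, kind, paths):
    """Validate input paths for one sample and log what will be aligned."""
    if not paths:
        raise AlignmentInputError(
            f"{sample_id}: no {kind} input found for alignment"
        )
    expected = (".cram",) if kind == "cram" else FASTQ_SUFFIXES
    # Collect every path whose suffix does not match the declared input kind.
    bad = [p for p in paths if not p.endswith(expected)]
    if bad:
        raise AlignmentInputError(
            f"{sample_id}: expected {kind} files but got {bad}"
        )
    logger.info("%s: aligning from %d %s file(s)", sample_id, len(paths), kind)
```

Including the sample ID and the offending paths in the message makes a failed job diagnosable from the log alone.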