populationgenomics / production-pipelines

Genomics workflows for CPG using Hail Batch
MIT License

BAM to CRAM conversion stage #733

Closed EddieLF closed 1 month ago

EddieLF commented 1 month ago

A stage to convert BAMs to CRAMs and write them to the bucket under pacbio/cram. Utilises the existing bam_to_cram job.

Also adds a new default config for the long read workflow - configs/defaults/seqr_loader_long_read.toml. This config includes a specific analysis type for outputs from the BamToCram stage:

[workflow]
bam_to_cram_analysis_type = 'pacbio_cram'

This analysis type is picked up by the stage, and the analyses it creates are stored under this type. Without this config option, the analysis type will be cram.
The output path will be gs://cpg-dataset-main/pacbio/cram/CPGxxxxx.cram, with an adjacent CRAM index.
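
For reviewers, here is a minimal sketch of the fallback and the output layout described above. It is not the stage's actual code: it assumes the parsed config behaves like a nested dict, the helper names `resolve_analysis_type` and `cram_output_paths` are hypothetical, and the `.crai` index suffix is an assumption based on the usual CRAM index convention.

```python
# Illustrative sketch only; helper names are hypothetical, not from this PR.

def resolve_analysis_type(config: dict) -> str:
    """Return the configured analysis type, falling back to 'cram'."""
    workflow = config.get('workflow', {})
    return workflow.get('bam_to_cram_analysis_type', 'cram')


def cram_output_paths(bucket: str, sequencing_group_id: str) -> tuple[str, str]:
    """Build the CRAM path under pacbio/cram and the adjacent index path."""
    cram = f'{bucket}/pacbio/cram/{sequencing_group_id}.cram'
    # Assumption: the index sits next to the CRAM with a '.crai' suffix.
    return cram, f'{cram}.crai'


# Config as it would look with the long read defaults loaded.
long_read_config = {'workflow': {'bam_to_cram_analysis_type': 'pacbio_cram'}}
# Config without the option, e.g. when the field is left commented out.
plain_config = {'workflow': {}}

print(resolve_analysis_type(long_read_config))
print(resolve_analysis_type(plain_config))
print(cram_output_paths('gs://cpg-dataset-main', 'CPGxxxxx'))
```

The fallback lives in a single lookup, so workflows that never set the option keep producing analyses of type cram.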

The jobs can be seen running to completion and creating outputs for some test data here: Driver job, BamToCram jobs

EddieLF commented 1 month ago

Thanks both!

@cassimons I have actioned your suggestions: I moved the stage into the cpg_workflows/stages/seqr_loader_long_read/ directory and added the workflow.bam_to_cram_analysis_type config field to defaults.toml, commented out, with a comment explaining its purpose.