populationgenomics / production-pipelines

Genomics workflows for CPG using Hail Batch
MIT License

New stage for long-read sequencing data: BamToCram #732

Open EddieLF opened 1 month ago

EddieLF commented 1 month ago

This adds a new stage, along with its jobs, to a pipeline that the rare-disease team will use.

The stage creates CRAM files from existing BAMs by converting them directly, not by re-aligning the reads.
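A rough sketch of the conversion command a job might run follows. It assumes samtools is available in the job image and that the reference FASTA the BAMs were aligned to is localised alongside them. CRAM compression is reference-based, so that reference has to be supplied. The helper name `bam_to_cram_command` and the thread count are illustrative only, not part of production-pipelines.

```python
import shlex


def bam_to_cram_command(
    bam_path: str,
    cram_path: str,
    reference_fasta: str,
    threads: int = 4,
) -> str:
    """Hypothetical helper: build a shell command converting BAM -> CRAM.

    Existing alignments are copied unchanged (no re-alignment); only the
    container format changes. The reference must match the one used to
    align the BAM, because CRAM stores reads relative to it.
    """
    convert = " ".join(
        [
            "samtools", "view",
            "-C",                                # write CRAM output
            "-T", shlex.quote(reference_fasta),  # reference for CRAM compression
            "-@", str(threads),                  # additional worker threads
            "-o", shlex.quote(cram_path),
            shlex.quote(bam_path),
        ]
    )
    # Index the new CRAM so downstream tools can seek into it.
    index = f"samtools index {shlex.quote(cram_path)}"
    return f"{convert} && {index}"
```

Returning a single command string keeps the helper independent of how the job is submitted. A Hail Batch job could run it as one command.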

How this stage should work:

  1. Create cohort(s) containing sequencing groups with long-read data.
  2. Read the sequencing groups from these cohorts (given as input_cohorts) into a production-pipelines workflow.
  3. If sequencing_group.assay.reads_type == "bam", create a job that converts the BAM to a CRAM using samtools.
  4. Save the CRAM in the main bucket, preferably under a date-stamped path reserved for long-read CRAMs.
  5. For each sequencing group that is successfully converted, create an analysis entry with analysis type CRAM.
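Steps 3 to 5 can be sketched as plain Python. This assumes the sequencing groups have already been read from the input cohorts (steps 1 and 2). The dataclasses, the `long_read/cram/<date>` path layout, the bucket name and the `"cram"` analysis-type string are all hypothetical stand-ins. They are not the actual production-pipelines or metamist APIs.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Assay:
    reads_type: str                  # e.g. "bam" or "fastq"
    bam_path: Optional[str] = None   # hypothetical field holding the input BAM


@dataclass
class SequencingGroup:
    id: str
    assay: Assay


def long_read_cram_path(bucket: str, sg_id: str, run_date: date) -> str:
    # Hypothetical layout: a date-stamped prefix reserved for long-read CRAMs (step 4).
    return f"gs://{bucket}/long_read/cram/{run_date.isoformat()}/{sg_id}.cram"


def plan_conversions(sgs, bucket: str, run_date: date):
    """Return (sg_id, bam_path, cram_path) for every group whose reads are BAMs."""
    plan = []
    for sg in sgs:
        if sg.assay.reads_type != "bam":
            continue  # step 3: only BAM inputs get a conversion job
        cram = long_read_cram_path(bucket, sg.id, run_date)
        plan.append((sg.id, sg.assay.bam_path, cram))
    return plan


def analysis_entries(plan, succeeded: set):
    """Step 5: one CRAM analysis record per group whose conversion succeeded."""
    return [
        # "cram" as the analysis-type string is an assumption
        {"sequencing_group": sg_id, "type": "cram", "output": cram}
        for sg_id, _bam, cram in plan
        if sg_id in succeeded
    ]
```

Records are only built for groups in `succeeded`. A failed conversion therefore never gets a CRAM analysis entry pointing at a missing or partial file.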
EddieLF commented 1 month ago

https://github.com/populationgenomics/production-pipelines/pull/733