vladsavelyev / warp

WDL Analysis Research Pipelines
https://broadinstitute.github.io/warp
BSD 3-Clause "New" or "Revised" License
0 stars 0 forks source link

CRAMs don't work as inputs to WGSFromBam #7

Open lgruen opened 3 years ago

lgruen commented 3 years ago

pipelines/broad/dna_seq/germline/single_sample/wgs/WGSFromBam.wdl has an input_cram parameter, but trying to use a CRAM input results in this error:

Exception in thread "main" htsjdk.samtools.cram.CRAMException: Contig chr1 not found in the reference file.
        at htsjdk.samtools.CRAMIterator.nextContainer(CRAMIterator.java:174)
        at htsjdk.samtools.CRAMIterator.hasNext(CRAMIterator.java:259)
        at htsjdk.samtools.SamReader$AssertingIterator.hasNext(SamReader.java:591)
        at picard.sam.SortSam.doWork(SortSam.java:160)
        at picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:305)
        at picard.cmdline.PicardCommandLine.instanceMain(PicardCommandLine.java:103)
        at picard.cmdline.PicardCommandLine.main(PicardCommandLine.java:113)

As a workaround, converting to BAM with samtools first works, but it adds yet another somewhat redundant step before sorting and then converting to fastq.

It's also likely that Processing.SortSamByName would currently run out of disk space with a CRAM input:

  Float sort_sam_disk_multiplier = 3.25
  Int disk_size = ceil(sort_sam_disk_multiplier * size(input_bam, "GiB")) + 20

(E.g. a CRAM of 37 GB resulted in a sorted BAM of 117 GB, which wouldn't fit.)

lgruen commented 3 years ago

To reproduce, ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/data_collections/illumina_platinum_pedigree/data/CEU/NA12878/alignment/NA12878.alt_bwamem_GRCh38DH.20150706.CEU.illumina_platinum_ped.cram should work.

kbergin commented 3 years ago

It may be helpful to mention in the pipelines/broad/reprocessing folder we have a collection of WDLs that take in aligned CRAMs, revert, and then call the single sample genome/exome pipeline. CramToUnmappedBam is available as a top level workflow as well: https://github.com/populationgenomics/warp/tree/main/pipelines/broad/reprocessing/cram_to_unmapped_bams

Also hi, I'm Kylee Degatano, I'm the product manager for the WARP repository and I'm happy to be of help if you ever have questions! My email is kdegatano@broadinstitute.org. We are working on formalizing our contributions guidelines, but we welcome contributions you think may be helpful to the larger community and issues if you find anything wrong or have feature requests!

vladsavelyev commented 3 years ago

@kbergin, that's awesome, thanks so much for the tips and pointing to those WDLs! Going to try it out.

And nice to meet you! :)