wustl-oncology / analysis-wdls

Scalable genomic analysis pipelines, written in WDL
MIT License

sambamba-sort error: not enough data in stream #120

Open malachig opened 1 year ago

malachig commented 1 year ago

Recently encountered a case that hit this error.

sambamba-sort: Error reading BGZF block starting from offset 0: stream error: not enough data in stream

In this step: call-somaticExome -> somaticExome -> call-normalAlignment -> sequenceToBqsr -> call-markDuplicatesAndSort

Notes:

The input BAM in this case is a fairly small normal-sample BAM, but it's possible that the working space this step needs is large relative to the BAM's size, so we may be running out of disk space.

Disk space needed is currently calculated as follows: https://github.com/wustl-oncology/analysis-wdls/blob/5c745330ef128404f0b94e819813c28eaf727193/definitions/tools/mark_duplicates_and_sort.wdl#LL[…]C7

We could try increasing the multiplier (which adds cost on every run, scaling with BAM size), or just increase the base amount:

e.g.
From: Int space_needed_gb = 10 + round(5*size(bam, "GB"))
To: Int space_needed_gb = 20 + round(5*size(bam, "GB"))
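
To make the trade-off between the two options concrete, here is a rough Python sketch of the same arithmetic. It is not taken from the repo: the function name, the example BAM sizes, and the alternative multiplier of 6 are all made up for illustration, and the half-up rounding is an assumption about how WDL's round() behaves.

```python
import math


def space_needed_gb(bam_size_gb: float, base_gb: int = 10, multiplier: float = 5) -> int:
    """Python mirror of `base + round(multiplier * size(bam, "GB"))`.

    Hypothetical helper for illustration only; it is not part of analysis-wdls.
    """
    # floor(x + 0.5) rounds halves up. This is assumed to match WDL's round();
    # Python's built-in round() rounds halves to even, so it is avoided here.
    return base_gb + math.floor(multiplier * bam_size_gb + 0.5)


if __name__ == "__main__":
    # Example BAM sizes in GB (made-up values, not from the failing run).
    for size in (0.5, 2.0, 10.0):
        current = space_needed_gb(size)
        bigger_base = space_needed_gb(size, base_gb=20)
        bigger_multiplier = space_needed_gb(size, multiplier=6)
        print(
            f"bam={size:>4} GB  current={current} GB  "
            f"base 20={bigger_base} GB  multiplier 6={bigger_multiplier} GB"
        )
```

Under these assumptions, raising the base adds a flat 10 GB to every request, while raising the multiplier adds almost nothing for a small BAM and the most for the largest inputs. That fits the case here, where the failing input was a small BAM.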

malachig commented 1 year ago

This increase seems to have worked. Suggest we create a PR with this minor increase.