hartwigmedical / hmftools

Various algorithms for analysing genomics data
GNU General Public License v3.0

sv-prep read processing fails when encountering DRAGEN supplementary alignments #333

scwatts closed this issue 1 year ago

scwatts commented 1 year ago

Error:

```
<leading log text removed for brevity>

11:30:48.913 [main] [INFO ] processing chromosome(chr10)
Exception in thread "Thread-9" java.lang.ArrayIndexOutOfBoundsException: Index 49 out of bounds for length 49
    at com.hartwig.hmftools.svprep.reads.ReadFilters.checkFilters(ReadFilters.java:93)
    at com.hartwig.hmftools.svprep.reads.PartitionSlicer.processSamRecord(PartitionSlicer.java:186)
    at com.hartwig.hmftools.common.samtools.BamSlicer.slice(BamSlicer.java:65)
    at com.hartwig.hmftools.svprep.reads.PartitionSlicer.run(PartitionSlicer.java:115)
    at com.hartwig.hmftools.svprep.reads.PartitionThread.run(PartitionThread.java:75)

<trailing log text removed for brevity>
```

Command:

```bash
java \
  -Xmx14g \
  -jar software/sv-prep_v1.0.jar \
    -sample seqc_tumor \
    -bam_file data/sample/seqc_tumor.dragen.sliced.bam \
    -ref_genome ./data/reference/hg38.fa \
    -ref_genome_version 38 \
    -blacklist_bed ./data/reference/sv_prep_blacklist.38.bed \
    -known_fusion_bed ./data/reference/known_fusions.38.bedpe \
    -write_types 'JUNCTIONS;BAM;FRAGMENT_LENGTH_DIST' \
    -threads 1 \
    -output_dir output/1_incl_supplementary/
```

The error occurs with both sv-prep v1.0 and sv-prep compiled from source at commit 64d3bfe.

Input data:

chr10 87773108
chr10 87887149
chr10 87954021
chr10 88010561
chr17 59785545
chr17 59785670
Reproducible example:

Attachment: [seqc_tumor.dragen.sliced.tar.gz](https://github.com/hartwigmedical/hmftools/files/9743011/seqc_tumor.dragen.sliced.tar.gz)

Commands:

> assumes required reference files are placed under `./data/reference/`

```bash
# Set up env
mamba create -p $(pwd -P)/conda_env/ -y openjdk samtools
conda activate conda_env/

mkdir -p software/
wget -P software/ 'https://github.com/hartwigmedical/hmftools/releases/download/sv-prep-v1.0/sv-prep_v1.0.jar'

# Define fn for execution
run_svprep() {
  bam_fp=${1};
  output_dir=${2%/};
  mkdir -p ${output_dir};
  sample_name=$(sed 's/\..*//' <<< ${bam_fp##*/});
  java 1>${output_dir}/${sample_name}_log.txt 2>&1 \
    -Xmx14g \
    -jar software/sv-prep_v1.0.jar \
      -sample ${sample_name} \
      -bam_file ${bam_fp} \
      -ref_genome ./data/reference/hg38.fa \
      -ref_genome_version 38 \
      -blacklist_bed ./data/reference/sv_prep_blacklist.38.bed \
      -known_fusion_bed ./data/reference/known_fusions.38.bedpe \
      -write_types 'JUNCTIONS;BAM;FRAGMENT_LENGTH_DIST' \
      -threads 1 \
      -output_dir ${output_dir}/;
}

# Get input data
mkdir -p data/sample/
curl -Ls https://github.com/hartwigmedical/hmftools/files/9743011/seqc_tumor.dragen.sliced.tar.gz | tar -xzvf - -C data/sample/

# Run sv-prep with supplementary alignments (fails)
run_svprep data/sample/seqc_tumor.dragen.sliced.bam output/1_incl_supplementary/

# Filter supplementary alignments
mkdir -p output/2_excl_supplementary/
samtools view \
  -F2048 \
  -o output/2_excl_supplementary/seqc_tumor.dragen.sliced.nosupp.bam \
  data/sample/seqc_tumor.dragen.sliced.bam
samtools index output/2_excl_supplementary/seqc_tumor.dragen.sliced.nosupp.bam

# Run sv-prep without supplementary alignments (succeeds)
run_svprep output/2_excl_supplementary/seqc_tumor.dragen.sliced.nosupp.bam output/2_excl_supplementary/
```
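The reproduction isolates the problem by removing supplementary alignments (FLAG 2048) with `samtools view -F2048`. Before filtering, one way to see how many of those supplementary records are hard-clipped (CIGAR `H`) is to scan the SAM text directly. `count_hardclipped_supp` is a hypothetical helper written for illustration. It is not part of hmftools or samtools, and it assumes uncompressed SAM records on stdin, such as the output of `samtools view`.

```bash
# Hypothetical helper (not part of hmftools): reads SAM records on stdin and
# counts supplementary alignments (FLAG bit 0x800 = 2048), plus how many of
# those have a hard clip (H) in their CIGAR string (column 6).
count_hardclipped_supp() {
  awk -F '\t' '
    /^@/ { next }                           # skip SAM header lines, if present
    {
      if (int($2 / 2048) % 2 == 1) {        # portable test of FLAG bit 0x800
        supp++
        if ($6 ~ /H/) hard++                # CIGAR contains a hard clip
      }
    }
    END { printf "supplementary=%d hard_clipped=%d\n", supp, hard }
  '
}
```

Example usage: `samtools view data/sample/seqc_tumor.dragen.sliced.bam | count_hardclipped_supp`. The bit test uses integer division rather than `and()` because `and()` is a gawk extension and is not available in every awk.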
charlesshale commented 1 year ago

I've just added support for hard-clipped reads to SvPrep:

https://github.com/hartwigmedical/hmftools/releases/tag/sv-prep-v1.1

I can produce results from that mini BAM you attached. Could you try this beta release?

thanks
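The fix adds support for hard-clipped reads. Under the SAM specification, hard-clipped bases (CIGAR `H`) are not stored in SEQ or QUAL, whereas soft-clipped bases (`S`) are. Only the `M`, `I`, `S`, `=` and `X` operations consume stored query bases. Code that walks a read using its full original length, including hard clips, can therefore step past the end of the stored base or quality arrays. The thread does not confirm that this was the exact cause of the exception in `ReadFilters.checkFilters`, so treat it only as a plausible mechanism. The hypothetical `cigar_lengths` function (not part of hmftools) shows the difference for a given CIGAR string:

```bash
# Hypothetical illustration: compare the number of bases stored in SEQ/QUAL
# (query-consuming ops M, I, S, =, X) with the original read length,
# which additionally counts hard-clipped (H) bases that are not stored.
cigar_lengths() {
  awk -v cigar="$1" 'BEGIN {
    stored = 0; hard = 0; s = cigar
    while (match(s, /^[0-9]+[MIDNSHP=X]/)) {
      len = substr(s, 1, RLENGTH - 1) + 0   # numeric length of this operation
      op  = substr(s, RLENGTH, 1)           # operation character
      if (op ~ /[MIS=X]/) stored += len     # consumes query: present in SEQ/QUAL
      else if (op == "H") hard += len       # hard clip: absent from SEQ/QUAL
      s = substr(s, RLENGTH + 1)            # move on to the next operation
    }
    printf "stored=%d original=%d\n", stored, stored + hard
  }'
}
```

For an illustrative CIGAR, `cigar_lengths 60M40H` prints `stored=60 original=100`: the record carries only 60 bases and 60 quality values even though the original read was 100 bases long.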

scwatts commented 1 year ago

Thank you for the fix - SV Prep 1.1_beta works with the mini BAM posted above as well as several other complete BAMs that I tested.