Closed rlancaster96 closed 3 months ago
The issue has been isolated to processing with bcftools norm. FilterMutectCalls works on files that have been concatenated and sorted with bcftools but not yet processed with bcftools norm, and fails on files after they have been processed with bcftools norm.
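One way to look for records that could trigger this is to compare, per VCF line, the number of ALT alleles with the number of per-allele strand-bias entries. The sketch below assumes (this is not confirmed in the issue) that the strand counts are stored in an INFO field named `AS_SB_TABLE` with one `|`-separated entry for the reference plus one per ALT allele, and that normalization may have changed the ALT count without updating that field. The class name, method names, and example lines are hypothetical.

```java
class AsSbTableCheck {

    // Number of ALT alleles in a tab-separated VCF data line (column 5).
    static int altCount(String vcfLine) {
        return vcfLine.split("\t")[4].split(",").length;
    }

    // Number of '|'-separated entries in AS_SB_TABLE (INFO, column 8),
    // or -1 if the record has no such field. The field name is an assumption.
    static int sbEntryCount(String vcfLine) {
        for (String kv : vcfLine.split("\t")[7].split(";")) {
            if (kv.startsWith("AS_SB_TABLE=")) {
                return kv.substring("AS_SB_TABLE=".length()).split("\\|").length;
            }
        }
        return -1;
    }

    // True when the table does not hold exactly one entry for the
    // reference plus one entry per ALT allele.
    static boolean isMismatched(String vcfLine) {
        int entries = sbEntryCount(vcfLine);
        return entries != -1 && entries != altCount(vcfLine) + 1;
    }

    public static void main(String[] args) {
        // Hypothetical records, not taken from the failing files.
        String split = "chr1\t100\t.\tAT\tA\t.\t.\tAS_SB_TABLE=20,18|3,2|1,4;DP=48";
        String intact = "chr1\t100\t.\tAT\tA,ATT\t.\t.\tAS_SB_TABLE=20,18|3,2|1,4;DP=48";
        System.out.println("split record mismatched:  " + isMismatched(split));
        System.out.println("intact record mismatched: " + isMismatched(intact));
    }
}
```

Running a check like this over the VCF before and after bcftools norm would show whether the normalized file contains records of the first kind; it does not by itself prove that they are the cause of the failure.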
The Broad Institute tool last listed in the Java stack trace is here:
https://github.com/broadinstitute/gatk/blob/master/src/main/java/org/broadinstitute/hellbender/tools/walkers/mutect/filtering/StrandArtifactFilter.java
org.broadinstitute.hellbender.tools.walkers.mutect.filtering.StrandArtifactFilter.lambda$calculateArtifactProbabilities$6(StrandArtifactFilter.java:74)
the specific line is this:
return IntStream.range(0, altSBs.size()).mapToObj(i -> {
    final List<Integer> altSB = altSBs.get(i);
    final int altIndelSize = indelSizes.get(i); // <<<<< line 74
this is the context:
public List<EStep> calculateArtifactProbabilities(final VariantContext vc, final Mutect2FilteringEngine filteringEngine) {
    // for each allele, forward and reverse count
    List<List<Integer>> sbs = StrandBiasUtils.getSBsForAlleles(vc);
    if (sbs == null || sbs.isEmpty() || sbs.size() <= 1) {
        return Collections.emptyList();
    }
    // remove symbolic alleles
    if (vc.hasSymbolicAlleles()) {
        sbs = GATKVariantContextUtils.removeDataForSymbolicAlleles(vc, sbs);
    }
    final List<Integer> indelSizes = vc.getAlternateAlleles().stream().map(alt -> Math.abs(vc.getReference().length() - alt.length())).collect(Collectors.toList());
    int totalFwd = sbs.stream().map(sb -> sb.get(0)).mapToInt(i -> i).sum();
    int totalRev = sbs.stream().map(sb -> sb.get(1)).mapToInt(i -> i).sum();
    // skip the reference
    final List<List<Integer>> altSBs = sbs.subList(1, sbs.size());
    return IntStream.range(0, altSBs.size()).mapToObj(i -> {
        final List<Integer> altSB = altSBs.get(i);
        final int altIndelSize = indelSizes.get(i);
        if (altSB.stream().mapToInt(Integer::intValue).sum() == 0 || altIndelSize > LONGEST_STRAND_ARTIFACT_INDEL_SIZE) {
            return new EStep(0, 0, totalFwd, totalRev, altSB.get(0), altSB.get(1));
        } else {
            return strandArtifactProbability(strandArtifactPrior, totalFwd, totalRev, altSB.get(0), altSB.get(1), altIndelSize);
        }
    }).collect(Collectors.toList());
}
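In the method above, `indelSizes` has one entry per alternate allele, while the loop runs over `altSBs`, which has one entry per strand-bias record after the reference. If a record carries more strand-bias entries than alternate alleles, `indelSizes.get(i)` at line 74 goes out of range. The sketch below reproduces that access pattern with plain lists; it does not use GATK classes, the input values are hypothetical, and whether this is what actually happens in the failing files is an assumption.

```java
import java.util.List;
import java.util.stream.Collectors;

class StrandArtifactIndexSketch {

    // Same computation as indelSizes in the method above, on plain strings.
    static List<Integer> indelSizes(String ref, List<String> alts) {
        return alts.stream()
                .map(alt -> Math.abs(ref.length() - alt.length()))
                .collect(Collectors.toList());
    }

    // Returns the first index at which indelSizes.get(i) would be out of
    // range while looping over the alt strand-bias entries, or -1 if the
    // two lists line up.
    static int firstBadIndex(List<List<Integer>> sbs, List<Integer> indelSizes) {
        int altSbCount = sbs.size() - 1; // skip the reference entry
        return altSbCount > indelSizes.size() ? indelSizes.size() : -1;
    }

    public static void main(String[] args) {
        // Hypothetical record: strand counts for ref + 2 alts, but only 1 ALT allele.
        List<List<Integer>> sbs = List.of(List.of(20, 18), List.of(3, 2), List.of(1, 4));
        List<Integer> sizes = indelSizes("AT", List.of("A"));
        System.out.println("first out-of-range index: " + firstBadIndex(sbs, sizes));
        try {
            for (int i = 0; i < sbs.size() - 1; i++) {
                sizes.get(i); // same access pattern as line 74
            }
        } catch (IndexOutOfBoundsException e) {
            System.out.println("caught: " + e.getClass().getSimpleName());
        }
    }
}
```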
resolved with https://github.com/ohsu-cedar-comp-hub/WGS-nextflow-workflow/pull/58
Expected behavior: With the previous test files (small, 800-line sliced fastq), the workflow behaves as expected and FilterMutectCalls completes successfully. With large fastq files or full-size files, FilterMutectCalls exits with the error below:
Command run:
Workflow:
Tool:
Error:
A similar but not identical issue was posted here for the related command Mutect2: https://github.com/broadinstitute/gatk/issues/4578#issuecomment-616557568
Attempts to fix the issue:
path mutect_idx_fai and path mutect_dict args
create index argument set to false, as per the suggestion for a similar HaplotypeCaller issue here: https://gatk.broadinstitute.org/hc/en-us/community/posts/16527065134107-HaplotypeCaller-java-lang-ArrayIndexOutOfBoundsException-Index-32770-out-of-bounds-for-length-32770
ob priors argument
Environment:
Nextflow version 24.04.2
singularity-ce version 3.8.0-1.el7
GATK 4.4.0.0
Branch: https://github.com/ohsu-cedar-comp-hub/WGS-nextflow-workflow/pull/58