hartwigmedical / hmftools

Various algorithms for analysing genomics data
GNU General Public License v3.0

Isofox: supplementary reads causing java.lang.NullPointerException #340

Closed by scwatts 1 year ago

scwatts commented 1 year ago

Error

java.lang.NullPointerException
    at com.hartwig.hmftools.isofox.fusion.SupplementaryJunctionData.fromReads(SupplementaryJunctionData.java:84)
    at com.hartwig.hmftools.isofox.fusion.SupplementaryJunctionData.cacheSupplementaryJunctionCandidate(SupplementaryJunctionData.java:109)
    at com.hartwig.hmftools.isofox.fusion.ChimericReadTracker.cacheSupplementaryJunctionCandidate(ChimericReadTracker.java:590)
    at com.hartwig.hmftools.isofox.fusion.ChimericReadTracker.postProcessChimericReads(ChimericReadTracker.java:306)
    at com.hartwig.hmftools.isofox.BamFragmentAllocator.produceBamCounts(BamFragmentAllocator.java:234)
    at com.hartwig.hmftools.isofox.ChromosomeTaskExecutor.analyseBamReads(ChromosomeTaskExecutor.java:323)
    at com.hartwig.hmftools.isofox.ChromosomeTaskExecutor.assignTranscriptCounts(ChromosomeTaskExecutor.java:233)
    at com.hartwig.hmftools.isofox.ChromosomeTaskExecutor.call(ChromosomeTaskExecutor.java:154)
    at com.hartwig.hmftools.isofox.ChromosomeTaskExecutor.call(ChromosomeTaskExecutor.java:60)
    at com.hartwig.hmftools.common.utils.TaskExecutor.executeTasks(TaskExecutor.java:28)
    at com.hartwig.hmftools.isofox.Isofox.allocateBamFragments(Isofox.java:205)
    at com.hartwig.hmftools.isofox.Isofox.runAnalysis(Isofox.java:147)
    at com.hartwig.hmftools.isofox.Isofox.main(Isofox.java:491)

Command

java \
  -jar ./software/isofox_v1.5_rc3.jar \
    -sample PTC_NebRNA210629 \
    -functions 'TRANSCRIPT_COUNTS;ALT_SPLICE_JUNCTIONS;FUSIONS' \
    -bam_file sample_data/L2100706.bam \
    -ref_genome reference_data/hg38.fa \
    -ensembl_data_dir reference_data/ensembl_data_cache/ \
    -exp_counts_file reference_data/read_151_exp_counts.csv \
    -exp_gc_ratios_file reference_data/read_100_exp_gc_ratios.csv \
    -threads 4 \
    -output_dir ./output/

Other notes


Reproducible example
Attachment: [L2100706.sliced.bam.gz](https://github.com/hartwigmedical/hmftools/files/9975937/L2100706.sliced.bam.gz)

> The above BAM contains WTS reads from a SEQC sample aligned using DRAGEN, sliced to chr1:1020000-1080000. The BAM has been gzipped to allow upload to GH.

Obtain sample data and software

```bash
mkdir -p sample_data/
curl -L https://github.com/hartwigmedical/hmftools/files/9975937/L2100706.sliced.bam.gz | gzip -cd > sample_data/L2100706.sliced.bam
samtools index sample_data/L2100706.sliced.bam

mkdir -p software/
wget -P software/ 'https://github.com/hartwigmedical/hmftools/releases/download/isofox-v1.5/isofox_v1.5_rc3.jar'
```

Run Isofox on sample data (fails/triggers null pointer exception)

```bash
mkdir -p output/
java \
  -jar ./software/isofox_v1.5_rc3.jar \
    -sample PTC_NebRNA210629 \
    -functions 'TRANSCRIPT_COUNTS;FUSIONS' \
    -bam_file sample_data/L2100706.sliced.bam \
    -ref_genome reference_data/hg38.fa \
    -ensembl_data_dir reference_data/ensembl_data_cache/ \
    -exp_counts_file reference_data/read_151_exp_counts.csv \
    -exp_gc_ratios_file reference_data/read_100_exp_gc_ratios.csv \
    -specific_regions chr1:1020000:1080000 \
    -threads 1 \
    -output_dir ./output/
```

Run Isofox on sample data without supplementary reads (succeeds)

```bash
mkdir -p output_nosupps/
samtools view -F2048 -o sample_data/L2100706.sliced.nosupps.bam sample_data/L2100706.sliced.bam
samtools index sample_data/L2100706.sliced.nosupps.bam

java \
  -jar ./software/isofox_v1.5_rc3.jar \
    -sample PTC_NebRNA210629 \
    -functions 'TRANSCRIPT_COUNTS;FUSIONS' \
    -bam_file sample_data/L2100706.sliced.nosupps.bam \
    -ref_genome reference_data/hg38.fa \
    -ensembl_data_dir reference_data/ensembl_data_cache/ \
    -exp_counts_file reference_data/read_151_exp_counts.csv \
    -exp_gc_ratios_file reference_data/read_100_exp_gc_ratios.csv \
    -specific_regions chr1:1020000:1080000 \
    -threads 1 \
    -output_dir ./output_nosupps/
```
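The run succeeds once supplementary alignments are removed, which points at the code path that pairs a read with its supplementary alignment. The sketch below is a hypothetical illustration, not Isofox's `SupplementaryJunctionData` code: it tests the SAM supplementary flag (bit 0x800, the same bit `samtools view -F2048` filters out) and parses the `SA:Z` tag, whose entries the SAM specification defines as `rname,pos,strand,CIGAR,mapQ,NM;`. Missing or malformed tags yield an empty `Optional` instead of `null`. The class, record, and method names are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

// Hypothetical sketch; not taken from Isofox.
public class SupplementarySketch {
    // SAM FLAG bit 0x800 (2048): supplementary alignment
    static final int FLAG_SUPPLEMENTARY = 0x800;

    // One SA:Z entry: rname,pos,strand,CIGAR,mapQ,NM
    record SaEntry(String chromosome, int position, boolean reverse,
                   String cigar, int mapQ, int nm) {}

    static boolean isSupplementary(int flags) {
        return (flags & FLAG_SUPPLEMENTARY) != 0;
    }

    // Parses an SA:Z tag value. Returns Optional.empty() rather than null
    // when the tag is absent or any entry is malformed.
    static Optional<List<SaEntry>> parseSaTag(String saValue) {
        if (saValue == null || saValue.isEmpty()) {
            return Optional.empty();
        }
        List<SaEntry> entries = new ArrayList<>();
        for (String item : saValue.split(";")) {
            if (item.isEmpty()) {
                continue; // tolerate a trailing ';'
            }
            String[] f = item.split(",");
            if (f.length != 6) {
                return Optional.empty();
            }
            try {
                entries.add(new SaEntry(f[0], Integer.parseInt(f[1]), f[2].equals("-"),
                        f[3], Integer.parseInt(f[4]), Integer.parseInt(f[5])));
            } catch (NumberFormatException e) {
                return Optional.empty();
            }
        }
        return entries.isEmpty() ? Optional.empty() : Optional.of(entries);
    }

    public static void main(String[] args) {
        System.out.println(isSupplementary(2048 | 16));    // true
        System.out.println(isSupplementary(99));           // false
        System.out.println(parseSaTag(null).isPresent());  // false
        parseSaTag("chr1,1050123,+,50H101M,60,0;")
                .ifPresent(list -> System.out.println(list.get(0).cigar())); // 50H101M
    }
}
```

Returning `Optional` forces each caller to decide what to do with a read whose mate alignment cannot be reconstructed (for example, skip it and log), rather than failing deep inside junction construction.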
charlesshale commented 1 year ago

Isofox does not currently handle reads with hard clips, but I'm changing it now to do so. I'll make a release in a few days' time and will provide output for that BAM.

Thanks.
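For context on why hard clips matter: per the SAM specification, hard-clipped bases (`H`) are absent from SEQ, whereas soft-clipped bases (`S`) remain in it, and supplementary alignments are often hard-clipped. Code that assumes SEQ covers the whole read will therefore miscompute read offsets for such alignments. The following is a hypothetical sketch, not the actual Isofox change, that parses a CIGAR string, reports leading and trailing clip lengths by type, and recovers the original read length by adding the hard-clipped bases back. The class and method names are assumptions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch; not taken from Isofox.
public class CigarClipSketch {
    private static final Pattern CIGAR_OP = Pattern.compile("(\\d+)([MIDNSHP=X])");

    record Op(int length, char type) {}

    record Clips(int leftHard, int leftSoft, int rightSoft, int rightHard) {
        // Offset of the first aligned base within the original, unclipped read
        int alignedStartInRead() {
            return leftHard + leftSoft;
        }
    }

    // Splits a CIGAR string into operations; rejects '*' and malformed strings.
    static List<Op> parse(String cigar) {
        List<Op> ops = new ArrayList<>();
        Matcher m = CIGAR_OP.matcher(cigar);
        int consumed = 0;
        while (m.find()) {
            if (m.start() != consumed) {
                throw new IllegalArgumentException("bad CIGAR: " + cigar);
            }
            ops.add(new Op(Integer.parseInt(m.group(1)), m.group(2).charAt(0)));
            consumed = m.end();
        }
        if (ops.isEmpty() || consumed != cigar.length()) {
            throw new IllegalArgumentException("bad CIGAR: " + cigar);
        }
        return ops;
    }

    // Hard clips can only be outermost; soft clips sit just inside them.
    static Clips clips(String cigar) {
        List<Op> ops = parse(cigar);
        int i = 0;
        int j = ops.size() - 1;
        int leftHard = 0, leftSoft = 0, rightSoft = 0, rightHard = 0;
        if (ops.get(i).type() == 'H') leftHard = ops.get(i++).length();
        if (i <= j && ops.get(i).type() == 'S') leftSoft = ops.get(i++).length();
        if (j >= i && ops.get(j).type() == 'H') rightHard = ops.get(j--).length();
        if (j >= i && ops.get(j).type() == 'S') rightSoft = ops.get(j--).length();
        return new Clips(leftHard, leftSoft, rightSoft, rightHard);
    }

    // Read length as sequenced: bases in SEQ (M, I, S, =, X) plus hard-clipped bases.
    // N (splice gaps) and D consume the reference only, so they are excluded.
    static int originalReadLength(String cigar) {
        int len = 0;
        for (Op op : parse(cigar)) {
            if ("MIS=XH".indexOf(op.type()) >= 0) {
                len += op.length();
            }
        }
        return len;
    }

    public static void main(String[] args) {
        Clips c = clips("50H101M");
        System.out.println(c.alignedStartInRead() + " " + originalReadLength("50H101M")); // 50 151
    }
}
```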

scwatts commented 1 year ago

The Isofox v1.6_beta release resolves the issues observed above. With your fix, I'm now seeing the same error caused by different chimeric/supplementary alignments:

Caused by: java.lang.NullPointerException
    at com.hartwig.hmftools.isofox.fusion.SupplementaryJunctionData.fromReads(SupplementaryJunctionData.java:72)
    at com.hartwig.hmftools.isofox.fusion.SupplementaryJunctionData.cacheSupplementaryJunctionCandidate(SupplementaryJunctionData.java:115)
    at com.hartwig.hmftools.isofox.fusion.ChimericReadTracker.cacheSupplementaryJunctionCandidate(ChimericReadTracker.java:600)
    at com.hartwig.hmftools.isofox.fusion.ChimericReadTracker.postProcessChimericReads(ChimericReadTracker.java:316)
    at com.hartwig.hmftools.isofox.BamFragmentAllocator.produceBamCounts(BamFragmentAllocator.java:235)
    at com.hartwig.hmftools.isofox.ChromosomeTaskExecutor.analyseBamReads(ChromosomeTaskExecutor.java:318)
    at com.hartwig.hmftools.isofox.ChromosomeTaskExecutor.assignTranscriptCounts(ChromosomeTaskExecutor.java:228)
    at com.hartwig.hmftools.isofox.ChromosomeTaskExecutor.call(ChromosomeTaskExecutor.java:149)
    at com.hartwig.hmftools.isofox.ChromosomeTaskExecutor.call(ChromosomeTaskExecutor.java:55)
    at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
    at java.base/java.lang.Thread.run(Thread.java:834)

Example commands and test data are attached below.

Reproducible example
Attachments:

* [L2100706.sliced.bam.gz](https://github.com/hartwigmedical/hmftools/files/10026793/L2100706.sliced.bam.gz)
* [L2100706.sliced.nochimeric.bam.gz](https://github.com/hartwigmedical/hmftools/files/10026790/L2100706.sliced.nochimeric.bam.gz)
* [L2100706.sliced.chimeric.bam.gz](https://github.com/hartwigmedical/hmftools/files/10026792/L2100706.sliced.chimeric.bam.gz)

```bash
url_base='https://github.com/hartwigmedical/hmftools/files'
url_files='
10026790/L2100706.sliced.nochimeric.bam.gz
10026792/L2100706.sliced.chimeric.bam.gz
10026793/L2100706.sliced.bam.gz
'

mkdir -p data/
for url_file in ${url_files}; do
  file=$(sed 's/\.gz$//' <<< data/${url_file##*/});
  curl -Ls "${url_base}/${url_file}" | gzip -cd > ${file};
  samtools index ${file};
done

regions="
chr1:103000000-104000000
chr10:63000000-64000000
chr16:89500000-89700000
chr17:76500000-76600000
"

for file in data/*bam; do
  sample=$(sed -e 's/^[^.]*\.//' -e 's/\.bam//' <<< ${file##*/});
  output_dir=output/${sample};
  mkdir -p ${output_dir};
  java \
    -jar ./software/isofox_v1.6_beta.jar \
      -sample PTC_NebRNA210629 \
      -functions 'TRANSCRIPT_COUNTS;ALT_SPLICE_JUNCTIONS;FUSIONS' \
      -bam_file ${file} \
      -ref_genome reference_data/hg38.fa \
      -ref_genome_version 38 \
      -ensembl_data_dir reference_data/ensembl_data_cache/ \
      -exp_counts_file reference_data/read_151_exp_counts.csv \
      -exp_gc_ratios_file reference_data/read_100_exp_gc_ratios.csv \
      -threads 8 \
      -specific_regions "$(sed -e '/^$/d' <<< "${regions}" | paste -sd';' -)" \
      -output_dir ${output_dir} 2>&1 | tee ${output_dir}/log.txt;
done
```
scwatts commented 1 year ago

As discussed over email, DRAGEN alignments are unlikely to be suitable inputs for Isofox, and STAR alignments should be used instead. Closing, as the error is no longer relevant.
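Since the resolution depends on which aligner produced the BAM, a quick provenance check before running Isofox can save a failed run. Per the SAM specification, each `@PG` header line carries an `ID` field and optionally a `PN` (program name) field; STAR typically records itself as `ID:STAR PN:STAR`. The hypothetical sketch below takes header text (for example, captured from `samtools view -H`) and lists the program names it finds. The class name and the idea of matching on `STAR` are assumptions, since `@PG` contents vary by aligner and version.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: list aligner/program names recorded in a SAM header.
public class AlignerCheckSketch {
    // Returns the PN value (or the ID value if PN is absent) of every @PG line.
    static List<String> programNames(String headerText) {
        List<String> names = new ArrayList<>();
        for (String line : headerText.split("\\R")) {
            if (!line.startsWith("@PG\t")) {
                continue;
            }
            String id = null;
            String pn = null;
            for (String field : line.split("\t")) {
                if (field.startsWith("ID:")) {
                    id = field.substring(3);
                } else if (field.startsWith("PN:")) {
                    pn = field.substring(3);
                }
            }
            String name = pn != null ? pn : id;
            if (name != null) {
                names.add(name);
            }
        }
        return names;
    }

    public static void main(String[] args) {
        String header = "@HD\tVN:1.4\tSO:coordinate\n"
                + "@PG\tID:STAR\tPN:STAR\tVN:2.7.10a\n";
        List<String> names = programNames(header);
        System.out.println(names);                  // [STAR]
        System.out.println(names.contains("STAR")); // true
    }
}
```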