luntergroup / octopus

Bayesian haplotype-based mutation calling
MIT License
299 stars 37 forks source link

False Negative de novo - trio ? #207

Closed husamia closed 2 years ago

husamia commented 2 years ago

Describe the bug I detected a de novo variant in trio with a different caller. I wanted to see if there are more or if Octopus can pick it up. So I ran Octopus on the Trio data. It's 30X WGS illumina. The proband, mom and dad. Octopus detected 0 variants. It looks to me that this denovo is real but maybe Octopus missed it? see IGV screenshot.

Version

$ octopus --version
octopus version 0.7.4 (develop 2ec5790e)
Target: x86_64 Linux 5.4.72-microsoft-standard-WSL2
SIMD extension: AVX2
Compiler: GNU 11.2.0
Boost: 1_76

Command line to run octopus:

$ octopus -R /mnt/g/Novogene-CDAR-WGS/raw_data/Homo_sapiens_assembly38.fa -I XBRG.cram XBRG-mom.cram XBRG-dad.cram -M XBRG-mom -F XBRG-dad --threads 1 -o XBRG_dnovo.octopus.vcf.gz --denovos-only

Additional context output

[2021-09-06 10:38:02] <INFO> ------------------------------------------------------------------------
[2021-09-06 10:38:02] <INFO> octopus v0.7.4 (develop 2ec5790e)
[2021-09-06 10:38:02] <INFO> Copyright (c) 2015-2021 University of Oxford
[2021-09-06 10:38:02] <INFO> ------------------------------------------------------------------------
[2021-09-07 07:15:50] <WARN> Could not estimate read size from data, resorting to default
[2021-09-07 07:15:50] <INFO> Done initialising calling components in 20h 37m
[2021-09-07 07:15:50] <INFO> Detected 3 samples: "XBRG" "XBRG-dad" "XBRG-mom"
[2021-09-07 07:15:50] <INFO> Invoked calling model: trio
[2021-09-07 07:15:50] <INFO> Processing 3,217,346,917bp with a single thread (40 cores detected)
[2021-09-07 07:15:50] <INFO> Writing filtered calls to "/mnt/g/XBRG/XBRG_dnovo.octopus.vcf.gz"
[2021-09-07 07:15:50] <INFO> -------------------------------------------------------------------------------------
[2021-09-07 07:15:50] <INFO>            current             |                   |     time      |     estimated
[2021-09-07 07:15:50] <INFO>            position            |     completed     |     taken     |     ttc
[2021-09-07 07:15:50] <INFO> -------------------------------------------------------------------------------------
[2021-09-07 13:53:53] <INFO>                            -             100%           6h 38m                 -
[2021-09-07 13:53:53] <INFO> Starting Call Set Refinement (CSR) filtering
[2021-09-07 13:53:53] <INFO> -------------------------------------------------------------------------------------
[2021-09-07 13:53:53] <INFO>            current             |                   |     time      |     estimated
[2021-09-07 13:53:53] <INFO>            position            |     completed     |     taken     |     ttc
[2021-09-07 13:53:53] <INFO> -------------------------------------------------------------------------------------
[2021-09-07 13:53:53] <INFO>                            -             100%             25ms                 -
[2021-09-07 13:53:54] <INFO> Finished calling 3,217,346,917bp, total runtime 6h 38m
[2021-09-07 13:53:54] <INFO> Calls have been written to "/mnt/g/XBRG/XBRG_dnovo.octopus.vcf.gz"
[2021-09-07 13:53:54] <INFO> Removed 2 temporary files
[2021-09-07 13:53:54] <INFO> ------------------------------------------------------------------------

image

The pileup data looks like this variant is de novo

image

dancooke commented 2 years ago

Looks like there's at least one read supporting the variant allele in the mom, so maybe the variant was called germline. If you remove the --denovos-only option then you check. Actually, I'd recommend against using the --denovos-only option for this reason - I'm considering removing it in future versions.

husamia commented 2 years ago

well only one read in mom. The VAF in son is 54% which is pretty high.

I am interested in germline and somatic de novo.

Son image

Mom image

husamia commented 2 years ago

I will run without the --denovo-only option and find out.

husamia commented 2 years ago

Maybe this is an exceptional case. I am really interested in de novo only variants. Don't remove it. I will be testing it with other types of data. Maybe add some additional options?

dancooke commented 2 years ago

Note that you still get de novo variants without the --denovos-only - you get labelled germline and de novo variants (de novo variants are tagged with DENOVO in the INFO column).

dancooke commented 2 years ago

There's another INFO annotation PP that is the probability of the called variant classification (germline or de novo), so you will probably find the your example variant is called germline but has a low PP.

husamia commented 2 years ago

Can you provide the command to post filter the result VCF for the de novo only candidates? I will report the filtered result

dancooke commented 2 years ago

bcftools view -i 'DENOVO=1' trio.vcf.gz > denovo.vcf

husamia commented 2 years ago

Looks like there's at least one read supporting the variant allele in the mom, so maybe the variant was called germline. If you remove the --denovos-only option then you check. Actually, I'd recommend against using the --denovos-only option for this reason - I'm considering removing it in future versions.

I rerun it without the --denovos-only option. The result is the same. 0 variants. Why is there no variants detected this time?

here is the log and the header

[2021-09-13 10:10:12] <INFO> ------------------------------------------------------------------------ [2021-09-13 10:10:12] <INFO> octopus v0.7.4 (develop 2ec5790e) [2021-09-13 10:10:12] <INFO> Copyright (c) 2015-2021 University of Oxford [2021-09-13 10:10:12] <INFO> ------------------------------------------------------------------------ [2021-09-14 09:53:55] <WARN> Could not estimate read size from data, resorting to default [2021-09-14 09:53:55] <INFO> Done initialising calling components in 23h 43m [2021-09-14 09:53:55] <INFO> Detected 3 samples: "XBRG" "XBRG-dad" "XBRG-mom" [2021-09-14 09:53:55] <INFO> Invoked calling model: trio [2021-09-14 09:53:55] <INFO> Processing 3,217,346,917bp with a single thread (40 cores detected) [2021-09-14 09:53:55] <INFO> Writing filtered calls to "XBRG_dnovo-2.octopus.vcf.gz" [2021-09-14 09:53:55] <INFO> ------------------------------------------------------------------------------------- [2021-09-14 09:53:55] <INFO> current | | time | estimated [2021-09-14 09:53:55] <INFO> position | completed | taken | ttc [2021-09-14 09:53:55] <INFO> ------------------------------------------------------------------------------------- [2021-09-14 16:21:58] <INFO> - 100% 6h 28m - [2021-09-14 16:21:58] <INFO> Starting Call Set Refinement (CSR) filtering [2021-09-14 16:21:58] <INFO> ------------------------------------------------------------------------------------- [2021-09-14 16:21:58] <INFO> current | | time | estimated [2021-09-14 16:21:58] <INFO> position | completed | taken | ttc [2021-09-14 16:21:58] <INFO> ------------------------------------------------------------------------------------- [2021-09-14 16:21:58] <INFO> - 100% 21ms - [2021-09-14 16:21:58] <INFO> Finished calling 3,217,346,917bp, total runtime 6h 28m [2021-09-14 16:21:58] <INFO> Calls have been written to "XBRG_dnovo-2.octopus.vcf.gz" [2021-09-14 16:21:58] <INFO> Removed 2 temporary files [2021-09-14 16:21:58] <INFO> ------------------------------------------------------------------------

##octopus=<version=0.7.4_develop_2ec5790e,command="octopus -R /mnt/g/Homo_sapiens_assembly38.fa -I XBRG.cram XBRG-mom.cram XBRG-dad.cram -M XBRG-mom -F XBRG-dad --threads 1 -o XBRG_dnovo-2.octopus.vcf
.gz",options="--aggregate-annotations no --allow-cycles no --allow-marked-duplicates no --allow-octopus-duplicates no --allow-qc-fails no --allow-secondary-alignments no --allow-strand-biased-candidates no --allow-supplementary
-alignments no --assemble-all no --assembler-mask-base-quality 10 --backtrack-level NONE --bad-region-tolerance NORMAL --bamout-type MINI --caller population --clone-concentration 1 --clone-prior 0.01 --consider-unmapped-reads
no --contig-output-order REFERENCE_INDEX --contig-ploidies[0] Y=1 --contig-ploidies[1] chrY=1 --contig-ploidies[2] MT=1 --contig-ploidies[3] chrM=1 --denovo-filter-expression QUAL < 50 | PP < 40 | GQ < 20 | MQ < 30 | AD < 1 | A
F < 0.1 | AFB > 0.2 | SB > 0.95 | BQ < 20 | DP < 10 | ADP < 1 | DC > 1 | MF > 0.2 | FRF > 0.5 | MP < 30 | MQ0 > 2 --denovo-indel-prior 1e-09 --denovo-snv-prior 1.3e-08 --denovos-only no --disable-adapter-masking no --disable-as
sembly-candidate-generator no --disable-call-filtering no --disable-denovo-variant-discovery no --disable-downsampling no --disable-inactive-flank-scoring no --disable-overlap-masking no --disable-pileup-candidate-generator no
--disable-read-preprocessing no --disable-repeat-candidate-generator no --dont-model-mapping-quality no --dont-protect-reference-haplotype no --downsample-above 1000 --downsample-target 500 --dropout-concentration 5 --duplicate
-read-detection-policy RELAXED --extension-level MODERATE --fallback-kmer-gap 10 --fast no --filter-expression QUAL < 10 | MQ < 10 | MP < 10 | AD < 1 | AF < 0.01 | AFB > 0.25 | SB > 0.98 | BQ < 15 | DP < 1 | ADP < 1 --force-pil
eup-candidates no --good-base-quality 20 --haplotype-holdout-threshold 2500 --haplotype-overflow 200000 --ignore-unmapped-contigs no --indel-heterozygosity 0.0001 --keep-temporary-files no --keep-unfiltered-calls no --kmer-size
s[0] 10 --kmer-sizes[1] 15 --kmer-sizes[2] 20 --lagging-level MODERATE --mask-3prime-shifted-soft-clipped-heads no --mask-inverted-soft-clipping no --mask-soft-clipped-bases no --mask-soft-clipped-boundary-bases 2 --maternal-sa
mple XBRG-mom --max-assembly-region-overlap 200 --max-assembly-region-size 600 --max-bubbles 30 --max-clones 4 --max-copy-gain 0 --max-copy-loss 0 --max-decoy-supplementary-alignment-mapping-quality 5 --max-fallback-kmers 10 --
max-haplotypes 200 --max-holdout-depth 20 --max-indel-errors 16 --max-open-read-files 250 --max-read-length 10000 --max-reference-cache-memory 500MB --max-somatic-haplotypes 2 --max-unlocalized-supplementary-alignment-mapping-q
uality 5 --max-unplaced-supplementary-alignment-mapping-quality 5 --max-variant-size 2000 --max-vb-seeds 12 --min-bubble-score 2 --min-candidate-credible-vaf-probability 0.75 --min-clone-frequency 0.01 --min-credible-somatic-fr
equency 0.005 --min-denovo-posterior 3 --min-expected-somatic-frequency 0.01 --min-forest-quality 3 --min-good-bases 20 --min-kmer-prune 2 --min-mapping-quality 5 --min-phase-score 5 --min-pileup-base-quality 20 --min-protected
-haplotype-posterior 1e-10 --min-refcall-posterior 0 --min-somatic-posterior 0.5 --min-variant-posterior 0.1 --model-posterior SPECIAL --no-adapter-contaminated-reads no --no-reads-with-distant-segments no --no-reads-with-unmap
ped-segments no --normal-contamination-risk LOW --one-based-indexing no --organism-ploidy 2 --output XBRG_dnovo-2.octopus.vcf.gz --paternal-sample XBRG-dad --phasing-policy AUTO --phylogeny-concentration 20 --read-linkage PAIRE
D --reads[0] XBRG.cram --reads[1] XBRG-mom.cram --reads[2] XBRG-dad.cram --refcall-block-merge-quality 10 --refcall-filter-expression QUAL < 2 | GQ < 20 | MQ < 10 | DP < 10 | MF > 0.2 --reference Homo_sapiens_assembly38.fa --re
solve-symlinks no --sequence-error-model PCR-free.HiSeq-2500 --sites-only no --snp-heterozygosity 0.001 --snp-heterozygosity-stdev 0.01 --somatic-cnv-prior 1e-05 --somatic-credible-mass 0.9 --somatic-filter-expression QUAL < 2
| GQ < 20 | MQ < 30 | SMQ < 40 | SB > 0.9 | SD > 0.9 | BQ < 20 | DP < 3 | ADP < 1 | MF > 0.2 | NC > 1 | FRF > 0.5 | AD < 1 | AF < 0.0001 --somatic-indel-prior 1e-06 --somatic-snv-prior 0.0001 --somatics-only no --split-long-rea
ds no --target-read-buffer-memory 6GB --temp-directory-prefix octopus-temp --threads 1 --tumour-germline-concentration 1.5 --use-filtered-source-candidates no --use-germline-forest-for-somatic-normals no --use-independent-genot
ype-priors no --use-preprocessed-reads-for-filtering no --use-same-read-profile-for-all-samples no --use-uniform-genotype-priors no --use-wide-hmm-scores no --variant-discovery-mode ILLUMINA --very-fast no",date="2021-09-14 09:
53:55.188451700 EDT">
#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO    FORMAT  XBRG    XBRG-dad        XBRG-mom
dancooke commented 2 years ago

Can you run:

$ octopus -R /mnt/g/Homo_sapiens_assembly38.fa -I XBRG.cram XBRG-mom.cram XBRG-dad.cram -M XBRG-mom -F XBRG-dad -T chr14:78,714,000-78,716,000 -o XBRG_dnovo-2.octopus.vcf.gz --debug

and post the resulting octopus_debug.log?

husamia commented 2 years ago

here is the full octopus_debug.log

this is illumina 30X WGS paired. There is warning message about the read size. Strange 0 in the reads

[2021-09-15 10:00:08] <INFO> ------------------------------------------------------------------------
[2021-09-15 10:00:08] <INFO> octopus v0.7.4 (develop 2ec5790e)
[2021-09-15 10:00:08] <INFO> Copyright (c) 2015-2021 University of Oxford
[2021-09-15 10:00:08] <INFO> ------------------------------------------------------------------------
[2021-09-15 10:00:08] <DEBG> Program options: 
[2021-09-15 10:00:08] <DEBG> > aggregate-annotations=no
> allow-cycles=no
> allow-marked-duplicates=no
> allow-octopus-duplicates=no
> allow-qc-fails=no
> allow-secondary-alignments=no
> allow-strand-biased-candidates=no
> allow-supplementary-alignments=no
> assemble-all=no
> assembler-mask-base-quality=10
> backtrack-level=NONE
> bad-region-tolerance=NORMAL
> bamout-type=MINI
> caller=population
> clone-concentration=1
> clone-prior=0.01
> consider-unmapped-reads=no
> contig-output-order=REFERENCE_INDEX
> contig-ploidies[0]=Y=1
> contig-ploidies[1]=chrY=1
> contig-ploidies[2]=MT=1
> contig-ploidies[3]=chrM=1
~ debug=octopus_debug.log
> denovo-filter-expression=QUAL < 50 | PP < 40 | GQ < 20 | MQ < 30 | AD < 1 | AF < 0.1 | AFB > 0.2 | SB > 0.95 | BQ < 20 | DP < 10 | ADP < 1 | DC > 1 | MF > 0.2 | FRF > 0.5 | MP < 30 | MQ0 > 2
> denovo-indel-prior=1e-09
> denovo-snv-prior=1.3e-08
> denovos-only=no
> disable-adapter-masking=no
> disable-assembly-candidate-generator=no
> disable-call-filtering=no
> disable-denovo-variant-discovery=no
> disable-downsampling=no
> disable-inactive-flank-scoring=no
> disable-overlap-masking=no
> disable-pileup-candidate-generator=no
> disable-read-preprocessing=no
> disable-repeat-candidate-generator=no
> dont-model-mapping-quality=no
> dont-protect-reference-haplotype=no
> downsample-above=1000
> downsample-target=500
> dropout-concentration=5
> duplicate-read-detection-policy=RELAXED
> extension-level=MODERATE
> fallback-kmer-gap=10
> fast=no
> filter-expression=QUAL < 10 | MQ < 10 | MP < 10 | AD < 1 | AF < 0.01 | AFB > 0.25 | SB > 0.98 | BQ < 15 | DP < 1 | ADP < 1
> force-pileup-candidates=no
> good-base-quality=20
> haplotype-holdout-threshold=2500
> haplotype-overflow=200000
> ignore-unmapped-contigs=no
> indel-heterozygosity=0.0001
> keep-temporary-files=no
> keep-unfiltered-calls=no
> kmer-sizes[0]=10
> kmer-sizes[1]=15
> kmer-sizes[2]=20
> lagging-level=MODERATE
> mask-3prime-shifted-soft-clipped-heads=no
> mask-inverted-soft-clipping=no
> mask-soft-clipped-bases=no
> mask-soft-clipped-boundary-bases=2
~ maternal-sample=XBRG-mom
> max-assembly-region-overlap=200
> max-assembly-region-size=600
> max-bubbles=30
> max-clones=4
> max-copy-gain=0
> max-copy-loss=0
> max-decoy-supplementary-alignment-mapping-quality=5
> max-fallback-kmers=10
> max-haplotypes=200
> max-holdout-depth=20
> max-indel-errors=16
> max-open-read-files=250
> max-read-length=10000
> max-reference-cache-memory=500MB
> max-somatic-haplotypes=2
> max-unlocalized-supplementary-alignment-mapping-quality=5
> max-unplaced-supplementary-alignment-mapping-quality=5
> max-variant-size=2000
> max-vb-seeds=12
> min-bubble-score=2
> min-candidate-credible-vaf-probability=0.75
> min-clone-frequency=0.01
> min-credible-somatic-frequency=0.005
> min-denovo-posterior=3
> min-expected-somatic-frequency=0.01
> min-forest-quality=3
> min-good-bases=20
> min-kmer-prune=2
> min-mapping-quality=5
> min-phase-score=5
> min-pileup-base-quality=20
> min-protected-haplotype-posterior=1e-10
> min-refcall-posterior=0
> min-somatic-posterior=0.5
> min-variant-posterior=0.1
> model-posterior=SPECIAL
> no-adapter-contaminated-reads=no
> no-reads-with-distant-segments=no
> no-reads-with-unmapped-segments=no
> normal-contamination-risk=LOW
> one-based-indexing=no
> organism-ploidy=2
~ output=XBRG_dnovo-3.octopus.vcf.gz
~ paternal-sample=XBRG-dad
> phasing-policy=AUTO
> phylogeny-concentration=20
> read-linkage=PAIRED
~ reads[0]=XBRG.cram
~ reads[1]=XBRG-mom.cram
~ reads[2]=XBRG-dad.cram
> refcall-block-merge-quality=10
> refcall-filter-expression=QUAL < 2 | GQ < 20 | MQ < 10 | DP < 10 | MF > 0.2
~ reference=Homo_sapiens_assembly38.fa
~ regions[0]=chr14:78,714,000-78,716,000
> resolve-symlinks=no
> sequence-error-model=PCR-free.HiSeq-2500
> sites-only=no
> snp-heterozygosity=0.001
> snp-heterozygosity-stdev=0.01
> somatic-cnv-prior=1e-05
> somatic-credible-mass=0.9
> somatic-filter-expression=QUAL < 2 | GQ < 20 | MQ < 30 | SMQ < 40 | SB > 0.9 | SD > 0.9 | BQ < 20 | DP < 3 | ADP < 1 | MF > 0.2 | NC > 1 | FRF > 0.5 | AD < 1 | AF < 0.0001
> somatic-indel-prior=1e-06
> somatic-snv-prior=0.0001
> somatics-only=no
> split-long-reads=no
> target-read-buffer-memory=6GB
> temp-directory-prefix=octopus-temp
~ threads=2
> tumour-germline-concentration=1.5
> use-filtered-source-candidates=no
> use-germline-forest-for-somatic-normals=no
> use-independent-genotype-priors=no
> use-preprocessed-reads-for-filtering=no
> use-same-read-profile-for-all-samples=no
> use-uniform-genotype-priors=no
> use-wide-hmm-scores=no
> variant-discovery-mode=ILLUMINA
> very-fast=no
[2021-09-15 12:46:54] <WARN> Could not estimate read size from data, resorting to default
[2021-09-15 12:46:54] <DEBG> Estimated read memory footprint is 714B
[2021-09-15 12:46:54] <INFO> Done initialising calling components in 2h 46m
[2021-09-15 12:46:54] <INFO> Detected 3 samples: "XBRG" "XBRG-dad" "XBRG-mom"
[2021-09-15 12:46:54] <INFO> Invoked calling model: trio
[2021-09-15 12:46:54] <INFO> Processing 2,000bp with 2 threads (40 cores detected)
[2021-09-15 12:46:54] <INFO> Writing filtered calls to "/mnt/g/XBRG/XBRG_dnovo-3.octopus.vcf.gz"
[2021-09-15 12:46:54] <DEBG> Command line: octopus -R /mnt/g/Novogene-CDAR-WGS/raw_data/Homo_sapiens_assembly38.fa -I XBRG.cram XBRG-mom.cram XBRG-dad.cram -M XBRG-mom -F XBRG-dad -T chr14:78,714,000-78,716,000 --threads 2 -o XBRG_dnovo-3.octopus.vcf.gz --debug
[2021-09-15 12:46:54] <DEBG> All input regions:
    Contig chr14
    chr14:78714000-78716000 
[2021-09-15 12:46:54] <WARN> Running in parallel mode can make debug log difficult to interpret
[2021-09-15 12:46:54] <DEBG> Making tasks for 1 contigs
[2021-09-15 12:46:54] <DEBG> Making tasks for contig chr14
[2021-09-15 12:46:55] <DEBG> Finished making tasks for contig chr14
[2021-09-15 12:46:55] <DEBG> Finished making tasks
[2021-09-15 12:46:55] <INFO> ------------------------------------------------------------------------
[2021-09-15 12:46:55] <INFO>      current      |                   |     time      |     estimated   
[2021-09-15 12:46:55] <INFO>      position     |     completed     |     taken     |     ttc         
[2021-09-15 12:46:55] <INFO> ------------------------------------------------------------------------
[2021-09-15 12:46:55] <DEBG> Finished calling contig chr14
[2021-09-15 12:46:55] <DEBG> Spawning task chr14:78714000-78716000
[2021-09-15 12:46:55] <DEBG> There are 1 idle futures
[2021-09-15 12:46:55] <DEBG> Finished making new tasks. Waiting for task writer to complete existing jobs
[2021-09-15 12:46:55] <DEBG> Waiting for 2 running tasks to finish
[2021-09-15 12:46:55] <DEBG> Task writer finished
[2021-09-15 12:49:06] <DEBG> Fetched 0 unfiltered reads from chr14:78713900-78716100
[2021-09-15 12:49:06] <DEBG> In sample XBRG-dad
[2021-09-15 12:49:06] <DEBG> 0 reads failed the NoUnlocalizedSupplementaryAlignments filter
[2021-09-15 12:49:06] <DEBG> 0 reads failed the NoUnplacedSupplementaryAlignments filter
[2021-09-15 12:49:06] <DEBG> 0 reads failed the NoDecoySupplementaryAlignments filter
[2021-09-15 12:49:06] <DEBG> 0 reads failed the IsNotMarkedQcFail filter
[2021-09-15 12:49:06] <DEBG> 0 reads failed the IsNotMarkedDuplicate filter
[2021-09-15 12:49:06] <DEBG> 0 reads failed the IsNotSupplementaryAlignment filter
[2021-09-15 12:49:06] <DEBG> 0 reads failed the IsLong filter
[2021-09-15 12:49:06] <DEBG> 0 reads failed the IsNotOctopusDuplicate filter
[2021-09-15 12:49:06] <DEBG> 0 reads failed the HasSufficientGoodQualityBases filter
[2021-09-15 12:49:06] <DEBG> 0 reads failed the IsGoodMappingQuality filter
[2021-09-15 12:49:06] <DEBG> 0 reads failed the IsMapped filter
[2021-09-15 12:49:06] <DEBG> 0 reads failed the HasWellFormedCigar filter
[2021-09-15 12:49:06] <DEBG> 0 reads failed the IsNotSecondaryAlignment filter
[2021-09-15 12:49:06] <DEBG> 0 reads failed the IsShort filter
[2021-09-15 12:49:06] <DEBG> 0 reads failed the HasValidBaseQualities filter
[2021-09-15 12:49:06] <DEBG> In sample XBRG-mom
[2021-09-15 12:49:06] <DEBG> 0 reads failed the NoUnlocalizedSupplementaryAlignments filter
[2021-09-15 12:49:06] <DEBG> 0 reads failed the NoUnplacedSupplementaryAlignments filter
[2021-09-15 12:49:06] <DEBG> 0 reads failed the NoDecoySupplementaryAlignments filter
[2021-09-15 12:49:06] <DEBG> 0 reads failed the IsNotMarkedQcFail filter
[2021-09-15 12:49:06] <DEBG> 0 reads failed the IsNotMarkedDuplicate filter
[2021-09-15 12:49:06] <DEBG> 0 reads failed the IsNotSupplementaryAlignment filter
[2021-09-15 12:49:06] <DEBG> 0 reads failed the IsLong filter
[2021-09-15 12:49:06] <DEBG> 0 reads failed the IsNotOctopusDuplicate filter
[2021-09-15 12:49:06] <DEBG> 0 reads failed the HasSufficientGoodQualityBases filter
[2021-09-15 12:49:06] <DEBG> 0 reads failed the IsGoodMappingQuality filter
[2021-09-15 12:49:06] <DEBG> 0 reads failed the IsMapped filter
[2021-09-15 12:49:06] <DEBG> 0 reads failed the HasWellFormedCigar filter
[2021-09-15 12:49:06] <DEBG> 0 reads failed the IsNotSecondaryAlignment filter
[2021-09-15 12:49:06] <DEBG> 0 reads failed the IsShort filter
[2021-09-15 12:49:06] <DEBG> 0 reads failed the HasValidBaseQualities filter
[2021-09-15 12:49:06] <DEBG> In sample XBRG
[2021-09-15 12:49:06] <DEBG> 0 reads failed the NoUnlocalizedSupplementaryAlignments filter
[2021-09-15 12:49:06] <DEBG> 0 reads failed the NoUnplacedSupplementaryAlignments filter
[2021-09-15 12:49:06] <DEBG> 0 reads failed the NoDecoySupplementaryAlignments filter
[2021-09-15 12:49:06] <DEBG> 0 reads failed the IsNotMarkedQcFail filter
[2021-09-15 12:49:06] <DEBG> 0 reads failed the IsNotMarkedDuplicate filter
[2021-09-15 12:49:06] <DEBG> 0 reads failed the IsNotSupplementaryAlignment filter
[2021-09-15 12:49:06] <DEBG> 0 reads failed the IsLong filter
[2021-09-15 12:49:06] <DEBG> 0 reads failed the IsNotOctopusDuplicate filter
[2021-09-15 12:49:06] <DEBG> 0 reads failed the HasSufficientGoodQualityBases filter
[2021-09-15 12:49:06] <DEBG> 0 reads failed the IsGoodMappingQuality filter
[2021-09-15 12:49:06] <DEBG> 0 reads failed the IsMapped filter
[2021-09-15 12:49:06] <DEBG> 0 reads failed the HasWellFormedCigar filter
[2021-09-15 12:49:06] <DEBG> 0 reads failed the IsNotSecondaryAlignment filter
[2021-09-15 12:49:06] <DEBG> 0 reads failed the IsShort filter
[2021-09-15 12:49:06] <DEBG> 0 reads failed the HasValidBaseQualities filter
[2021-09-15 12:49:06] <DEBG> There are 0 reads in chr14:78713900-78716100 after filtering
[2021-09-15 12:49:06] <DEBG> Downsampling removed 0 reads from chr14:78713900-78716100
[2021-09-15 12:49:06] <DEBG> Stopping early as no reads found in call region chr14:78714000-78716000
[2021-09-15 12:49:06] <DEBG> Writing completed task chr14:78714000-78716000 that finished in 2m 11s
[2021-09-15 12:49:06] <INFO>               -             100%           2m 11s                 -
[2021-09-15 12:49:06] <DEBG> Merging 1 temporary VCF files
[2021-09-15 12:49:06] <INFO> Starting Call Set Refinement (CSR) filtering
[2021-09-15 12:49:06] <INFO> ------------------------------------------------------------------------
[2021-09-15 12:49:06] <INFO>      current      |                   |     time      |     estimated   
[2021-09-15 12:49:06] <INFO>      position     |     completed     |     taken     |     ttc         
[2021-09-15 12:49:06] <INFO> ------------------------------------------------------------------------
[2021-09-15 12:49:06] <INFO>               -             100%              8ms                 -
[2021-09-15 12:49:06] <INFO> Finished calling 2,000bp, total runtime 2m 12s
[2021-09-15 12:49:06] <INFO> Calls have been written to "/mnt/g/XBRG/XBRG_dnovo-3.octopus.vcf.gz"
[2021-09-15 12:49:07] <INFO> Removed 4 temporary files
[2021-09-15 12:49:07] <INFO> ------------------------------------------------------------------------
dancooke commented 2 years ago

Yeah that's odd, any chance you can subset one of the CRAMs in that region with samtools and send it to me?

husamia commented 2 years ago

XBRG.zip

here is the proband subset cram. I accidentally closed it.

husamia commented 2 years ago

I don't know the aligner that was used. There was no header info available.

dancooke commented 2 years ago

Don't see any problems on my end:

$ octopus -R hs38DH.fa -I XBRG_subset.cram -T chr14:78,714,000-78,716,000 -o test.vcf
$ bcftools view -H test.vcf
chr14   78714703    .   A   T   889.15  .   AC=2;AN=2;DP=34;MQ=60;NS=1  GT:GQ:DP:MQ:PS:PQ   1|1:216:34:60:78714703:100
chr14   78714883    .   C   T   431.88  .   AC=1;AN=2;DP=48;MQ=60;NS=1  GT:GQ:DP:MQ:PS:PQ   0|1:431:48:60:78714703:100
chr14   78715097    .   G   A   404.49  .   AC=1;AN=2;DP=35;MQ=60;NS=1  GT:GQ:DP:MQ:PS:PQ   1|0:355:35:60:78714703:100
chr14   78715477    .   A   C   940.41  .   AC=2;AN=2;DP=34;MQ=59;NS=1  GT:GQ:DP:MQ:PS:PQ   1|1:106:34:59:78714703:100

Are you sure you're using the correct reference build (i.e. the one used for alignment)? Try running my command with your reference and see if you get any variants.

husamia commented 2 years ago

When I run it with my reference I get the same results that you get. Here is the debug log.

octopus_debug.log

[2021-09-16 15:21:33] <INFO> ------------------------------------------------------------------------
[2021-09-16 15:21:33] <INFO> octopus v0.7.4 (develop 2ec5790e)
[2021-09-16 15:21:33] <INFO> Copyright (c) 2015-2021 University of Oxford
[2021-09-16 15:21:33] <INFO> ------------------------------------------------------------------------
[2021-09-16 15:21:34] <WARN> Uncompressed VCF output requested - this may result in large output files
[2021-09-16 15:21:36] <INFO> Done initialising calling components in 2s
[2021-09-16 15:21:36] <INFO> Detected 1 sample: "XBRG"
[2021-09-16 15:21:36] <INFO> Invoked calling model: individual
[2021-09-16 15:21:36] <INFO> Processing 2,000bp with a single thread (40 cores detected)
[2021-09-16 15:21:36] <INFO> Writing filtered calls to "/mnt/g/XBRG/test.vcf"
[2021-09-16 15:21:36] <INFO> ------------------------------------------------------------------------
[2021-09-16 15:21:36] <INFO>      current      |                   |     time      |     estimated
[2021-09-16 15:21:36] <INFO>      position     |     completed     |     taken     |     ttc
[2021-09-16 15:21:36] <INFO> ------------------------------------------------------------------------
[2021-09-16 15:21:36] <INFO>  chr14:78715477            73.8%             65ms                 -
[2021-09-16 15:21:36] <INFO>               -             100%             66ms                 -
[2021-09-16 15:21:36] <INFO> Starting Call Set Refinement (CSR) filtering
[2021-09-16 15:21:36] <INFO> ------------------------------------------------------------------------
[2021-09-16 15:21:36] <INFO>      current      |                   |     time      |     estimated
[2021-09-16 15:21:36] <INFO>      position     |     completed     |     taken     |     ttc
[2021-09-16 15:21:36] <INFO> ------------------------------------------------------------------------
[2021-09-16 15:21:36] <INFO>  chr14:78714703            35.1%             41ms                 -
[2021-09-16 15:21:36] <INFO>  chr14:78714883            44.1%             42ms                 -
[2021-09-16 15:21:36] <INFO>  chr14:78715097            54.9%             42ms                 -
[2021-09-16 15:21:36] <INFO>  chr14:78715477            73.8%             42ms                 -
[2021-09-16 15:21:36] <INFO>               -             100%             42ms                 -
[2021-09-16 15:21:36] <INFO> Finished calling 2,000bp, total runtime 353ms
[2021-09-16 15:21:36] <INFO> Calls have been written to "/mnt/g/XBRG/test.vcf"
[2021-09-16 15:21:36] <INFO> Removed 2 temporary files
[2021-09-16 15:21:37] <INFO> ------------------------------------------------------------------------

could it be a bug with the trio?

dancooke commented 2 years ago

What about if you switch out the subsetted cram for the full one in this command?

husamia commented 2 years ago

It's erroring with the full one. My reference is the one that was used to align the reads. Does it have a problem with the extra contigs?


octopus -R Homo_sapiens_assembly38.fa -I XBRG.cram -T chr14:78,714,000-78,716,000 -o test.vcf --debug

[2021-09-16 18:54:36] <WARN> Uncompressed VCF output requested - this may result in large output files
[2021-09-16 18:54:36] <EROR> A user error has occurred:
[2021-09-16 18:54:36] <EROR> 
[2021-09-16 18:54:36] <EROR>     The region you specified '4' refers to a contig not in the reference
[2021-09-16 18:54:36] <EROR>     Homo_sapiens_assembly38.
[2021-09-16 18:54:36] <EROR> 
[2021-09-16 18:54:36] <EROR> To help resolve this error ensure the region is in the format
[2021-09-16 18:54:36] <EROR> contig[:begin][-][end], where contig refers to a contig in the reference
[2021-09-16 18:54:36] <EROR> Homo_sapiens_assembly38.
[2021-09-16 18:54:36] <INFO> ------------------------------------------------------------------------
dancooke commented 2 years ago

This is very odd - now the input region isn't getting parsed correctly, but this doesn't depend on the input reads so I'm at a bit of a loss. Would it be possible for you to provide the full CRAM?

husamia commented 2 years ago

I can't release the full CRAM.

dancooke commented 2 years ago

Sorry for the delayed reply. Could you try converting XBRG.cram into a BAM and rerun your previous command on that?

husamia commented 2 years ago

It worked with the converted bam. Does this mean I should convert the parents and do the de novo and it should work because it was issue with the bam reader?

dancooke commented 2 years ago

It does appear the issue lies with the CRAM files so that may be a way to move forward. Octopus calls htslib for BAM/CRAM I/O so it should work with either just fine - I wonder if your CRAMs are corrupted somehow.

Could you try one more thing - running on an empty CRAM, e.g.:

$ samtools view -CHT /mnt/g/Homo_sapiens_assembly38.fa XBRG.cram > XBRG.empty.cram
husamia commented 2 years ago

It does appear the issue lies with the CRAM files so that may be a way to move forward. Octopus calls htslib for BAM/CRAM I/O so it should work with either just fine - I wonder if your CRAMs are corrupted somehow.

Could you try one more thing - running on an empty CRAM, e.g.:

$ samtools view -CHT /mnt/g/Homo_sapiens_assembly38.fa XBRG.cram > XBRG.empty.cram

when I run it on an empty cram it doesn't throw an error. I get a properly formatted vcf with no calls.

husamia commented 2 years ago

I suppose the cram could be corrupt? I will run it with another set of data.