Closed husamia closed 2 years ago
Looks like there's at least one read supporting the variant allele in the mom, so maybe the variant was called germline. If you remove the --denovos-only
option then you check. Actually, I'd recommend against using the --denovos-only
option for this reason - I'm considering removing it in future versions.
well only one read in mom. The VAF in son is 54% which is pretty high.
I am interested in germline and somatic de novo.
Son
Mom
I will run without the --denovo-only option and find out.
Maybe this is an exceptional case. I am really interested in de novo only variants. Don't remove it. I will be testing it with other types of data. Maybe add some additional options?
Note that you still get de novo variants without the --denovos-only
- you get labelled germline and de novo variants (de novo variants are tagged with DENOVO
in the INFO
column).
There's another INFO
annotation PP
that is the probability of the called variant classification (germline or de novo), so you will probably find the your example variant is called germline but has a low PP
.
Can you provide the command to post filter the result VCF for the de novo only candidates? I will report the filtered result
bcftools view -i 'DENOVO=1' trio.vcf.gz > denovo.vcf
Looks like there's at least one read supporting the variant allele in the mom, so maybe the variant was called germline. If you remove the
--denovos-only
option then you check. Actually, I'd recommend against using the--denovos-only
option for this reason - I'm considering removing it in future versions.
I rerun it without the --denovos-only
option. The result is the same. 0 variants. Why is there no variants detected this time?
here is the log and the header
[2021-09-13 10:10:12] <INFO> ------------------------------------------------------------------------ [2021-09-13 10:10:12] <INFO> octopus v0.7.4 (develop 2ec5790e) [2021-09-13 10:10:12] <INFO> Copyright (c) 2015-2021 University of Oxford [2021-09-13 10:10:12] <INFO> ------------------------------------------------------------------------ [2021-09-14 09:53:55] <WARN> Could not estimate read size from data, resorting to default [2021-09-14 09:53:55] <INFO> Done initialising calling components in 23h 43m [2021-09-14 09:53:55] <INFO> Detected 3 samples: "XBRG" "XBRG-dad" "XBRG-mom" [2021-09-14 09:53:55] <INFO> Invoked calling model: trio [2021-09-14 09:53:55] <INFO> Processing 3,217,346,917bp with a single thread (40 cores detected) [2021-09-14 09:53:55] <INFO> Writing filtered calls to "XBRG_dnovo-2.octopus.vcf.gz" [2021-09-14 09:53:55] <INFO> ------------------------------------------------------------------------------------- [2021-09-14 09:53:55] <INFO> current | | time | estimated [2021-09-14 09:53:55] <INFO> position | completed | taken | ttc [2021-09-14 09:53:55] <INFO> ------------------------------------------------------------------------------------- [2021-09-14 16:21:58] <INFO> - 100% 6h 28m - [2021-09-14 16:21:58] <INFO> Starting Call Set Refinement (CSR) filtering [2021-09-14 16:21:58] <INFO> ------------------------------------------------------------------------------------- [2021-09-14 16:21:58] <INFO> current | | time | estimated [2021-09-14 16:21:58] <INFO> position | completed | taken | ttc [2021-09-14 16:21:58] <INFO> ------------------------------------------------------------------------------------- [2021-09-14 16:21:58] <INFO> - 100% 21ms - [2021-09-14 16:21:58] <INFO> Finished calling 3,217,346,917bp, total runtime 6h 28m [2021-09-14 16:21:58] <INFO> Calls have been written to "XBRG_dnovo-2.octopus.vcf.gz" [2021-09-14 16:21:58] <INFO> Removed 2 temporary files [2021-09-14 16:21:58] <INFO> ------------------------------------------------------------------------
##octopus=<version=0.7.4_develop_2ec5790e,command="octopus -R /mnt/g/Homo_sapiens_assembly38.fa -I XBRG.cram XBRG-mom.cram XBRG-dad.cram -M XBRG-mom -F XBRG-dad --threads 1 -o XBRG_dnovo-2.octopus.vcf
.gz",options="--aggregate-annotations no --allow-cycles no --allow-marked-duplicates no --allow-octopus-duplicates no --allow-qc-fails no --allow-secondary-alignments no --allow-strand-biased-candidates no --allow-supplementary
-alignments no --assemble-all no --assembler-mask-base-quality 10 --backtrack-level NONE --bad-region-tolerance NORMAL --bamout-type MINI --caller population --clone-concentration 1 --clone-prior 0.01 --consider-unmapped-reads
no --contig-output-order REFERENCE_INDEX --contig-ploidies[0] Y=1 --contig-ploidies[1] chrY=1 --contig-ploidies[2] MT=1 --contig-ploidies[3] chrM=1 --denovo-filter-expression QUAL < 50 | PP < 40 | GQ < 20 | MQ < 30 | AD < 1 | A
F < 0.1 | AFB > 0.2 | SB > 0.95 | BQ < 20 | DP < 10 | ADP < 1 | DC > 1 | MF > 0.2 | FRF > 0.5 | MP < 30 | MQ0 > 2 --denovo-indel-prior 1e-09 --denovo-snv-prior 1.3e-08 --denovos-only no --disable-adapter-masking no --disable-as
sembly-candidate-generator no --disable-call-filtering no --disable-denovo-variant-discovery no --disable-downsampling no --disable-inactive-flank-scoring no --disable-overlap-masking no --disable-pileup-candidate-generator no
--disable-read-preprocessing no --disable-repeat-candidate-generator no --dont-model-mapping-quality no --dont-protect-reference-haplotype no --downsample-above 1000 --downsample-target 500 --dropout-concentration 5 --duplicate
-read-detection-policy RELAXED --extension-level MODERATE --fallback-kmer-gap 10 --fast no --filter-expression QUAL < 10 | MQ < 10 | MP < 10 | AD < 1 | AF < 0.01 | AFB > 0.25 | SB > 0.98 | BQ < 15 | DP < 1 | ADP < 1 --force-pil
eup-candidates no --good-base-quality 20 --haplotype-holdout-threshold 2500 --haplotype-overflow 200000 --ignore-unmapped-contigs no --indel-heterozygosity 0.0001 --keep-temporary-files no --keep-unfiltered-calls no --kmer-size
s[0] 10 --kmer-sizes[1] 15 --kmer-sizes[2] 20 --lagging-level MODERATE --mask-3prime-shifted-soft-clipped-heads no --mask-inverted-soft-clipping no --mask-soft-clipped-bases no --mask-soft-clipped-boundary-bases 2 --maternal-sa
mple XBRG-mom --max-assembly-region-overlap 200 --max-assembly-region-size 600 --max-bubbles 30 --max-clones 4 --max-copy-gain 0 --max-copy-loss 0 --max-decoy-supplementary-alignment-mapping-quality 5 --max-fallback-kmers 10 --
max-haplotypes 200 --max-holdout-depth 20 --max-indel-errors 16 --max-open-read-files 250 --max-read-length 10000 --max-reference-cache-memory 500MB --max-somatic-haplotypes 2 --max-unlocalized-supplementary-alignment-mapping-q
uality 5 --max-unplaced-supplementary-alignment-mapping-quality 5 --max-variant-size 2000 --max-vb-seeds 12 --min-bubble-score 2 --min-candidate-credible-vaf-probability 0.75 --min-clone-frequency 0.01 --min-credible-somatic-fr
equency 0.005 --min-denovo-posterior 3 --min-expected-somatic-frequency 0.01 --min-forest-quality 3 --min-good-bases 20 --min-kmer-prune 2 --min-mapping-quality 5 --min-phase-score 5 --min-pileup-base-quality 20 --min-protected
-haplotype-posterior 1e-10 --min-refcall-posterior 0 --min-somatic-posterior 0.5 --min-variant-posterior 0.1 --model-posterior SPECIAL --no-adapter-contaminated-reads no --no-reads-with-distant-segments no --no-reads-with-unmap
ped-segments no --normal-contamination-risk LOW --one-based-indexing no --organism-ploidy 2 --output XBRG_dnovo-2.octopus.vcf.gz --paternal-sample XBRG-dad --phasing-policy AUTO --phylogeny-concentration 20 --read-linkage PAIRE
D --reads[0] XBRG.cram --reads[1] XBRG-mom.cram --reads[2] XBRG-dad.cram --refcall-block-merge-quality 10 --refcall-filter-expression QUAL < 2 | GQ < 20 | MQ < 10 | DP < 10 | MF > 0.2 --reference Homo_sapiens_assembly38.fa --re
solve-symlinks no --sequence-error-model PCR-free.HiSeq-2500 --sites-only no --snp-heterozygosity 0.001 --snp-heterozygosity-stdev 0.01 --somatic-cnv-prior 1e-05 --somatic-credible-mass 0.9 --somatic-filter-expression QUAL < 2
| GQ < 20 | MQ < 30 | SMQ < 40 | SB > 0.9 | SD > 0.9 | BQ < 20 | DP < 3 | ADP < 1 | MF > 0.2 | NC > 1 | FRF > 0.5 | AD < 1 | AF < 0.0001 --somatic-indel-prior 1e-06 --somatic-snv-prior 0.0001 --somatics-only no --split-long-rea
ds no --target-read-buffer-memory 6GB --temp-directory-prefix octopus-temp --threads 1 --tumour-germline-concentration 1.5 --use-filtered-source-candidates no --use-germline-forest-for-somatic-normals no --use-independent-genot
ype-priors no --use-preprocessed-reads-for-filtering no --use-same-read-profile-for-all-samples no --use-uniform-genotype-priors no --use-wide-hmm-scores no --variant-discovery-mode ILLUMINA --very-fast no",date="2021-09-14 09:
53:55.188451700 EDT">
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT XBRG XBRG-dad XBRG-mom
Can you run:
$ octopus -R /mnt/g/Homo_sapiens_assembly38.fa -I XBRG.cram XBRG-mom.cram XBRG-dad.cram -M XBRG-mom -F XBRG-dad -T chr14:78,714,000-78,716,000 -o XBRG_dnovo-2.octopus.vcf.gz --debug
and post the resulting octopus_debug.log
?
here is the full octopus_debug.log
this is illumina 30X WGS paired. There is warning message about the read size. Strange 0 in the reads
[2021-09-15 10:00:08] <INFO> ------------------------------------------------------------------------
[2021-09-15 10:00:08] <INFO> octopus v0.7.4 (develop 2ec5790e)
[2021-09-15 10:00:08] <INFO> Copyright (c) 2015-2021 University of Oxford
[2021-09-15 10:00:08] <INFO> ------------------------------------------------------------------------
[2021-09-15 10:00:08] <DEBG> Program options:
[2021-09-15 10:00:08] <DEBG> > aggregate-annotations=no
> allow-cycles=no
> allow-marked-duplicates=no
> allow-octopus-duplicates=no
> allow-qc-fails=no
> allow-secondary-alignments=no
> allow-strand-biased-candidates=no
> allow-supplementary-alignments=no
> assemble-all=no
> assembler-mask-base-quality=10
> backtrack-level=NONE
> bad-region-tolerance=NORMAL
> bamout-type=MINI
> caller=population
> clone-concentration=1
> clone-prior=0.01
> consider-unmapped-reads=no
> contig-output-order=REFERENCE_INDEX
> contig-ploidies[0]=Y=1
> contig-ploidies[1]=chrY=1
> contig-ploidies[2]=MT=1
> contig-ploidies[3]=chrM=1
~ debug=octopus_debug.log
> denovo-filter-expression=QUAL < 50 | PP < 40 | GQ < 20 | MQ < 30 | AD < 1 | AF < 0.1 | AFB > 0.2 | SB > 0.95 | BQ < 20 | DP < 10 | ADP < 1 | DC > 1 | MF > 0.2 | FRF > 0.5 | MP < 30 | MQ0 > 2
> denovo-indel-prior=1e-09
> denovo-snv-prior=1.3e-08
> denovos-only=no
> disable-adapter-masking=no
> disable-assembly-candidate-generator=no
> disable-call-filtering=no
> disable-denovo-variant-discovery=no
> disable-downsampling=no
> disable-inactive-flank-scoring=no
> disable-overlap-masking=no
> disable-pileup-candidate-generator=no
> disable-read-preprocessing=no
> disable-repeat-candidate-generator=no
> dont-model-mapping-quality=no
> dont-protect-reference-haplotype=no
> downsample-above=1000
> downsample-target=500
> dropout-concentration=5
> duplicate-read-detection-policy=RELAXED
> extension-level=MODERATE
> fallback-kmer-gap=10
> fast=no
> filter-expression=QUAL < 10 | MQ < 10 | MP < 10 | AD < 1 | AF < 0.01 | AFB > 0.25 | SB > 0.98 | BQ < 15 | DP < 1 | ADP < 1
> force-pileup-candidates=no
> good-base-quality=20
> haplotype-holdout-threshold=2500
> haplotype-overflow=200000
> ignore-unmapped-contigs=no
> indel-heterozygosity=0.0001
> keep-temporary-files=no
> keep-unfiltered-calls=no
> kmer-sizes[0]=10
> kmer-sizes[1]=15
> kmer-sizes[2]=20
> lagging-level=MODERATE
> mask-3prime-shifted-soft-clipped-heads=no
> mask-inverted-soft-clipping=no
> mask-soft-clipped-bases=no
> mask-soft-clipped-boundary-bases=2
~ maternal-sample=XBRG-mom
> max-assembly-region-overlap=200
> max-assembly-region-size=600
> max-bubbles=30
> max-clones=4
> max-copy-gain=0
> max-copy-loss=0
> max-decoy-supplementary-alignment-mapping-quality=5
> max-fallback-kmers=10
> max-haplotypes=200
> max-holdout-depth=20
> max-indel-errors=16
> max-open-read-files=250
> max-read-length=10000
> max-reference-cache-memory=500MB
> max-somatic-haplotypes=2
> max-unlocalized-supplementary-alignment-mapping-quality=5
> max-unplaced-supplementary-alignment-mapping-quality=5
> max-variant-size=2000
> max-vb-seeds=12
> min-bubble-score=2
> min-candidate-credible-vaf-probability=0.75
> min-clone-frequency=0.01
> min-credible-somatic-frequency=0.005
> min-denovo-posterior=3
> min-expected-somatic-frequency=0.01
> min-forest-quality=3
> min-good-bases=20
> min-kmer-prune=2
> min-mapping-quality=5
> min-phase-score=5
> min-pileup-base-quality=20
> min-protected-haplotype-posterior=1e-10
> min-refcall-posterior=0
> min-somatic-posterior=0.5
> min-variant-posterior=0.1
> model-posterior=SPECIAL
> no-adapter-contaminated-reads=no
> no-reads-with-distant-segments=no
> no-reads-with-unmapped-segments=no
> normal-contamination-risk=LOW
> one-based-indexing=no
> organism-ploidy=2
~ output=XBRG_dnovo-3.octopus.vcf.gz
~ paternal-sample=XBRG-dad
> phasing-policy=AUTO
> phylogeny-concentration=20
> read-linkage=PAIRED
~ reads[0]=XBRG.cram
~ reads[1]=XBRG-mom.cram
~ reads[2]=XBRG-dad.cram
> refcall-block-merge-quality=10
> refcall-filter-expression=QUAL < 2 | GQ < 20 | MQ < 10 | DP < 10 | MF > 0.2
~ reference=Homo_sapiens_assembly38.fa
~ regions[0]=chr14:78,714,000-78,716,000
> resolve-symlinks=no
> sequence-error-model=PCR-free.HiSeq-2500
> sites-only=no
> snp-heterozygosity=0.001
> snp-heterozygosity-stdev=0.01
> somatic-cnv-prior=1e-05
> somatic-credible-mass=0.9
> somatic-filter-expression=QUAL < 2 | GQ < 20 | MQ < 30 | SMQ < 40 | SB > 0.9 | SD > 0.9 | BQ < 20 | DP < 3 | ADP < 1 | MF > 0.2 | NC > 1 | FRF > 0.5 | AD < 1 | AF < 0.0001
> somatic-indel-prior=1e-06
> somatic-snv-prior=0.0001
> somatics-only=no
> split-long-reads=no
> target-read-buffer-memory=6GB
> temp-directory-prefix=octopus-temp
~ threads=2
> tumour-germline-concentration=1.5
> use-filtered-source-candidates=no
> use-germline-forest-for-somatic-normals=no
> use-independent-genotype-priors=no
> use-preprocessed-reads-for-filtering=no
> use-same-read-profile-for-all-samples=no
> use-uniform-genotype-priors=no
> use-wide-hmm-scores=no
> variant-discovery-mode=ILLUMINA
> very-fast=no
[2021-09-15 12:46:54] <WARN> Could not estimate read size from data, resorting to default
[2021-09-15 12:46:54] <DEBG> Estimated read memory footprint is 714B
[2021-09-15 12:46:54] <INFO> Done initialising calling components in 2h 46m
[2021-09-15 12:46:54] <INFO> Detected 3 samples: "XBRG" "XBRG-dad" "XBRG-mom"
[2021-09-15 12:46:54] <INFO> Invoked calling model: trio
[2021-09-15 12:46:54] <INFO> Processing 2,000bp with 2 threads (40 cores detected)
[2021-09-15 12:46:54] <INFO> Writing filtered calls to "/mnt/g/XBRG/XBRG_dnovo-3.octopus.vcf.gz"
[2021-09-15 12:46:54] <DEBG> Command line: octopus -R /mnt/g/Novogene-CDAR-WGS/raw_data/Homo_sapiens_assembly38.fa -I XBRG.cram XBRG-mom.cram XBRG-dad.cram -M XBRG-mom -F XBRG-dad -T chr14:78,714,000-78,716,000 --threads 2 -o XBRG_dnovo-3.octopus.vcf.gz --debug
[2021-09-15 12:46:54] <DEBG> All input regions:
Contig chr14
chr14:78714000-78716000
[2021-09-15 12:46:54] <WARN> Running in parallel mode can make debug log difficult to interpret
[2021-09-15 12:46:54] <DEBG> Making tasks for 1 contigs
[2021-09-15 12:46:54] <DEBG> Making tasks for contig chr14
[2021-09-15 12:46:55] <DEBG> Finished making tasks for contig chr14
[2021-09-15 12:46:55] <DEBG> Finished making tasks
[2021-09-15 12:46:55] <INFO> ------------------------------------------------------------------------
[2021-09-15 12:46:55] <INFO> current | | time | estimated
[2021-09-15 12:46:55] <INFO> position | completed | taken | ttc
[2021-09-15 12:46:55] <INFO> ------------------------------------------------------------------------
[2021-09-15 12:46:55] <DEBG> Finished calling contig chr14
[2021-09-15 12:46:55] <DEBG> Spawning task chr14:78714000-78716000
[2021-09-15 12:46:55] <DEBG> There are 1 idle futures
[2021-09-15 12:46:55] <DEBG> Finished making new tasks. Waiting for task writer to complete existing jobs
[2021-09-15 12:46:55] <DEBG> Waiting for 2 running tasks to finish
[2021-09-15 12:46:55] <DEBG> Task writer finished
[2021-09-15 12:49:06] <DEBG> Fetched 0 unfiltered reads from chr14:78713900-78716100
[2021-09-15 12:49:06] <DEBG> In sample XBRG-dad
[2021-09-15 12:49:06] <DEBG> 0 reads failed the NoUnlocalizedSupplementaryAlignments filter
[2021-09-15 12:49:06] <DEBG> 0 reads failed the NoUnplacedSupplementaryAlignments filter
[2021-09-15 12:49:06] <DEBG> 0 reads failed the NoDecoySupplementaryAlignments filter
[2021-09-15 12:49:06] <DEBG> 0 reads failed the IsNotMarkedQcFail filter
[2021-09-15 12:49:06] <DEBG> 0 reads failed the IsNotMarkedDuplicate filter
[2021-09-15 12:49:06] <DEBG> 0 reads failed the IsNotSupplementaryAlignment filter
[2021-09-15 12:49:06] <DEBG> 0 reads failed the IsLong filter
[2021-09-15 12:49:06] <DEBG> 0 reads failed the IsNotOctopusDuplicate filter
[2021-09-15 12:49:06] <DEBG> 0 reads failed the HasSufficientGoodQualityBases filter
[2021-09-15 12:49:06] <DEBG> 0 reads failed the IsGoodMappingQuality filter
[2021-09-15 12:49:06] <DEBG> 0 reads failed the IsMapped filter
[2021-09-15 12:49:06] <DEBG> 0 reads failed the HasWellFormedCigar filter
[2021-09-15 12:49:06] <DEBG> 0 reads failed the IsNotSecondaryAlignment filter
[2021-09-15 12:49:06] <DEBG> 0 reads failed the IsShort filter
[2021-09-15 12:49:06] <DEBG> 0 reads failed the HasValidBaseQualities filter
[2021-09-15 12:49:06] <DEBG> In sample XBRG-mom
[2021-09-15 12:49:06] <DEBG> 0 reads failed the NoUnlocalizedSupplementaryAlignments filter
[2021-09-15 12:49:06] <DEBG> 0 reads failed the NoUnplacedSupplementaryAlignments filter
[2021-09-15 12:49:06] <DEBG> 0 reads failed the NoDecoySupplementaryAlignments filter
[2021-09-15 12:49:06] <DEBG> 0 reads failed the IsNotMarkedQcFail filter
[2021-09-15 12:49:06] <DEBG> 0 reads failed the IsNotMarkedDuplicate filter
[2021-09-15 12:49:06] <DEBG> 0 reads failed the IsNotSupplementaryAlignment filter
[2021-09-15 12:49:06] <DEBG> 0 reads failed the IsLong filter
[2021-09-15 12:49:06] <DEBG> 0 reads failed the IsNotOctopusDuplicate filter
[2021-09-15 12:49:06] <DEBG> 0 reads failed the HasSufficientGoodQualityBases filter
[2021-09-15 12:49:06] <DEBG> 0 reads failed the IsGoodMappingQuality filter
[2021-09-15 12:49:06] <DEBG> 0 reads failed the IsMapped filter
[2021-09-15 12:49:06] <DEBG> 0 reads failed the HasWellFormedCigar filter
[2021-09-15 12:49:06] <DEBG> 0 reads failed the IsNotSecondaryAlignment filter
[2021-09-15 12:49:06] <DEBG> 0 reads failed the IsShort filter
[2021-09-15 12:49:06] <DEBG> 0 reads failed the HasValidBaseQualities filter
[2021-09-15 12:49:06] <DEBG> In sample XBRG
[2021-09-15 12:49:06] <DEBG> 0 reads failed the NoUnlocalizedSupplementaryAlignments filter
[2021-09-15 12:49:06] <DEBG> 0 reads failed the NoUnplacedSupplementaryAlignments filter
[2021-09-15 12:49:06] <DEBG> 0 reads failed the NoDecoySupplementaryAlignments filter
[2021-09-15 12:49:06] <DEBG> 0 reads failed the IsNotMarkedQcFail filter
[2021-09-15 12:49:06] <DEBG> 0 reads failed the IsNotMarkedDuplicate filter
[2021-09-15 12:49:06] <DEBG> 0 reads failed the IsNotSupplementaryAlignment filter
[2021-09-15 12:49:06] <DEBG> 0 reads failed the IsLong filter
[2021-09-15 12:49:06] <DEBG> 0 reads failed the IsNotOctopusDuplicate filter
[2021-09-15 12:49:06] <DEBG> 0 reads failed the HasSufficientGoodQualityBases filter
[2021-09-15 12:49:06] <DEBG> 0 reads failed the IsGoodMappingQuality filter
[2021-09-15 12:49:06] <DEBG> 0 reads failed the IsMapped filter
[2021-09-15 12:49:06] <DEBG> 0 reads failed the HasWellFormedCigar filter
[2021-09-15 12:49:06] <DEBG> 0 reads failed the IsNotSecondaryAlignment filter
[2021-09-15 12:49:06] <DEBG> 0 reads failed the IsShort filter
[2021-09-15 12:49:06] <DEBG> 0 reads failed the HasValidBaseQualities filter
[2021-09-15 12:49:06] <DEBG> There are 0 reads in chr14:78713900-78716100 after filtering
[2021-09-15 12:49:06] <DEBG> Downsampling removed 0 reads from chr14:78713900-78716100
[2021-09-15 12:49:06] <DEBG> Stopping early as no reads found in call region chr14:78714000-78716000
[2021-09-15 12:49:06] <DEBG> Writing completed task chr14:78714000-78716000 that finished in 2m 11s
[2021-09-15 12:49:06] <INFO> - 100% 2m 11s -
[2021-09-15 12:49:06] <DEBG> Merging 1 temporary VCF files
[2021-09-15 12:49:06] <INFO> Starting Call Set Refinement (CSR) filtering
[2021-09-15 12:49:06] <INFO> ------------------------------------------------------------------------
[2021-09-15 12:49:06] <INFO> current | | time | estimated
[2021-09-15 12:49:06] <INFO> position | completed | taken | ttc
[2021-09-15 12:49:06] <INFO> ------------------------------------------------------------------------
[2021-09-15 12:49:06] <INFO> - 100% 8ms -
[2021-09-15 12:49:06] <INFO> Finished calling 2,000bp, total runtime 2m 12s
[2021-09-15 12:49:06] <INFO> Calls have been written to "/mnt/g/XBRG/XBRG_dnovo-3.octopus.vcf.gz"
[2021-09-15 12:49:07] <INFO> Removed 4 temporary files
[2021-09-15 12:49:07] <INFO> ------------------------------------------------------------------------
Yeah that's odd, any chance you can subset one of the CRAMs in that region with samtools
and send it to me?
I don't know the aligner that was used. There was no header info available.
Don't see any problems on my end:
$ octopus -R hs38DH.fa -I XBRG_subset.cram -T chr14:78,714,000-78,716,000 -o test.vcf
$ bcftools view -H test.vcf
chr14 78714703 . A T 889.15 . AC=2;AN=2;DP=34;MQ=60;NS=1 GT:GQ:DP:MQ:PS:PQ 1|1:216:34:60:78714703:100
chr14 78714883 . C T 431.88 . AC=1;AN=2;DP=48;MQ=60;NS=1 GT:GQ:DP:MQ:PS:PQ 0|1:431:48:60:78714703:100
chr14 78715097 . G A 404.49 . AC=1;AN=2;DP=35;MQ=60;NS=1 GT:GQ:DP:MQ:PS:PQ 1|0:355:35:60:78714703:100
chr14 78715477 . A C 940.41 . AC=2;AN=2;DP=34;MQ=59;NS=1 GT:GQ:DP:MQ:PS:PQ 1|1:106:34:59:78714703:100
Are you sure you're using the correct reference build (i.e. the one used for alignment)? Try running my command with your reference and see if you get any variants.
When I run it with my reference I get the same results that you get. Here is the debug log.
[2021-09-16 15:21:33] <INFO> ------------------------------------------------------------------------
[2021-09-16 15:21:33] <INFO> octopus v0.7.4 (develop 2ec5790e)
[2021-09-16 15:21:33] <INFO> Copyright (c) 2015-2021 University of Oxford
[2021-09-16 15:21:33] <INFO> ------------------------------------------------------------------------
[2021-09-16 15:21:34] <WARN> Uncompressed VCF output requested - this may result in large output files
[2021-09-16 15:21:36] <INFO> Done initialising calling components in 2s
[2021-09-16 15:21:36] <INFO> Detected 1 sample: "XBRG"
[2021-09-16 15:21:36] <INFO> Invoked calling model: individual
[2021-09-16 15:21:36] <INFO> Processing 2,000bp with a single thread (40 cores detected)
[2021-09-16 15:21:36] <INFO> Writing filtered calls to "/mnt/g/XBRG/test.vcf"
[2021-09-16 15:21:36] <INFO> ------------------------------------------------------------------------
[2021-09-16 15:21:36] <INFO> current | | time | estimated
[2021-09-16 15:21:36] <INFO> position | completed | taken | ttc
[2021-09-16 15:21:36] <INFO> ------------------------------------------------------------------------
[2021-09-16 15:21:36] <INFO> chr14:78715477 73.8% 65ms -
[2021-09-16 15:21:36] <INFO> - 100% 66ms -
[2021-09-16 15:21:36] <INFO> Starting Call Set Refinement (CSR) filtering
[2021-09-16 15:21:36] <INFO> ------------------------------------------------------------------------
[2021-09-16 15:21:36] <INFO> current | | time | estimated
[2021-09-16 15:21:36] <INFO> position | completed | taken | ttc
[2021-09-16 15:21:36] <INFO> ------------------------------------------------------------------------
[2021-09-16 15:21:36] <INFO> chr14:78714703 35.1% 41ms -
[2021-09-16 15:21:36] <INFO> chr14:78714883 44.1% 42ms -
[2021-09-16 15:21:36] <INFO> chr14:78715097 54.9% 42ms -
[2021-09-16 15:21:36] <INFO> chr14:78715477 73.8% 42ms -
[2021-09-16 15:21:36] <INFO> - 100% 42ms -
[2021-09-16 15:21:36] <INFO> Finished calling 2,000bp, total runtime 353ms
[2021-09-16 15:21:36] <INFO> Calls have been written to "/mnt/g/XBRG/test.vcf"
[2021-09-16 15:21:36] <INFO> Removed 2 temporary files
[2021-09-16 15:21:37] <INFO> ------------------------------------------------------------------------
could it be a bug with the trio?
What about if you switch out the subsetted cram for the full one in this command?
It's erroring with the full one. My reference is the one that was used to align the reads. Does it have a problem with the extra contigs?
octopus -R Homo_sapiens_assembly38.fa -I XBRG.cram -T chr14:78,714,000-78,716,000 -o test.vcf --debug
[2021-09-16 18:54:36] <WARN> Uncompressed VCF output requested - this may result in large output files
[2021-09-16 18:54:36] <EROR> A user error has occurred:
[2021-09-16 18:54:36] <EROR>
[2021-09-16 18:54:36] <EROR> The region you specified '4' refers to a contig not in the reference
[2021-09-16 18:54:36] <EROR> Homo_sapiens_assembly38.
[2021-09-16 18:54:36] <EROR>
[2021-09-16 18:54:36] <EROR> To help resolve this error ensure the region is in the format
[2021-09-16 18:54:36] <EROR> contig[:begin][-][end], where contig refers to a contig in the reference
[2021-09-16 18:54:36] <EROR> Homo_sapiens_assembly38.
[2021-09-16 18:54:36] <INFO> ------------------------------------------------------------------------
This is very odd - now the input region isn't getting parsed correctly, but this doesn't depend on the input reads so I'm at a bit of a loss. Would it be possible for you to provide the full CRAM?
I can't release the full CRAM.
Sorry for the delayed reply. Could you try converting XBRG.cram
into a BAM and rerun your previous command on that?
It worked with the converted bam. Does this mean I should convert the parents and do the de novo and it should work because it was issue with the bam reader?
It does appear the issue lies with the CRAM files so that may be a way to move forward. Octopus calls htslib for BAM/CRAM I/O so it should work with either just fine - I wonder if your CRAMs are corrupted somehow.
Could you try one more thing - running on an empty CRAM, e.g.:
$ samtools view -CHT /mnt/g/Homo_sapiens_assembly38.fa XBRG.cram > XBRG.empty.cram
It does appear the issue lies with the CRAM files so that may be a way to move forward. Octopus calls htslib for BAM/CRAM I/O so it should work with either just fine - I wonder if your CRAMs are corrupted somehow.
Could you try one more thing - running on an empty CRAM, e.g.:
$ samtools view -CHT /mnt/g/Homo_sapiens_assembly38.fa XBRG.cram > XBRG.empty.cram
when I run it on an empty cram it doesn't throw an error. I get a properly formatted vcf with no calls.
I suppose the cram could be corrupt? I will run it with another set of data.
Describe the bug I detected a de novo variant in trio with a different caller. I wanted to see if there are more or if Octopus can pick it up. So I ran Octopus on the Trio data. It's 30X WGS illumina. The proband, mom and dad. Octopus detected 0 variants. It looks to me that this denovo is real but maybe Octopus missed it? see IGV screenshot.
Version
Command line to run octopus:
Additional context output
The pileup data looks like this variant is de novo