google / deepvariant

DeepVariant is an analysis pipeline that uses a deep neural network to call genetic variants from next-generation DNA sequencing data.
BSD 3-Clause "New" or "Revised" License
3.23k stars 725 forks source link

Variant detected in VCF but not in bam #691

Closed amy-houseman closed 1 year ago

amy-houseman commented 1 year ago

Hello,

This may be a naive question but I have some variants which were detected as PASS by DeepVariant - however I cannot see this variants within the bam files?

Any ideas? I likely will have to filter these variants out anyway but wanted to double check if there could be reasons causing this.

For reference: HG38 Chr15:41570158 T>C The depth is 14,9 from the VCF.

This is the region in the bam: Image 02-08-2023 at 16 38

I know this is probably an IGV question but thought I would just ask incase!

Thanks!! Amy

pichuan commented 1 year ago

Hi @amy-houseman

For short reads, DeepVariant does a local realignment to improve our Indel accuracy. Therefore, the reads the DeepVariant is seeing might have different positions from the original BAM.

Please see this section of the FAQ for more explanation: https://github.com/google/deepvariant/blob/r1.5/docs/FAQ.md#why-does-deepvariant-not-call-a-specific-variant-in-my-data

pichuan commented 1 year ago

Hopefully this answers your question @amy-houseman . If not, feel free to reopen!

pgrosu commented 1 year ago

Hi Amy,

As Pi-Chuan mentioned is good. I only see a total of 14 reads supporting at position 41,570,158 of chromosome 15 -- hopefully its the same BAM file as the one used in DeepVariant.

Now regarding IGV here are a few things:

$1)$ Under View > Preferences > Alignments set the following three:

Uncheck "Downsample reads" like this:

image

Check "Show mismatch bases":

image

Check "Label indels > threshold", and have a value of 0 (and uncheck Hide indels):

image

$2)$ Under View > Preferences > Third Gen use these settings:

Here again perform the following:

image

Let me know if it helps.

Thanks, Paul

amy-houseman commented 1 year ago

Hi Both, thank you so much for your replies! I really appreciate it. I adjusted the settings with your additions Paul but got the same result as I put above. Definitely the same bam and vcf sample, I just double checked. Hopefully just a realignment thing like Pi-Chuan mentioned. We may end up getting them validated just to make sure we're not throwing any potentials away! Hard choice!

pgrosu commented 1 year ago

Hmm, having no doubts is essential. Would you be comfortable with posting a few complete rows from the VCF file, which would include this variant and a few above and below. DeepVariant is base-specific, so it would tell us something that might be captured by a few rows.

Also could you post the whole script and command you used to run DeepVariant? You can mask out anything that is sensitive.

Just trying to be thorough we're not missing anything.

Thanks, `p

amy-houseman commented 1 year ago

Hi Paul,

From that sample:

chr15 41568345 . C A 0 RefCall . GT:GQ:DP:AD:VAF:PL 0/0:24:14:12,2:0.142857:0,23,42
chr15 41568363 . G GA 0 RefCall . GT:GQ:DP:AD:VAF:PL 0/0:35:12:10,2:0.166667:0,35,47
chr15 41570158 . T C 6.5 PASS . GT:GQ:DP:AD:VAF:PL 0/1:7:23:14,9:0.391304:5,0,46
chr15 41570603 . G T 10.6 PASS . GT:GQ:DP:AD:VAF:PL 0/1:11:31:8,23:0.741935:10,0,39
chr15 41573327 . G T 11.1 PASS . GT:GQ:DP:AD:VAF:PL 0/1:11:21:12,9:0.428571:10,0,43
chr15 41669918 . C T 0 RefCall . GT:GQ:DP:AD:VAF:PL 0/0:42:13:11,2:0.153846:0,45,43
chr15 41699117 . A T 49.2 PASS . GT:GQ:DP:AD:VAF:PL 0/1:43:96:44,52:0.541667:49,0,44

and from another sample with the same variant described also as passed:

chr15 41537032 . A G 38.5 PASS . GT:GQ:DP:AD:VAF:PL 0/1:38:76:42,34:0.447368:38,0,46
chr15 41561123 . T G 0.1 RefCall . GT:GQ:DP:AD:VAF:PL ./.:15:19:11,3:0.157895:0,15,37
chr15 41565105 . T C 40.6 PASS . GT:GQ:DP:AD:VAF:PL 0/1:40:74:56,18:0.243243:40,0,55
chr15 41567477 . G A 0 RefCall . GT:GQ:DP:AD:VAF:PL 0/0:21:15:13,2:0.133333:0,20,39
chr15 41569024 . T C 8.9 PASS . GT:GQ:DP:AD:VAF:PL 0/1:9:12:9,3:0.25:8,0,41
chr15 41570158 . T C 6.7 PASS . GT:GQ:DP:AD:VAF:PL 0/1:7:26:19,7:0.269231:5,0,47
chr15 41570238 . A G 0 RefCall . GT:GQ:DP:AD:VAF:PL 0/0:35:14:9,3:0.214286:0,37,38
chr15 41570240 . G T 0 RefCall . GT:GQ:DP:AD:VAF:PL 0/0:32:13:10,2:0.153846:0,35,34

This was WES, and part of the script used was:

sed -n "${SLURM_ARRAY_TASK_ID}p" $EXOME_IDs_FILE | parallel -j 1 "singularity run -B /usr/lib/locale/:/usr/lib/locale/ containers/deepvariant_1.3.0.sif /opt/deepvariant/bin/run_deepvariant --model_type=WES \
--ref=$HG38_REFERENCE \
--reads=$PICARDMARKDUPLICATES_SORTEDBAM \
--regions=$BED_REGIONS \
--output_vcf=$OUTPUT_VCF \
--output_gvcf=$OUTPUT_GVCF \
--intermediate_results_dir=$INTERMEDIATE_RESULTS"
pichuan commented 1 year ago

Hi @amy-houseman , I realize that I should also point you to this part of the FAQ: https://github.com/google/deepvariant/blob/r1.5/docs/FAQ.md#what-is-the-realigner-and-how-does-it-work which talks about how you can output the realigned BAM to examine.

Specifically, if you run make_examples with --norealign_reads (or --realign_reads=false), you can turn off the realigner. We don't recommend this by default, but can be useful if you just want to double check.

And, as mentioned in the FAQ section above, if you run in the region near by and add --make_examples_extra_args="emit_realigned_reads=true,realigner_diagnostics=/output/realigned_reads", you'll be able to get a small BAM file with the realigned reads so that you can take a closer look to confirm whether it's because of the realigner or not.

amy-houseman commented 1 year ago

Amazing, thank you, I will try that tomorrow!

pgrosu commented 1 year ago

Hi Amy,

Not sure if you had a chance to check, but did the realigned reads in the BAM files generated by DeepVariant confirm the observed VCF variants?

Thanks, Paul

amy-houseman commented 1 year ago

Hi Paul,

I got an error when running on my HPC so I'm just waiting for the IT people to get back to me, I assume its a simple fix since I have managed to get the deepvariant container to work a few months back:

perl: warning: Setting locale failed.
perl: warning: Please check that your locale settings:
    LANGUAGE = (unset),
    LC_ALL = (unset),
    LANG = "en_GB.UTF-8"
    are supported and installed on your system.
perl: warning: Falling back to the standard locale ("C").
perl: warning: Setting locale failed.
perl: warning: Please check that your locale settings:
    LANGUAGE = (unset),
    LC_ALL = (unset),
    LANG = "en_GB.UTF-8"
    are supported and installed on your system.

I'll let you know when they reply! Amy

pgrosu commented 1 year ago

Hi Amy,

That's just a warning that you can ignore/fix it via binding /usr/lib/locale, at the beginning of your Singularity command like this:

singularity run -B /usr/lib/locale/:/usr/lib/locale/

By the way, you ran into it last year in a previous issue, that didn't affect your computation in the following post:

https://github.com/google/deepvariant/issues/542

Notice the computation continued ignoring your locale settings.

Is that the only warning you see, or is the computation stuck? Also, just a minor thing, you're running DeepVariant 1.3.0, and now it is at version 1.5.0, and with the next release the plan is before the end of this year. It probably would not affect your results significantly. You can check on the changes between releases at the following page,

Thanks, Paul

amy-houseman commented 1 year ago

Oo likely my fault! If I am adding the particular region around that variant such as "chr15:41,132,484-42,007,831"

Do I still have to provide a WES bed file? I have this as my current script and last time it seemed like the bed file was the problem..

#!/bin/bash --login
#SBATCH -J AmyHouseman_deepvariant
#SBATCH -o %x.stdout.%J.%N
#SBATCH -e %x.stderr.%J.%N
#SBATCH --ntasks=1
#SBATCH --ntasks-per-node=1
#SBATCH -p htc
#SBATCH --account=scw1581
#SBATCH --mail-type=ALL          # Mail events (NONE, BEGIN, END, FAIL, ALL)
#SBATCH --mail-user=HousemanA@cardiff.ac.uk     # Where to send mail
#SBATCH --array=1-2
#SBATCH --mem-per-cpu=68GB
#SBATCH --qos=maxjobs500

module purge
module load parallel
module load singularity

EXOME_IDs_FILE=/scratch/c.c21087028/checking_variant_deepvariant/exome_ID_file
HG38_REFERENCE=/scratch/c.c21087028/coverage_graph_and_clincnv_files/ClinCNV/hg38/GCA_000001405.15_GRCh38_no_alt_analysis_set.fna
PICARDMARKDUPLICATES_SORTEDBAM=/scratch/c.c21087028/checking_variant_deepvariant/{}PE_markedduplicates_GCA_000001405.15_GRCh38_no_alt_analysis_set.bam
REGIONS="chr15:41,132,484-42,007,831"
OUTPUT_VCF=/scratch/c.c21087028/checking_variant_deepvariant/{}PE_output_GCA_000001405.15_GRCh38_no_alt_analysis_set_checkingvariantregion.vcf.gz
OUTPUT_GVCF=/scratch/c.c21087028/checking_variant_deepvariant/{}PE_output_GCA_000001405.15_GRCh38_no_alt_analysis_set_checkvariantregion.g.vcf.gz
INTERMEDIATE_RESULTS=/scratch/c.c21087028/checking_variant_deepvariant/{}PE_output_intermediate

# Set bash error trapping to exit on first error.
set -eu

cd /scratch/c.c21087028/

sed -n "${SLURM_ARRAY_TASK_ID}p" $EXOME_IDs_FILE | parallel -j 1 "singularity run -B /usr/lib/locale/:/usr/lib/locale/ containers/deepvariant_1.3.0.sif /opt/deepvariant/bin/run_deepvariant --model_type=WES \
--ref=$HG38_REFERENCE \
--reads=$PICARDMARKDUPLICATES_SORTEDBAM \
--regions=$REGIONS \
--make_examples_extra_args="emit_realigned_reads=true,realigner_diagnostics=/output/realigned_reads" \
--output_vcf=$OUTPUT_VCF \
--output_gvcf=$OUTPUT_GVCF \
--intermediate_results_dir=$INTERMEDIATE_RESULTS"

with the error:

***** Intermediate results will be written to /scratch/c.c21087028/checking_variant_deepvariant/E036-H-013_TAAGGCGA-ACTGCATA_L007_PE_output_intermediate in docker. ****

***** Running the command:*****
time seq 0 0 | parallel -q --halt 2 --line-buffer /opt/deepvariant/bin/make_examples --mode calling --ref "/scratch/c.c21087028/coverage_graph_and_clincnv_files/ClinCNV/hg38/GCA_000001405.15_GRCh38_no_alt_analysis_set.fna" --reads "/scratch/c.c21087028/checking_variant_deepvariant/E036-H-013_TAAGGCGA-ACTGCATA_L007_PE_markedduplicates_GCA_000001405.15_GRCh38_no_alt_analysis_set.bam" --examples "/scratch/c.c21087028/checking_variant_deepvariant/E036-H-013_TAAGGCGA-ACTGCATA_L007_PE_output_intermediate/make_examples.tfrecord@1.gz" --emit_realigned_reads --gvcf "/scratch/c.c21087028/checking_variant_deepvariant/E036-H-013_TAAGGCGA-ACTGCATA_L007_PE_output_intermediate/gvcf.tfrecord@1.gz" --realigner_diagnostics "/output/realigned_reads" --regions "chr15:41,132,484-42,007,831" --task {}

perl: warning: Setting locale failed.
perl: warning: Please check that your locale settings:
    LANGUAGE = (unset),
    LC_ALL = (unset),
    LANG = "en_GB.UTF-8"
    are supported and installed on your system.
perl: warning: Falling back to the standard locale ("C").
perl: warning: Setting locale failed.
perl: warning: Please check that your locale settings:
    LANGUAGE = (unset),
    LC_ALL = (unset),
    LANG = "en_GB.UTF-8"
    are supported and installed on your system.
perl: warning: Falling back to the standard locale ("C").
I0807 12:32:36.932506 47023237326656 genomics_reader.py:222] Reading /scratch/c.c21087028/checking_variant_deepvariant/E036-H-013_TAAGGCGA-ACTGCATA_L007_PE_markedduplicates_GCA_000001405.15_GRCh38_no_alt_analysis_set.bam with NativeSamReader
W0807 12:32:36.932703 47023237326656 make_examples_core.py:276] No non-empty sample name found in the input reads. DeepVariant will use default as the sample name. You can also provide a sample name with the --sample_name argument.
I0807 12:32:36.943572 47023237326656 make_examples_core.py:239] Preparing inputs
I0807 12:32:36.953401 47023237326656 genomics_reader.py:222] Reading /scratch/c.c21087028/checking_variant_deepvariant/E036-H-013_TAAGGCGA-ACTGCATA_L007_PE_markedduplicates_GCA_000001405.15_GRCh38_no_alt_analysis_set.bam with NativeSamReader
I0807 12:32:36.964995 47023237326656 make_examples_core.py:239] Common contigs are ['chr1', 'chr2', 'chr3', 'chr4', 'chr5', 'chr6', 'chr7', 'chr8', 'chr9', 'chr10', 'chr11', 'chr12', 'chr13', 'chr14', 'chr15', 'chr16', 'chr17', 'chr18', 'chr19', 'chr20', 'chr21', 'chr22', 'chrX', 'chrY', 'chrM']
I0807 12:32:36.975915 47023237326656 make_examples_core.py:239] Starting from v0.9.0, --use_ref_for_cram is default to true. If you are using CRAM input, note that we will decode CRAM using the reference you passed in with --ref
2023-08-07 12:32:36.976266: I third_party/nucleus/io/sam_reader.cc:736] Setting HTS_OPT_BLOCK_SIZE to 134217728
I0807 12:32:37.440702 47023237326656 genomics_reader.py:222] Reading /scratch/c.c21087028/checking_variant_deepvariant/E036-H-013_TAAGGCGA-ACTGCATA_L007_PE_markedduplicates_GCA_000001405.15_GRCh38_no_alt_analysis_set.bam with NativeSamReader
I0807 12:32:37.546969 47023237326656 genomics_reader.py:222] Reading /scratch/c.c21087028/checking_variant_deepvariant/E036-H-013_TAAGGCGA-ACTGCATA_L007_PE_markedduplicates_GCA_000001405.15_GRCh38_no_alt_analysis_set.bam with NativeSamReader
Traceback (most recent call last):
  File "/tmp/Bazel.runfiles_mpmtqvy4/runfiles/com_google_deepvariant/deepvariant/make_examples.py", line 180, in <module>
    app.run(main)
  File "/tmp/Bazel.runfiles_mpmtqvy4/runfiles/absl_py/absl/app.py", line 299, in run
    _run_main(main, args)
  File "/tmp/Bazel.runfiles_mpmtqvy4/runfiles/absl_py/absl/app.py", line 250, in _run_main
    sys.exit(main(argv))
  File "/tmp/Bazel.runfiles_mpmtqvy4/runfiles/com_google_deepvariant/deepvariant/make_examples.py", line 170, in main
    make_examples_core.make_examples_runner(options)
  File "/tmp/Bazel.runfiles_mpmtqvy4/runfiles/com_google_deepvariant/deepvariant/make_examples_core.py", line 1630, in make_examples_runner
    region_processor.initialize()
  File "/tmp/Bazel.runfiles_mpmtqvy4/runfiles/com_google_deepvariant/deepvariant/make_examples_core.py", line 870, in initialize
    self._initialize()
  File "/tmp/Bazel.runfiles_mpmtqvy4/runfiles/com_google_deepvariant/deepvariant/make_examples_core.py", line 853, in _initialize
    self.realigner = realigner.Realigner(
  File "/tmp/Bazel.runfiles_mpmtqvy4/runfiles/com_google_deepvariant/deepvariant/realigner/realigner.py", line 559, in __init__
    self.diagnostic_logger = DiagnosticLogger(self.config.diagnostics)
  File "/tmp/Bazel.runfiles_mpmtqvy4/runfiles/com_google_deepvariant/deepvariant/realigner/realigner.py", line 327, in __init__
    self._csv_file = open(self._root_join(self.metrics_filename), 'w')
  File "/tmp/Bazel.runfiles_mpmtqvy4/runfiles/com_google_deepvariant/deepvariant/realigner/realigner.py", line 346, in _root_join
    tf.io.gfile.makedirs(subdir)
  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/lib/io/file_io.py", line 514, in recursive_create_dir_v2
    _pywrap_file_io.RecursivelyCreateDir(compat.path_to_bytes(path))
tensorflow.python.framework.errors_impl.PermissionDeniedError: /output; Read-only file system
parallel: This job failed:
/opt/deepvariant/bin/make_examples --mode calling --ref /scratch/c.c21087028/coverage_graph_and_clincnv_files/ClinCNV/hg38/GCA_000001405.15_GRCh38_no_alt_analysis_set.fna --reads /scratch/c.c21087028/checking_variant_deepvariant/E036-H-013_TAAGGCGA-ACTGCATA_L007_PE_markedduplicates_GCA_000001405.15_GRCh38_no_alt_analysis_set.bam --examples /scratch/c.c21087028/checking_variant_deepvariant/E036-H-013_TAAGGCGA-ACTGCATA_L007_PE_output_intermediate/make_examples.tfrecord@1.gz --emit_realigned_reads --gvcf /scratch/c.c21087028/checking_variant_deepvariant/E036-H-013_TAAGGCGA-ACTGCATA_L007_PE_output_intermediate/gvcf.tfrecord@1.gz --realigner_diagnostics /output/realigned_reads --regions chr15:41,132,484-42,007,831 --task 0

real    0m3.220s
user    0m1.999s
sys 0m0.838s
pgrosu commented 1 year ago

This has a simple fix. Basically you need to replace /output/realigned_reads with a location you have write-access to. For example, you can do the following commands:

Assuming you have write-access to this folder /scratch/c.c21087028/checking_variant_deepvariant, you can create a sub-directory called realigned_reads like this:

mkdir /scratch/c.c21087028/checking_variant_deepvariant/realigned_reads

Then in your Singularity script you have two options to pick from:

Option 1 - update only the following line:

--make_examples_extra_args="emit_realigned_reads=true,realigner_diagnostics=/scratch/c.c21087028/checking_variant_deepvariant/realigned_reads" \

All other lines for Option 1 remain the same.

Option 2 - update the following lines (I used -B to make /output accessible inside the container):

sed -n "${SLURM_ARRAY_TASK_ID}p" $EXOME_IDs_FILE | parallel -j 1 "singularity run -B /usr/lib/locale/:/usr/lib/locale/ -B /scratch/c.c21087028/checking_variant_deepvariant/:/output/ containers/deepvariant_1.3.0.sif /opt/deepvariant/bin/run_deepvariant --model_type=WES \

The rest of the lines can stay the same for Option 2.

Regarding a BED file, you don't need one as they are the same thing as regions -- which you already provide:

https://en.wikipedia.org/wiki/BED_(file_format)

Let me know how this runs, and if your run into any other issues.

Hope it helps, Paul

amy-houseman commented 1 year ago

Hi Paul, that worked! I have the output files now, which would be best to look at first to see what's happening? This is the realignment bam for one of samples with the original bam below it.

Image 08-08-2023 at 09 29

Thanks! Amy

pgrosu commented 1 year ago

Hi Amy,

Great! IGV is fine. Now just look in your VCF file for the region in question, and try to confirm with what you see in IGV using the realigned BAMs. For example, you had the following variant the last time:

chr15 41570158 . T C 6.5 PASS . GT:GQ:DP:AD:VAF:PL 0/1:7:23:14,9:0.391304:5,0,46

Do you still see that variant in your current VCF file, and do the number of variations match (or exceed because of base quality) in IGV for that position? Given the counts for the alternate allele (C) the last time, it would need to have been at least 9, but I just see 1 in your picture. Does the VCF this time also reflect 1 (or less), or does it still say 9 for the alternate allele (C) as the last time?

Basically the numbers have to confirm each other between the realigned BAM and current VCF if realignment has been performed for that region, or the original BAM and current VCF if no realignment was performed for that region. Ideally as a first pass check multiple regions to be sure, but start with our original variant (T/C at chr15:41570158) as a known starting point. If they confirm each other then that's awesome, otherwise we would need to investigate the issue deeper.

Thanks, Paul

pichuan commented 1 year ago

Looking at @amy-houseman 's latest IGV in https://github.com/google/deepvariant/issues/691#issuecomment-1669163372 , it seems like the realigned BAM would see 1 read containing the alt allele C. If this is the case, I don't expect it to even trigger our candidate generation logic. (From this post, it didn't seem like @amy-houseman has specified different vsc_ thresholds)

One possibility is that the realignment BAM was generated with different region compared with when the variant calling was run, like @pgrosu mentioned above.

amy-houseman commented 1 year ago

Hi both,

I reran the entire bed file both with and without the --make_examples_extra_args="emit_realigned_reads=true,realigner_diagnostics=/scratch/c.c21087028/checking_variant_deepvariant/realigned_reads" \ because I think the regions I used before it didn't like so just decided to do the entire file.

Both of the entries for that variant are the same:

Without realign:

chr15 41570158 . T C 6.5 PASS . GT:GQ:DP:AD:VAF:PL 0/1:7:23:14,9:0.391304:5,0,46

With realign:

chr15 41570158 . T C 6.5 PASS . GT:GQ:DP:AD:VAF:PL 0/1:7:23:14,9:0.391304:5,0,46

When looking at the realigned bam's it actually has the realignment directory "chr15:41570025-41570158" so using that bam with the original, I get - with the C's actually being identified this time:

Image 08-08-2023 at 21 42

Sorry for the confusion!

pgrosu commented 1 year ago

Hi Amy,

That makes perfect sense now, and the realigned BAM file confirms the counts within expected values produced by the VCF. Basically it will realign unless you add the --norealign_reads (or --realign_reads=false), as Pi-Chuan mentioned earlier.

So when you used the regular BAM file it realigned the reads -- because the above parameter is has a default value of True -- and when you used the realigned BAM file, it didn't need to align much or at all. The parameters used are defined as follows:

There is always more that can be done if you really want to be sure, but this is fairly satisfactory. By the way, VCF files can also be loaded in IGV as well so that the comparison can be done directly.

Hope it helps, ~p

amy-houseman commented 1 year ago

Thank you both so much, you have been more than helpful!

amy-houseman commented 11 months ago

Hi again, sorry to be a pain! I thought I would comment under this one again since the topic is the same. I have a variant that is called in my VCF file:

image

However I cannot find it in the bams nor prior to realignment or after realignment (I tried with region specific and then using the entire bed region). Image 21-11-2023 at 12 30

Top is the VCF file. Middle is the realignment bam file. Bottom is the bam file before deepvariant.

I am not sure where to go from here, thanks! Amy

pichuan commented 11 months ago

Hi @amy-houseman Thank you for following up. Is it possible for you to share a small BAM that can reproduce this IGV that you're looking at? If so, I can take a look and see if I can reproduce the issue and try to understand what's going on.

amy-houseman commented 11 months ago

Hi! Thank you for your reply! Yes I can - could I send via email or some other way? Thanks!! Amy

pichuan commented 11 months ago

@amy-houseman Sure. Please don't send any sensitive data, and try to restrict to just a small region that we can reproduce. You can email to pichuan@google.com.

pichuan commented 7 months ago

Expanding on my answer in https://github.com/google/deepvariant/issues/691#issuecomment-1662968609

I'll post my commands that I shared with @amy-houseman in email, in case it's useful for other users in the future.

I ran this on a small BAM file and BED file that @amy-houseman shared. I'll call them "YOUR.bam" and "YOUR.bed" below.

BIN_VERSION="1.6.0"
docker run \
  -v ${PWD}:/data \
  -v ${PWD}/reference:/reference \
  google/deepvariant:"${BIN_VERSION}" \
  /opt/deepvariant/bin/run_deepvariant \
  --model_type=WES \
  --ref "/reference/GRCh38_no_alt_analysis_set.fasta" \
  --reads "/data/YOUR.bam" \
  --output_vcf=/data/output.vcf.gz \
  --num_shards=1 \
  --regions "/data/YOUR.bed" \
  --make_examples_extra_args="emit_realigned_reads=true,realigner_diagnostics=/data/realigned_reads"

Specifically, this step is the one that we need to look at:

time docker run \
  -v ${PWD}:/data \
  -v ${PWD}/reference:/reference \
  google/deepvariant:1.6.0 \
  /opt/deepvariant/bin/make_examples \
  --mode calling \
  --ref "/reference/GRCh38_no_alt_analysis_set.fasta" \
  --reads "/data/YOUR.bam" \
  --examples "/data/output.tfrecord.gz" \
  --channels "insert_size" \
  --regions "/data/YOUR.bed"

In this case, I have a few positions that I was interested in looking at.

To look at the positions, the output directories are in:

$ ls -lh realigned_reads/chr15:41562546-41562720
total 8.0K
-rw-r--r-- 1 root root 5.7K Mar 10 06:14 realigned_reads.bam
$ ls -lh realigned_reads/chr15:41570025-41570158
total 8.0K
-rw-r--r-- 1 root root 5.3K Mar 10 06:14 realigned_reads.bam

Sort and index them:

RANGE=41562546-41562720
samtools sort realigned_reads/chr15:${RANGE}/realigned_reads.bam > ${RANGE}.sorted.bam
samtools index ${RANGE}.sorted.bam
RANGE=41570025-41570158
samtools sort realigned_reads/chr15:${RANGE}/realigned_reads.bam > ${RANGE}.sorted.bam
samtools index ${RANGE}.sorted.bam

Output:

$ ls -lh *sorted.bam
-rw-rw-r-- 1 pichuan docker 7.2K Mar 10 06:36 41562546-41562720.sorted.bam
-rw-rw-r-- 1 pichuan docker 6.8K Mar 10 06:36 41570025-41570158.sorted.bam

Then, from here, I looked at these realigned BAMs and was able to see why the variants were proposed.