Closed ghost closed 4 years ago
Ok, I found some other cases, everytime in the vicinity of an abrupt mapping ending
Is DeepVariant catching what are reference genome misassemblies?
@aderzelle DeepVariant performs local realignment by default (can be disabled with --norealign_reads
for the make_examples
step). Realignment may cause this region to look a bit different than what you see in the original BAM, explaining the candidates that are generated.
By default, the realigned reads do not get written out, but you can use the flags below to output the realigned reads to a file called realigned_reads.bam
. These flags can be added to the make_examples
step so that you can inspect the realigned data.
--emit_realigned_reads
- enables writing out of realigned reads
--realigner_diagnostics=/path/to/dir
- path to which the reads will be written
Thank you,
How exactly does it perform the realignment? How does it differ from, let's say, bwa mem?
DeepVariant uses Smith-Waterman for realignment, specifically this library: https://github.com/mengyao/Complete-Striped-Smith-Waterman-Library.
Hello! But what behavior is correct for the two examples in this issue? It seems that this "local realignment" breaks the correct one and produces incorrect calls!
We can help take a look if you send us the bam and realigned bam, and even the make_examples output, if you can share any of these. Otherwise, you may want to experiment with turning realignment off by using the --norealign_reads
flag.
Hello, isn't it a case of ... "well, here we need wetlab Sanger sequencing"? I certainly can share, but I need to find a time slot to relaunch DeepVariant with the production of evidence bam. I will be on it because I am curious. @MariaNattestad where can I send you the bam?
Hi @aderzelle If the question is why a variant is getting called when you see no sign of it in the original bam, then that can only be answered by inspecting the realigned bam, not by additional sequencing. It seems likely given the split reads that you are seeing the signal of some kind of misassembly or structural variant (if the genomes producing the bam and the reference are not the same). If seeing the realigned bam does not answer your question, then you can send the files to marianattestad@google.com and I can take a look. I would need to see the original bam, the realigned bam, the reference genome, and the position of the variant you are asking about.
@aderzelle DeepVariant performs local realignment by default (can be disabled with
--norealign_reads
for themake_examples
step). Realignment may cause this region to look a bit different than what you see in the original BAM, explaining the candidates that are generated.By default, the realigned reads do not get written out, but you can use the flags below to output the realigned reads to a file called
realigned_reads.bam
. These flags can be added to themake_examples
step so that you can inspect the realigned data.
--emit_realigned_reads
- enables writing out of realigned reads--realigner_diagnostics=/path/to/dir
- path to which the reads will be written
Hello, how are we supposed to pass those arguments with the one-step wrapper script?
I tried
docker run -v "${INPUT_DIR}":"/input" -v "${OUTPUT_DIR}:/output" google/deepvariant:"${BIN_VERSION}" /opt/deepvariant/bin/run_deepvariant --model_type=WGS --ref=/input/shasta_final.fa --reads="/input/${SAMPLE}.sorted.bam" --regions="/input/ARCcestor.bed" --output_vcf=/output/${SAMPLE}.vcf.gz --output_gvcf=/output/${SAMPLE}.g.vcf.gz --num_shards="${N_SHARDS}" --customized_model="/input/mosquito_model/model.ckpt-97700" --make_examples_extra_args=emit_realigned_reads,realigner_diagnostics=/media/urbe/MyADrive/12-03-2020_DeepVariant_NDPD/bam_diagnostics
but runs into
Traceback (most recent call last):
File "/opt/deepvariant/bin/run_deepvariant.py", line 317, in <module>
app.run(main)
File "/usr/local/lib/python2.7/dist-packages/absl/app.py", line 300, in run
_run_main(main, args)
File "/usr/local/lib/python2.7/dist-packages/absl/app.py", line 251, in _run_main
sys.exit(main(argv))
File "/opt/deepvariant/bin/run_deepvariant.py", line 304, in main
commands = create_all_commands()
File "/opt/deepvariant/bin/run_deepvariant.py", line 276, in create_all_commands
sample_name=FLAGS.sample_name))
File "/opt/deepvariant/bin/run_deepvariant.py", line 182, in make_examples_command
kwargs.update(_extra_args_to_dict(extra_args))
File "/opt/deepvariant/bin/run_deepvariant.py", line 135, in _extra_args_to_dict
(flag_name, flag_value) = extra_arg.split('=')
ValueError: need more than 1 value to unpack
EDIT: ah ok, the flag must explicitly be set to "=true" ^^'
Hello, (me again sorry), I am eyeballing bam files for evaluation of candidates. Mostly it's fine (I open a lot of issues but I want to stress that most reported variants seem correct). But here is a very, very strange case to my eyes.
Here is the gvcf line
That looks all right to me, seems like a "solid" candidate. But now, let's look at the bam
the relevant position is the first C starting from the right. As you see, there is not a single T base there. The site seems perfectly homozygous for C:C.
However, a bit on the left, you can see that many of the mapped reads abruptly end at the same position with a T. It's not a variant but something a bit strange seems to happen in that region.
So far it's the only such case I have. Do you have any idea of what's going on?
Here are the lines from the gVCF before the site. The abrupt T position is at 17434056, so in a block with no candidate. But just right after, for a few bases, DeepVariant threw a few calls that are completely not supported by the mapping.
any idea of what's going on there? It's a bit annoying in the sense that I don't know how I could have caught that without eyeballing the alignment. In my experiment I plan to eyeball all my candidates anyway (because there are less than 100 of them).