google / deepvariant

DeepVariant is an analysis pipeline that uses a deep neural network to call genetic variants from next-generation DNA sequencing data.
BSD 3-Clause "New" or "Revised" License

output variants from tool #775

NIBIL401 closed 5 months ago

NIBIL401 commented 6 months ago

I ran the following command on RNA-seq data. The output VCF is very small and important variants are missing.

BIN_VERSION="1.5.0"

docker run   \
-v "$(pwd):$(pwd)"  \
-w $(pwd) \
google/deepvariant:"${BIN_VERSION}"  \
run_deepvariant \
--model_type=WES  \
--customized_model=model/model.ckpt \
--ref=reference/GRCh38_no_alt_analysis_set.fasta \
--reads=test_data/Aligned.sortedByCoord.out.bam \
--output_vcf=output/output.vcf.gz  \
--num_shards=30  \
--make_examples_extra_args="split_skip_reads=true,channels=''" \
--logging_dir=output/logs \
--intermediate_results_dir output/intermediate_results_dir

Please let me know if there is any error in the command I ran.

AndrewCarroll commented 6 months ago

Hi @NIBIL401

Could I request a bit more information? When you say there are fewer variants, what are you comparing this to? I do note that you have BIN_VERSION=1.5.0, but our RNA-seq case study uses BIN_VERSION=1.4.0. You may get better results using BIN_VERSION=1.4.0.

NIBIL401 commented 6 months ago

Hi @AndrewCarroll, when I ran variant calling on the same BAM with another tool, I got more variants than with DeepVariant. Also, some important variants are missing from the final DeepVariant output. I tried 1.4.0 and got the same output. Please let me know whether there is a way to optimize the parameters, and whether the command I'm using is correct.

AndrewCarroll commented 6 months ago

Hi @NIBIL401

I don't see any other specific issues in your command. Without knowing more about the specific types of differences, it's difficult to give advice on what might be missing. One observation that we do have is that DeepVariant has learned not to call RNA editing events as variants. These are post-transcriptional changes to the RNA sequence, and they appear as A->G and T->C in sequencing data. To give more advice beyond this, I think I would need to know more about the sequencing. Ideally, we would have a BAM file or snippet containing a variant call that is not being made, so that we can diagnose why.

Thank you, Andrew
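As a rough way to check how much of a gap the RNA editing pattern described above could explain, the A->G and T->C records in the other caller's VCF can be counted. This is only a sketch using standard shell tools; the function names are hypothetical and not part of DeepVariant. It looks only at exact single-base REF/ALT pairs, and a match shows a site fits the editing pattern, not that it is an editing event.

```shell
#!/usr/bin/env bash

# Print a VCF to stdout, decompressing it when the name ends in .gz.
read_vcf() {
  case "$1" in
    *.gz) gzip -cd "$1" ;;
    *)    cat "$1" ;;
  esac
}

# Count records whose REF>ALT is A>G or T>C.
# In a VCF data line, column 4 is REF and column 5 is ALT.
count_editing_like() {
  read_vcf "$1" | awk -F'\t' '
    /^#/ { next }
    ($4 == "A" && $5 == "G") || ($4 == "T" && $5 == "C") { n++ }
    END { print n + 0 }'
}

# List the same records as CHROM, POS, REF, ALT for manual inspection.
list_editing_like() {
  read_vcf "$1" | awk -F'\t' -v OFS='\t' '
    /^#/ { next }
    ($4 == "A" && $5 == "G") || ($4 == "T" && $5 == "C") { print $1, $2, $4, $5 }'
}
```

For example, `count_editing_like other_caller.vcf.gz` (a placeholder file name) prints the number of such records, which can be compared against the total number of calls.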

NIBIL401 commented 6 months ago

Hi @AndrewCarroll, I used GATK to call variants from the RNA-seq BAM and got around 13397 variants, including splice variants. But when I tried DeepVariant I got only 215 variants, with important splice variants missing. I would also like to know which type of BAM is best for use with DeepVariant, i.e. one produced with or without the chimeric read option. Sorry I could not give you more information.
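A comparison like the one described (13397 calls versus 215) is easier to discuss with a concrete list of sites that one caller reports and the other does not. The following is a minimal sketch with hypothetical function names, assuming both VCFs use the same reference and contig naming. It matches on exact CHROM, POS, REF and ALT, so multi-allelic records or differently normalized indels will show up as mismatches.

```shell
#!/usr/bin/env bash

# Emit one "CHROM:POS:REF:ALT" key per data line, sorted and de-duplicated.
variant_keys() {
  case "$1" in
    *.gz) gzip -cd "$1" ;;
    *)    cat "$1" ;;
  esac | awk -F'\t' '!/^#/ { print $1 ":" $2 ":" $4 ":" $5 }' | LC_ALL=C sort -u
}

# Print the keys present in the first VCF but absent from the second.
only_in_first() {
  local a b
  a="$(mktemp)"
  b="$(mktemp)"
  variant_keys "$1" > "$a"
  variant_keys "$2" > "$b"
  # comm -23 keeps lines unique to the first (sorted) input.
  LC_ALL=C comm -23 "$a" "$b"
  rm -f "$a" "$b"
}
```

Calling `only_in_first gatk.vcf.gz output/output.vcf.gz` (the first file name is a placeholder) would list the sites reported only by the other caller, which gives specific positions to look at in the BAM.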

AndrewCarroll commented 5 months ago

Hi @NIBIL401

I'm sorry, but without taking a look at the BAM file and the variants called or not called, it's quite difficult to say the reason why a variant would be missing. If you are able to share a snippet of it with an example, we can take a look.
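One possible way to produce such a snippet is to cut a small window of reads around a missed variant with samtools. This is a sketch rather than an official procedure: the function names, the 500 bp default padding and the file names are assumptions, and it requires samtools on the PATH plus an index for the input BAM.

```shell
#!/usr/bin/env bash

# Build a samtools-style region "CHROM:START-END" padded around a position.
# The start is clamped to 1 because region coordinates are 1-based.
region_around() {
  local chrom="$1" pos="$2" pad="${3:-500}"
  local start=$(( pos - pad ))
  local end=$(( pos + pad ))
  if [ "$start" -lt 1 ]; then start=1; fi
  printf '%s:%d-%d\n' "$chrom" "$start" "$end"
}

# Write the reads overlapping the region to a small BAM and index it.
extract_snippet() {
  local bam="$1" region="$2" out="$3"
  samtools view -b -o "$out" "$bam" "$region" && samtools index "$out"
}
```

For instance, `extract_snippet test_data/Aligned.sortedByCoord.out.bam "$(region_around chr1 1000000)" snippet.bam` would write the reads around a hypothetical site at chr1:1000000 to `snippet.bam`.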

For chimeric reads, do you mean secondary/supplementary read alignments?

pichuan commented 5 months ago

Hi @NIBIL401, as @AndrewCarroll mentioned, it's hard for us to determine the reason without a reproducible setup. If you can provide a similar reproducible setup with public data, that would be great!

Meanwhile, please read https://github.com/google/deepvariant/blob/r1.6/docs/FAQ.md#why-does-deepvariant-not-call-a-specific-variant-in-my-data to see if any of the topics there might apply.

For now, I'll close this issue, but please do feel free to reopen this bug with more information to help us debug!