snakemake-workflows / dna-seq-varlociraptor

A Snakemake workflow for calling small and structural variants under any kind of scenario (tumor/normal, tumor/normal/relapse, germline, pedigree, populations) via the unified statistical model of Varlociraptor.
MIT License

Coding mutations in non-coding variants table #231

Open sci-kai opened 1 year ago

sci-kai commented 1 year ago

Hi, I have an issue with how the table is split into coding and non-coding variants. In my dataset there are (germline) variants whose consequence is annotated as "frameshift_variant" or even "coding_sequence_variant", yet they are sorted into the "non-coding" table. I may not understand the criteria that define coding and non-coding variants; more documentation and clarity about these would help to understand this error. Here is an example variant that I found in the non-coding table:

chr: 11 position: 87622535 ref: C alt: <DEL> gene: Rnf43 impact: HIGH consequence: frameshift_variant&feature_truncation

FelixMoelder commented 1 year ago

Hi @sci-kai! Thanks for reaching out. Variants for which an HGVSp value has been annotated are considered coding variants. I assume that your variant does not have one? I am not sure if this is the only criterion, but I will have a look at this after the weekend and come back to you.

Edit: I just checked: variants are split into coding and non-coding by considering canonical transcripts and the presence or absence of an HGVSp value for each variant.
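
To illustrate the rule described above, here is a minimal Python sketch. It is not the workflow's actual code: the function name `split_by_hgvsp`, the record keys `canonical` and `hgvsp`, and the example records (including the assumption that the Rnf43 annotation is on a canonical transcript, and the made-up `GeneA` entry) are hypothetical and chosen only for illustration.

```python
def split_by_hgvsp(records):
    """Split annotation records into (coding, noncoding) lists.

    Assumed rule: only canonical transcripts are considered; a record is
    "coding" if it carries a non-empty HGVSp value, otherwise "noncoding".
    """
    coding, noncoding = [], []
    for rec in records:
        if not rec.get("canonical"):
            continue  # non-canonical transcripts are ignored in this sketch
        if rec.get("hgvsp"):
            coding.append(rec)
        else:
            noncoding.append(rec)
    return coding, noncoding


# Hypothetical records, for illustration only.
records = [
    {"gene": "Rnf43", "consequence": "frameshift_variant&feature_truncation",
     "canonical": True, "hgvsp": ""},
    {"gene": "GeneA", "consequence": "missense_variant",
     "canonical": True, "hgvsp": "p.Gly12Asp"},
]

coding, noncoding = split_by_hgvsp(records)
coding_genes = [r["gene"] for r in coding]
noncoding_genes = [r["gene"] for r in noncoding]
```

Under this rule, a frameshift deletion without an HGVSp value ends up in the non-coding list regardless of its consequence term, which matches the behavior reported in this issue.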

sci-kai commented 1 year ago

Thanks for the clarification! Good to know that the transcript selection is also performed at this step. The variants I found do not have an HGVSp value, as they are mostly structural variants called with delly, which are probably difficult to annotate with HGVSp values. That explains my confusion.

johanneskoester commented 1 year ago

I think we should rename the tables slightly, so that it becomes clear that noncoding can also contain variants for which no information on the amino acid impact is available.

johanneskoester commented 1 year ago

suggestions welcome

sci-kai commented 12 months ago

Hi Johannes, I think it is a good idea to add "unknown" or something similar to "noncoding", i.e., rename the "noncoding" table to "noncoding/unknown". In general, molecular biologists filter for mutations based on the Ensembl consequence terms so as not to miss frameshift mutations like the one in my example, so maybe these terms should be considered in order to split the tables in more detail.
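
As a rough sketch of what such a consequence-based check could look like (an assumption for discussion, not something the workflow implements): the set `CODING_TERMS` below is a hand-picked, possibly incomplete selection of Ensembl/Sequence Ontology consequence terms, and the function name `has_coding_consequence` is hypothetical. VEP joins multiple consequences of one annotation with `&`, as in the example above.

```python
# Hand-picked selection of Ensembl consequence terms that affect the
# coding sequence; an assumption for this sketch, not an official list.
CODING_TERMS = {
    "frameshift_variant",
    "stop_gained",
    "stop_lost",
    "start_lost",
    "missense_variant",
    "inframe_insertion",
    "inframe_deletion",
    "protein_altering_variant",
    "synonymous_variant",
    "coding_sequence_variant",
}


def has_coding_consequence(consequence):
    """Return True if any '&'-separated term is in CODING_TERMS."""
    terms = consequence.split("&")
    return any(term in CODING_TERMS for term in terms)


example = has_coding_consequence("frameshift_variant&feature_truncation")
intronic = has_coding_consequence("intron_variant")
```

A check like this could complement the HGVSp criterion, so that variants such as the Rnf43 deletion are flagged as coding even when no HGVSp value is available.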