Open sci-kai opened 1 year ago
Hi @sci-kai! Thanks for reaching out. Variants for which an HGVSp value has been annotated are considered coding variants. I assume that your variant does not have one? I am not sure whether this is the only criterion, but I will have a look at this after the weekend and get back to you.
Edit: I just checked, and variants are split into coding and non-coding by considering canonical transcripts and the presence or absence of an HGVSp value for each variant.
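For illustration, the rule described above can be sketched roughly as follows. This is a minimal sketch and not the project's actual code: it assumes each annotation is available as a mapping keyed by the VEP field names `CANONICAL`, `HGVSp` and `Consequence`, and the function name `is_coding_by_hgvsp` as well as the example HGVSp string are hypothetical.

```python
from typing import Mapping


def is_coding_by_hgvsp(annotation: Mapping[str, str]) -> bool:
    """Return True if the annotation would land in the coding table.

    Assumed rule (paraphrased from the comment above): the annotation must
    belong to a canonical transcript AND carry a non-empty HGVSp value.
    """
    is_canonical = annotation.get("CANONICAL", "") == "YES"
    has_hgvsp = bool(annotation.get("HGVSp", "").strip())
    return is_canonical and has_hgvsp


# Hypothetical records, for illustration only.
missense = {
    "CANONICAL": "YES",
    "Consequence": "missense_variant",
    "HGVSp": "ENSP_PLACEHOLDER:p.Gly12Asp",
}
structural = {
    "CANONICAL": "YES",
    "Consequence": "frameshift_variant",
    "HGVSp": "",  # e.g. a structural variant without a protein-level annotation
}

print(is_coding_by_hgvsp(missense))
print(is_coding_by_hgvsp(structural))
```

Under this rule the second record ends up in the non-coding table even though its consequence term is `frameshift_variant`, because only the HGVSp value is inspected.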
Thanks for the clarification! It is also good to know that transcript selection is performed at this step. The variants I found do not have an HGVSp value, as they are mostly structural variants called with delly, which are probably difficult to annotate with HGVSp values. That explains my confusion.
I think we should rename the tables slightly, such that it becomes clear that noncoding can also contain variants where no information on the amino acid impact is available.
Suggestions welcome.
Hi Johannes, I think it is a good idea to add "unknown" or something similar to "noncoding", i.e., to rename the "noncoding" table to "noncoding/unknown". In general, molecular biologists filter for mutations based on the Ensembl consequence terms so that they do not miss frameshift mutations like the one in my example, so maybe the consequence terms should be considered in order to split the tables in more detail.
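As a rough illustration of the consequence-based split suggested above, here is a minimal sketch. It assumes the `Consequence` field holds Ensembl/Sequence Ontology terms joined by `&`, as VEP writes them; the set `CODING_TERMS` is an illustrative selection rather than a complete or agreed-upon list, and the function name is hypothetical.

```python
# Illustrative subset of Ensembl consequence terms that affect coding sequence.
# Which terms belong here is an assumption and would need to be agreed upon.
CODING_TERMS = frozenset({
    "frameshift_variant",
    "coding_sequence_variant",
    "missense_variant",
    "stop_gained",
    "stop_lost",
    "start_lost",
    "inframe_insertion",
    "inframe_deletion",
    "protein_altering_variant",
    "synonymous_variant",
})


def table_for_consequence(consequence: str) -> str:
    """Assign a variant to a table based on its consequence terms only.

    The input may contain several terms joined by '&'. A single coding term
    is enough to place the variant in the coding table.
    """
    terms = {term.strip() for term in consequence.split("&") if term.strip()}
    if not terms:
        return "noncoding/unknown"
    return "coding" if terms & CODING_TERMS else "noncoding/unknown"


print(table_for_consequence("frameshift_variant"))
print(table_for_consequence("intron_variant&non_coding_transcript_variant"))
```

With a split like this, a structural variant annotated as `frameshift_variant` would be listed as coding even when no HGVSp value is available.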
Hi, I have an issue with the splitting of the tables into coding and non-coding variants. Within my dataset there are (germline) variants whose consequence is annotated as "frameshift_variant" or even "coding_sequence_variant" but that are sorted into the "non-coding" table. I may not understand the criteria that define coding and non-coding variants; it would be nice to have more documentation and clarity about these in order to understand this behavior. Here is an example variant that I found in the non-coding table: