griffithlab / civic-docs

Source code for the civicdb.org docs

Add conventions for creating a variant name with punctuation and capitalization #51

Open kkrysiak opened 4 years ago

kkrysiak commented 4 years ago

1) For umbrella fusion variants where the fusion partner is unknown or unspecified, we prefer the style "GENE Fusions" or "GENE FUSIONS" (e.g., ALK Fusions). Current curation contains every combination: "GENE Fusion", "GENE Fusions", "GENE FUSION", "FUSION", etc. The example in the docs is "ALK Fusions", but I think "ALK FUSIONS" is preferred? There isn't really a leader in current curation: https://civicdb.org/search/variants/6ee62717-61f4-465a-b783-b3f2fc0ade52

2) For specific fusion names we prefer the style "GENEA-GENEB" and not "GENEA-GENEB Fusion" or "GENEA-GENEB FUSION". Again, we have a mix.

There is similar confusion for rsIDs (rs12345 vs RS12345).

Should it be "Loss-of-function", "Loss of function", or title case?
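To make the fusion and rsID points above concrete, here is a minimal sketch of a name checker. The function name `check_name` and the regular expressions are hypothetical and only encode the preferences stated in this comment; they are not an agreed CIViC rule set, and the gene-symbol pattern is a simplifying assumption (it does not handle symbols that contain hyphens).

```python
import re

# Assumed patterns for illustration only; gene symbols are simplified to
# letters and digits, so hyphenated symbols would be misread as fusions.
SPECIFIC_FUSION = re.compile(r"^([A-Z0-9]+)-([A-Z0-9]+)(\s+fusions?)?$", re.IGNORECASE)
UMBRELLA_FUSION = re.compile(r"^([A-Z0-9]+)\s+fusions?$", re.IGNORECASE)
RSID = re.compile(r"^rs\d+$", re.IGNORECASE)


def check_name(name):
    """Return a list of style warnings for a variant name (hypothetical helper)."""
    warnings = []
    name = name.strip()

    # Specific fusions: prefer "GENEA-GENEB" with no "Fusion" suffix.
    m = SPECIFIC_FUSION.match(name)
    if m and m.group(3):
        warnings.append(f"drop the suffix: use {m.group(1).upper()}-{m.group(2).upper()}")

    # Umbrella fusions: both preferred styles are plural.
    if UMBRELLA_FUSION.match(name) and not name.lower().endswith("fusions"):
        warnings.append("umbrella fusion names use the plural 'Fusions'")

    # rsIDs: casing is an open question in this issue, so only flag it.
    if RSID.match(name):
        warnings.append("rsID casing (rs12345 vs RS12345) is still undecided")

    return warnings
```

The checker deliberately does not pick between "ALK Fusions" and "ALK FUSIONS", since that choice is still open here.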

kkrysiak commented 4 years ago

Also: delins, and the c., p., and g. notations.

malachig commented 4 years ago

We should also have comprehensive examples for simpler variants in the table: https://civic.readthedocs.io/en/latest/model/variants/name.html

Right now we do not have a variant name example for:

Also, I wonder if we should revisit the idea of using rsIDs as variant names?

malachig commented 4 years ago

Also, we don't have an example of how we name a frameshift variant. Currently I see a mix of the following styles:

Could we adopt the simplest and most easily readable style, F137fs? The HGVS expressions for the variant will show the correct longer style.

However, there could be multiple distinct frameshifts that start at that amino acid position. This would perhaps argue for the most precise style: P146Gfs*14. It's also the most consistent with HGVS.
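As a rough illustration of the two candidate styles, the sketch below classifies a frameshift name as "short" (F137fs) or "precise" (P146Gfs*14). The function `frameshift_style` and both patterns are assumptions made for this example; they cover only one-letter amino acid codes and a numeric stop offset, not every frameshift form that HGVS allows.

```python
import re

# Assumed patterns: one-letter amino acid codes only.
SHORT_FS = re.compile(r"^[A-Z]\d+fs$")                # e.g. F137fs
PRECISE_FS = re.compile(r"^[A-Z]\d+[A-Z]fs\*\d+$")    # e.g. P146Gfs*14


def frameshift_style(name):
    """Return 'precise', 'short', or None if the name is not a recognized frameshift."""
    if PRECISE_FS.match(name):
        return "precise"
    if SHORT_FS.match(name):
        return "short"
    return None
```

A check like this could be used to count how many existing names fall into each style before a convention is chosen.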

malachig commented 4 years ago

Show how/when to use "and" and "or" in variant names.

Note that this is a temporary measure until genotypes / variant sets are supported in CIViC.

kkrysiak commented 4 years ago

Clarify to only use an rsID if it occurs outside of a coding / transcribed region. Using c. or p. is preferred.

Avoid more ambiguous names like 3'UTR polymorphism (https://civicdb.org/events/genes/2596/summary/variants/433/summary/evidence/1031/summary#evidence)

kkrysiak commented 4 years ago
malachig commented 4 years ago

For splicing alterations, we could shorten the name to just the c. notation rather than saying "Splicing alteration".

e.g. "Splicing alteration (c.340+1G>A)" would become "c.340+1G>A".

Also make it clear that we do not want any use of the "IVS" style of splicing variant names.
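The renaming suggested above could be sketched as follows. The helper `normalize_splice_name` is hypothetical, and the patterns assume names are written exactly in the "Splicing alteration (c. ...)" form from the example; anything else is returned unchanged. Names in the "IVS" style are rejected rather than converted, since converting them needs transcript information that a string rewrite does not have.

```python
import re

# Assumed input form: "Splicing alteration (c.340+1G>A)".
WRAPPED = re.compile(r"^splicing alteration\s*\(\s*(c\.[^)\s]+)\s*\)$", re.IGNORECASE)
IVS = re.compile(r"\bIVS\d", re.IGNORECASE)


def normalize_splice_name(name):
    """Strip the 'Splicing alteration (...)' wrapper; reject IVS-style names."""
    name = name.strip()
    if IVS.search(name):
        raise ValueError(f"IVS-style names are not allowed: {name!r}")
    m = WRAPPED.match(name)
    return m.group(1) if m else name
```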

malachig commented 4 years ago

State a convention or guidance on when to use "LOSS" vs "DELETION" vs "UNDEREXPRESSION".

kkrysiak commented 4 years ago

We need variant names for internal duplications now that we have gone beyond FLT3-ITD: https://civicdb.org/events/genes/29/summary/variants/67/summary#variant

kkrysiak commented 4 years ago

Exon 14 skipping would be another example to handle.

kkrysiak commented 4 years ago

We should make sure we also provide guidance for variant type consistency, particularly for splice-altering variants that don't make a change to the DNA.

lsheta commented 3 years ago

Working table with variant name conventions: