molgenis / data-transform-vkgl

GNU Lesser General Public License v3.0

Treat gene symbols with incorrect casing as invalid #52

Open dennishendriksen opened 3 years ago

dennishendriksen commented 3 years ago

"Symbols contain only uppercase Latin letters and Arabic numerals, and punctuation is avoided, with an exception for hyphens in specific groups."

Source: https://www.genenames.org/about/guidelines/

Currently, gene symbols with invalid casing (e.g. all lower-case) are treated as valid gene symbols. This causes problems when determining consensus, because the same gene-variant ends up with different identifiers depending on the casing of its symbol. Furthermore, downstream users (e.g. the VEP VKGL plugin or Alissa) have to account for these casing differences when linking data.

Gene-variants whose gene symbol has invalid casing should be written to the error file, with a message stating that the gene symbol is invalid because it contains lower-case characters.
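To make the proposed behaviour concrete, here is a minimal sketch in Python. It is not taken from this repository: the record layout (a dict with a `gene` key), the function names `casing_error` and `partition_by_casing`, and the exact wording of the message are all assumptions for illustration. It also applies the rule strictly, so HGNC placeholder symbols of the form C#orf# (e.g. C9orf72), which contain lower-case letters, would be rejected unless they are handled separately.

```python
def casing_error(gene_symbol):
    """Return an error message if the symbol contains lower-case characters, else None."""
    if any(ch.islower() for ch in gene_symbol):
        return (
            f"gene symbol '{gene_symbol}' is invalid: "
            "it contains lower-case characters"
        )
    return None


def partition_by_casing(records):
    """Split records into (valid, errors) based on gene symbol casing.

    Assumption: each record is a dict with a 'gene' key. Rejected records
    get an extra 'error' key holding the message for the error file.
    """
    valid, errors = [], []
    for record in records:
        message = casing_error(record["gene"])
        if message is None:
            valid.append(record)
        else:
            errors.append({**record, "error": message})
    return valid, errors


if __name__ == "__main__":
    # Hypothetical input rows, for illustration only.
    rows = [
        {"gene": "BRCA1", "variant": "c.68_69del"},
        {"gene": "brca1", "variant": "c.68_69del"},
    ]
    valid_rows, error_rows = partition_by_casing(rows)
    print("valid:", valid_rows)
    print("errors:", error_rows)
```

Rejecting such symbols instead of silently upper-casing them matches the request above: the submitter sees the problem in the error file, and every identifier that reaches the consensus step uses a single casing.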