x-atlas-consortia / ubkg-neo4j

A container implementation to serve the Unified Biomedical Knowledge Graph in Neo4j
MIT License

Enforce naming rules for relationship names #48

Closed AlanSimmons closed 3 months ago

AlanSimmons commented 9 months ago

Statement of problem

In neo4j v5, it is now possible to create relationship indexes. The file indexes_constraints.cypher (currently in a development branch) contains a script that sets relationship indexes on all relationships in a UBKG instance.

Some of the relationships in the UBKG have names that violate neo4j naming rules. Unless Cypher statements escape these names with backticks, the statements will fail.
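
As a minimal sketch of the backtick escaping described above, the following Python helper builds an index statement for a single relationship type. The function names and the indexed property `SAB` are assumptions made for illustration; they are not taken from indexes_constraints.cypher. Cypher escapes a backtick inside a quoted name by doubling it.

```python
def escape_name(name: str) -> str:
    """Wrap a name in backticks, doubling any backtick it contains."""
    return "`" + name.replace("`", "``") + "`"


def relationship_index_statement(rel_type: str, prop: str = "SAB") -> str:
    """Build a CREATE INDEX statement for one relationship type.

    The default property name is a placeholder for illustration only.
    """
    return (
        "CREATE INDEX IF NOT EXISTS "
        f"FOR ()-[r:{escape_name(rel_type)}]-() "
        f"ON (r.{escape_name(prop)})"
    )


# Hypothetical relationship names showing the kinds of violation listed in this issue.
for rel in ["has-part", "3_prime_of", "part_of (inverse)"]:
    print(relationship_index_statement(rel))
```

Escaping keeps the original names intact, so it addresses the failing statements without changing the data.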

Examples of noncompliant relationship names that we have encountered include:

  1. names that include hyphens
  2. names that start with numbers
  3. names that include parentheses

In general, only alphabetic characters and the underscore should be used in relationship names.

Per the neo4j naming rules, the recommended format for relationships is a set of words in uppercase, separated with underscores.
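
A check for the guideline above (alphabetic characters and underscores only) could look like the following sketch. The sample names are hypothetical and chosen only to match the three kinds of violation listed in this issue; the function names are also assumptions.

```python
import re

# Rule as stated in this issue: alphabetic characters and underscores only.
VALID_NAME = re.compile(r"[A-Za-z_]+")


def is_compliant(name: str) -> bool:
    """Return True if the whole name consists of letters and underscores."""
    return VALID_NAME.fullmatch(name) is not None


def find_noncompliant(names):
    """Return the names that break the rule, in their original order."""
    return [name for name in names if not is_compliant(name)]


# Hypothetical sample of relationship names.
sample = ["isa", "has-part", "3_prime_of", "part_of (inverse)", "PART_OF"]
print(find_noncompliant(sample))
# prints ['has-part', '3_prime_of', 'part_of (inverse)']
```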

Sources of error

Invalid relationship names are found both in imports from the UMLS and in custom imports from data providers.

Solution options

  1. Ignore relationships with invalid names at the time of index creation.
  2. Reformat relationships from the UMLS, either at time of export from Neptune or during pre-ingestion processing.
  3. Reformat relationships from custom imports at time of ingestion.
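
The options above can be sketched in Python as follows. This is one possible approach under stated assumptions, not the project's actual ingestion code: the function names and the `rel_` prefix for names that start with a digit are hypothetical. The sketch keeps digits in non-leading positions, which is looser than the alphabetic-only guideline; tighten the pattern if digits should be removed as well.

```python
import re


def split_for_indexing(names):
    """Option 1: separate names that can be indexed as-is from those to skip."""
    valid = [n for n in names if re.fullmatch(r"[A-Za-z_]+", n)]
    skipped = [n for n in names if n not in valid]
    return valid, skipped


def reformat_name(name: str, prefix: str = "rel_") -> str:
    """Options 2 and 3: rewrite a name so it needs no backtick escaping.

    The prefix is a hypothetical choice for names that start with a digit.
    """
    # Collapse each run of disallowed characters into one underscore,
    # then trim underscores left at either end.
    cleaned = re.sub(r"[^A-Za-z0-9_]+", "_", name).strip("_")
    # Guard against an empty result or a leading digit.
    if not cleaned or cleaned[0].isdigit():
        cleaned = prefix + cleaned
    return cleaned


# Hypothetical relationship names.
for rel in ["has-part", "3_prime_of", "part_of (inverse)"]:
    print(rel, "->", reformat_name(rel))
```

Option 1 leaves the data untouched but produces an incomplete set of indexes; options 2 and 3 change the stored names, so queries that use the old names would need the same mapping.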

AlanSimmons commented 8 months ago

Per 11 Jan DD meeting, cast relationship labels as lowercase.
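
A minimal sketch of the lowercase cast, assuming it is applied to a list of relationship names before ingestion. The function names are hypothetical. The second function reports names that differ only by case, since those would merge into a single name after the cast.

```python
from collections import defaultdict


def lowercase_names(names):
    """Map each lowercase name to the sorted original names that produce it."""
    groups = defaultdict(set)
    for name in names:
        groups[name.lower()].add(name)
    return {lower: sorted(originals) for lower, originals in groups.items()}


def find_collisions(names):
    """Return only the lowercase names shared by more than one original."""
    return {
        lower: originals
        for lower, originals in lowercase_names(names).items()
        if len(originals) > 1
    }


# Hypothetical relationship names.
print(find_collisions(["isa", "ISA", "part_of"]))
# prints {'isa': ['ISA', 'isa']}
```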