medizininformatik-initiative / feasibility-backend

Backend of the feasibility-gui.
Apache License 2.0
2 stars 5 forks source link

Ontology 2.0 #180

Closed juliangruendner closed 11 months ago

juliangruendner commented 1 year ago

TODO - to be specified by @geloro94.

Includes all the changes neccessary to support the fdpg+ ontology 2.0

geloro94 commented 1 year ago

To have a more flexible and scalable version of the ontology future versions will no longer be based on a JSON file. Instead, a db is employed that normalizes the repeating information. Further, the latest version of the Ontology utilizes a "context" which is a TermCode Object (system, code, version, display) to make the combination of TermCode and context unique for each criterion. This is necessary as TermCodes on their own are not sufficient for the unique specification of a concept. For example, the code for heart attack can be a diagnosis or a cause of death.

Ontology_2 0 drawio(3)

The image pictures the db schema.

In https://github.com/medizininformatik-initiative/fhir-ontology-generator/blob/ontology-v2/database/DataBaseWriter.py you can find the SQL statements that create said schema. Verify the correctness and check if adjustments are required.

The goal is to generate a DB export that can be loaded by the backend. And in the future provides all information for the front end.

Once the schema and the DB export are provided adjust the API between the front and back end to send the UI-Tree instead of the current UI-Profile Json. Adjust the API https://feasibility.forschen-fuer-gesundheit.de/api/v2/terminology/ui_profile? to also use the context as a parameter: context_system, context_code, context_version.

michael-82 commented 1 year ago

My suggestion would be to change the db to the following:

CREATE TABLE IF NOT EXISTS termcode
(
    id      SERIAL PRIMARY KEY,
    system  TEXT NOT NULL,
    code    TEXT NOT NULL,
    version TEXT,
    UNIQUE (system, code, version),
    display TEXT NOT NULL
);

CREATE TABLE IF NOT EXISTS UI_PROFILE
(
    id      TEXT PRIMARY KEY,
    content JSON NOT NULL
);

CREATE TABLE IF NOT EXISTS CONTEXT
(
    id      SERIAL PRIMARY KEY,
    system  TEXT NOT NULL,
    code    TEXT NOT NULL,
    version TEXT,
    UNIQUE (system, code, version),
    display TEXT NOT NULL
);

CREATE TABLE IF NOT EXISTS CONTEXTUALIZED_CONCEPT_TO_UI_PROFILE
(
    context_id  INTEGER NOT NULL,
    termcode_id INTEGER NOT NULL,
    profile_id  TEXT    NOT NULL,
    CONSTRAINT CONTEXT_ID_FK FOREIGN KEY (context_id)
        REFERENCES CONTEXT (id) ON DELETE CASCADE,
    CONSTRAINT CONCEPT_ID_FK FOREIGN KEY (termcode_id)
        REFERENCES termcode (id) ON DELETE CASCADE,
    CONSTRAINT profile_id_FK FOREIGN KEY (profile_id) REFERENCES UI_PROFILE (id) ON DELETE CASCADE
);

CREATE TABLE IF NOT EXISTS MAPPING
(
    id      SERIAL PRIMARY KEY,
    name    TEXT NOT NULL,
    type    TEXT NOT NULL,
    UNIQUE (name, type),
    content JSON NOT NULL
);

CREATE TABLE IF NOT EXISTS CONTEXTUALIZED_CONCEPT_TO_MAPPING
(
    context_id  INTEGER NOT NULL,
    termcode_id INTEGER NOT NULL,
    mapping_id  INTEGER NOT NULL,
    CONSTRAINT CONTEXT_ID_FK FOREIGN KEY (context_id)
        REFERENCES CONTEXT (id) ON DELETE CASCADE,
    CONSTRAINT CONCEPT_ID_FK FOREIGN KEY (termcode_id)
        REFERENCES termcode (id) ON DELETE CASCADE,
    CONSTRAINT MAPPING_ID_FK FOREIGN KEY (mapping_id)
        REFERENCES MAPPING (id) ON DELETE CASCADE
);

CREATE TABLE IF NOT EXISTS VALUE_SET
(
    canonical_url TEXT NOT NULL,
    system        TEXT NOT NULL,
    code          TEXT NOT NULL,
    version       TEXT,
    PRIMARY KEY (canonical_url, system, code, version)
);

CREATE INDEX idx_mapping_name_mapping ON MAPPING (name);
CREATE INDEX idx_mapping_name_contextualized ON CONTEXTUALIZED_CONCEPT_TO_MAPPING (mapping_id);

which results in onto_db

This mostly adds some surrogate keys. I personally would also prefer adding one to the value_set table, but since this is not related to any other tables, I'm fine with leaving it as is. I also changed some names in a way I found more suitable.

michael-82 commented 1 year ago

After further discussion, the db was changed to:

CREATE TABLE IF NOT EXISTS termcode
(
    id      SERIAL PRIMARY KEY,
    system  TEXT NOT NULL,
    code    TEXT NOT NULL,
    version TEXT,
    unique nulls not DISTINCT (system, code, version),
    display TEXT NOT NULL
);

CREATE TABLE IF NOT EXISTS UI_PROFILE
(
    id      TEXT PRIMARY KEY,
    content JSON NOT NULL
);

CREATE TABLE IF NOT EXISTS CONTEXT
(
    id      SERIAL PRIMARY KEY,
    system  TEXT NOT NULL,
    code    TEXT NOT NULL,
    version TEXT,
    unique nulls not DISTINCT (system, code, version),
    display TEXT NOT NULL
);

CREATE TABLE IF NOT EXISTS CONTEXTUALIZED_CONCEPT_TO_UI_PROFILE
(
    context_id  INTEGER NOT NULL,
    termcode_id INTEGER NOT NULL,
    profile_id  TEXT    NOT NULL,
    CONSTRAINT CONTEXT_ID_FK FOREIGN KEY (context_id)
        REFERENCES CONTEXT (id) ON DELETE CASCADE,
    CONSTRAINT CONCEPT_ID_FK FOREIGN KEY (termcode_id)
        REFERENCES termcode (id) ON DELETE CASCADE,
    CONSTRAINT profile_id_FK FOREIGN KEY (profile_id) REFERENCES UI_PROFILE (id) ON DELETE CASCADE
);

CREATE TABLE IF NOT EXISTS MAPPING
(
    id      SERIAL PRIMARY KEY,
    name    TEXT NOT NULL,
    type    TEXT NOT NULL,
    UNIQUE (name, type),
    content JSON NOT NULL
);

CREATE TABLE IF NOT EXISTS CONTEXTUALIZED_CONCEPT_TO_MAPPING
(
    context_id  INTEGER NOT NULL,
    termcode_id INTEGER NOT NULL,
    mapping_id  INTEGER NOT NULL,
    CONSTRAINT CONTEXT_ID_FK FOREIGN KEY (context_id)
        REFERENCES CONTEXT (id) ON DELETE CASCADE,
    CONSTRAINT CONCEPT_ID_FK FOREIGN KEY (termcode_id)
        REFERENCES termcode (id) ON DELETE CASCADE,
    CONSTRAINT MAPPING_ID_FK FOREIGN KEY (mapping_id)
        REFERENCES MAPPING (id) ON DELETE CASCADE
);

CREATE TABLE IF NOT EXISTS VALUE_SET
(
    id            serial primary key,
    canonical_url TEXT NOT NULL,
    system        TEXT NOT NULL,
    code          TEXT NOT NULL,
    version       TEXT,
    unique nulls not distinct (canonical_url, system, code, version)
);

CREATE INDEX idx_mapping_name_mapping ON MAPPING (name);
CREATE INDEX idx_mapping_name_contextualized ON CONTEXTUALIZED_CONCEPT_TO_MAPPING (mapping_id);

onto_db2

michael-82 commented 12 months ago

Another slight change to the DDL script - adding indexes to the FK columns referencing termcode.id in CONTEXTUALIZED_CONCEPT_TO_UI_PROFILE and CONTEXTUALIZED_CONCEPT_TO_MAPPING. Otherwise deleting the termcode table takes a very long time.

CREATE TABLE IF NOT EXISTS termcode
(
    id      SERIAL PRIMARY KEY,
    system  TEXT NOT NULL,
    code    TEXT NOT NULL,
    version TEXT,
    unique nulls not DISTINCT (system, code, version),
    display TEXT NOT NULL
);

CREATE TABLE IF NOT EXISTS UI_PROFILE
(
    id         TEXT PRIMARY KEY,
    ui_profile JSON NOT NULL
);

CREATE TABLE IF NOT EXISTS CONTEXT
(
    id      SERIAL PRIMARY KEY,
    system  TEXT NOT NULL,
    code    TEXT NOT NULL,
    version TEXT,
    unique nulls not DISTINCT (system, code, version),
    display TEXT NOT NULL
);

CREATE TABLE IF NOT EXISTS CONTEXTUALIZED_CONCEPT_TO_UI_PROFILE
(
    context_id  INTEGER NOT NULL,
    termcode_id INTEGER NOT NULL,
    profile_id  TEXT    NOT NULL,
    CONSTRAINT CONTEXT_ID_FK FOREIGN KEY (context_id)
        REFERENCES CONTEXT (id) ON DELETE CASCADE,
    CONSTRAINT CONCEPT_ID_FK FOREIGN KEY (termcode_id)
        REFERENCES termcode (id) ON DELETE CASCADE,
    CONSTRAINT profile_id_FK FOREIGN KEY (profile_id) REFERENCES UI_PROFILE (id) ON DELETE CASCADE
);

CREATE TABLE IF NOT EXISTS MAPPING
(
    id      SERIAL PRIMARY KEY,
    name    TEXT NOT NULL,
    type    TEXT NOT NULL,
    UNIQUE (name, type),
    content JSON NOT NULL
);

CREATE TABLE IF NOT EXISTS CONTEXTUALIZED_CONCEPT_TO_MAPPING
(
    context_id  INTEGER NOT NULL,
    termcode_id INTEGER NOT NULL,
    mapping_id  INTEGER NOT NULL,
    CONSTRAINT CONTEXT_ID_FK FOREIGN KEY (context_id)
        REFERENCES CONTEXT (id) ON DELETE CASCADE,
    CONSTRAINT CONCEPT_ID_FK FOREIGN KEY (termcode_id)
        REFERENCES termcode (id) ON DELETE CASCADE,
    CONSTRAINT MAPPING_ID_FK FOREIGN KEY (mapping_id)
        REFERENCES MAPPING (id) ON DELETE CASCADE
);

CREATE TABLE IF NOT EXISTS VALUE_SET
(
    id            serial primary key,
    canonical_url TEXT NOT NULL,
    system        TEXT NOT NULL,
    code          TEXT NOT NULL,
    version       TEXT,
    unique nulls not distinct (canonical_url, system, code, version)
);

CREATE INDEX idx_mapping_name_mapping ON MAPPING (name);
CREATE INDEX idx_mapping_name_contextualized ON CONTEXTUALIZED_CONCEPT_TO_MAPPING (mapping_id);
CREATE INDEX idx_concept_to_mapping_termcode_fk ON CONTEXTUALIZED_CONCEPT_TO_MAPPING (termcode_id);
CREATE INDEX idx_concept_to_uiprofile_termcode_fk ON CONTEXTUALIZED_CONCEPT_TO_UI_PROFILE (termcode_id);
michael-82 commented 11 months ago

As there are ongoing discussion whether to change the value_set table, as a base for further discussion. (Please note that the name of the table "value_set" should probably change as well, which is not yet reflected in this post for a lack of a good name)

image

CREATE TABLE IF NOT EXISTS termcode
(
    id      SERIAL PRIMARY KEY,
    system  TEXT NOT NULL,
    code    TEXT NOT NULL,
    version TEXT,
    unique nulls not DISTINCT (system, code, version),
    display TEXT NOT NULL
);

CREATE TABLE IF NOT EXISTS UI_PROFILE
(
    id         TEXT PRIMARY KEY,
    ui_profile JSON NOT NULL
);

CREATE TABLE IF NOT EXISTS CONTEXT
(
    id      SERIAL PRIMARY KEY,
    system  TEXT NOT NULL,
    code    TEXT NOT NULL,
    version TEXT,
    unique nulls not DISTINCT (system, code, version),
    display TEXT NOT NULL
);

CREATE TABLE IF NOT EXISTS CONTEXTUALIZED_CONCEPT_TO_UI_PROFILE
(
    context_id  INTEGER NOT NULL,
    termcode_id INTEGER NOT NULL,
    profile_id  TEXT    NOT NULL,
    CONSTRAINT CONTEXT_ID_FK FOREIGN KEY (context_id)
        REFERENCES CONTEXT (id) ON DELETE CASCADE,
    CONSTRAINT CONCEPT_ID_FK FOREIGN KEY (termcode_id)
        REFERENCES termcode (id) ON DELETE CASCADE,
    CONSTRAINT profile_id_FK FOREIGN KEY (profile_id) REFERENCES UI_PROFILE (id) ON DELETE CASCADE
);

CREATE TABLE IF NOT EXISTS MAPPING
(
    id      SERIAL PRIMARY KEY,
    name    TEXT NOT NULL,
    type    TEXT NOT NULL,
    UNIQUE (name, type),
    content JSON NOT NULL
);

CREATE TABLE IF NOT EXISTS CONTEXTUALIZED_CONCEPT_TO_MAPPING
(
    context_id  INTEGER NOT NULL,
    termcode_id INTEGER NOT NULL,
    mapping_id  INTEGER NOT NULL,
    CONSTRAINT CONTEXT_ID_FK FOREIGN KEY (context_id)
        REFERENCES CONTEXT (id) ON DELETE CASCADE,
    CONSTRAINT CONCEPT_ID_FK FOREIGN KEY (termcode_id)
        REFERENCES termcode (id) ON DELETE CASCADE,
    CONSTRAINT MAPPING_ID_FK FOREIGN KEY (mapping_id)
        REFERENCES MAPPING (id) ON DELETE CASCADE
);

CREATE TABLE IF NOT EXISTS canonical_url
(
    id  SERIAL PRIMARY KEY,
    url TEXT UNIQUE NOT NULL
);

CREATE TABLE IF NOT EXISTS VALUE_SET
(
    id               SERIAL PRIMARY KEY,
    canonical_url_id INTEGER NOT NULL,
    termcode_id      INTEGER NOT NULL,
    constraint CANONICAL_URL_ID_FK foreign key (canonical_url_id)
        references canonical_url (id) on delete cascade,
    constraint TERMCODE_ID_FK foreign key (termcode_id)
        references termcode (id) on
            delete cascade
);

CREATE INDEX idx_mapping_name_mapping ON MAPPING (name);
CREATE INDEX idx_mapping_name_contextualized ON CONTEXTUALIZED_CONCEPT_TO_MAPPING (mapping_id);
CREATE INDEX idx_concept_to_mapping_termcode_fk ON CONTEXTUALIZED_CONCEPT_TO_MAPPING (termcode_id);
CREATE INDEX idx_concept_to_uiprofile_termcode_fk ON CONTEXTUALIZED_CONCEPT_TO_UI_PROFILE (termcode_id);
create index idx_value_set_termcode_fk on VALUE_SET (termcode_id);
michael-82 commented 11 months ago

After another review, the following schema is agreed upon.

CREATE TABLE IF NOT EXISTS termcode
(
    id      SERIAL PRIMARY KEY,
    system  TEXT NOT NULL,
    code    TEXT NOT NULL,
    version TEXT,
    unique nulls not DISTINCT (system, code, version),
    display TEXT NOT NULL
);

CREATE TABLE IF NOT EXISTS ui_profile
(
    id         SERIAL PRIMARY KEY,
    name       TEXT NOT NULL,
    ui_profile JSON NOT NULL
);

CREATE TABLE IF NOT EXISTS context
(
    id      SERIAL PRIMARY KEY,
    system  TEXT NOT NULL,
    code    TEXT NOT NULL,
    version TEXT,
    UNIQUE NULLS NOT DISTINCT (system, code, version),
    display TEXT NOT NULL
);

CREATE TABLE IF NOT EXISTS mapping
(
    id      SERIAL PRIMARY KEY,
    name    TEXT NOT NULL,
    type    TEXT NOT NULL,
    UNIQUE (name, type),
    content JSON NOT NULL
);

CREATE TABLE IF NOT EXISTS contextualized_termcode
(
    context_termcode_hash TEXT PRIMARY KEY,
    context_id            INTEGER NOT NULL,
    termcode_id           INTEGER NOT NULL,
    mapping_id            INTEGER,
    ui_profile_id         INTEGER,
    CONSTRAINT CONTEXT_ID_FK FOREIGN KEY (context_id)
        REFERENCES CONTEXT (id) ON DELETE CASCADE,
    CONSTRAINT CONCEPT_ID_FK FOREIGN KEY (termcode_id)
        REFERENCES termcode (id) ON DELETE CASCADE,
    CONSTRAINT mapping_id_fk FOREIGN KEY (mapping_id)
        REFERENCES mapping (id),
    CONSTRAINT ui_profile_id_fk FOREIGN KEY (ui_profile_id)
        REFERENCES ui_profile (id),
    UNIQUE (context_id, termcode_id)
);

CREATE TABLE IF NOT EXISTS contextualized_value_set
(
    id  SERIAL PRIMARY KEY,
    url TEXT UNIQUE NOT NULL
);

CREATE TABLE IF NOT EXISTS contextualized_termcode_to_contextualized_value_set
(
    context_termcode_hash       TEXT    NOT NULL,
    contextualized_value_set_id INTEGER NOT NULL,
    UNIQUE (context_termcode_hash, contextualized_value_set_id),
    CONSTRAINT CONTEXTUALIZED_VALUE_SET_ID_FK FOREIGN KEY (contextualized_value_set_id)
        REFERENCES CONTEXTUALIZED_VALUE_SET (id) ON DELETE CASCADE ,
    CONSTRAINT CONTEXTUALIZED_TERMCODE_ID_FK FOREIGN KEY (context_termcode_hash)
        REFERENCES CONTEXTUALIZED_TERMCODE (context_termcode_hash) ON DELETE CASCADE
);

CREATE INDEX idx_mapping_name_mapping ON mapping (name);
CREATE INDEX idx_contextualized_termcode_termcode_fk ON contextualized_termcode (termcode_id);

COMMENT ON COLUMN contextualized_termcode.context_termcode_hash IS 'This value is hashed using UUID-V3 (MD5) from the concatenated string of (context.system, context.code, context.version, termcode.system, termcode.code, termcode.version), omitting null values for version and without any delimiters. The mandatory namespace UUID will be predefined and shared along all components. The conversion between chars and bytes for hashing is using UTF-8 encoding.';

Resulting in:

onto_db3

Please be aware that if you previously started the backend with the associated branch, leading to the execution of the versioned sql script, you MUST delete the previously added tables manually AND remove the entry from the flyway table (`delete from flyway_schema_history where version = '7')

alexanderkiel commented 11 months ago

This value is hashed using UUID-V3 (MD5) from the concatenated string of (context.system, context.code, context.version, termcode.system, termcode.code, termcode.version), omitting null values for version and without any delimiters.

Here the namespace UUID and the fact that we use UTF-8 encoding to convert between chars and bytes is missing.

michael-82 commented 11 months ago

Thanks for noticing - I updated it here and in the sql script