medizininformatik-initiative / fhir-ontology-generator

Add translations for code systems #105

Open juliangruendner opened 3 weeks ago

juliangruendner commented 3 weeks ago

The ontology produced by the ontology generator is still missing the translations for the different languages. These translations, which can be downloaded from the MII terminology server, should be added to the Elasticsearch files while the generator creates them.

The structure should change from the current one, where display is either an array or missing, to a display attribute with exactly three entries (original, en-US, de-DE): "display": { "original": "Geriatrie", "en-US": "Geriatrix", "de-DE": "Geriatrie" }

This should be changed in all files, i.e. for both index file inputs (onto_es__codeableconcept and onto_es_ontology).

Note that the files for the respective indices look different.

For the codeable concept the translations should be added to each object as follows:

**Codeable concept example changes**

IS:

```json
{
  "termcode": {
    "code": "0200",
    "display": "Geriatrie",
    "system": "http://fhir.de/CodeSystem/dkgev/Fachabteilungsschluessel",
    "version": 2099
  },
  "value_sets": [
    "http://fhir.de/ValueSet/dkgev/Fachabteilungsschluessel"
  ]
}
```

SHOULD:

```json
{
  "termcode": {
    "code": "0200",
    "display": "Geriatrie",
    "system": "http://fhir.de/CodeSystem/dkgev/Fachabteilungsschluessel",
    "version": 2099
  },
  "value_sets": [
    "http://fhir.de/ValueSet/dkgev/Fachabteilungsschluessel"
  ],
  "display": {
    "original": "Geriatrie",
    "en-US": "Geriatrix",
    "de-DE": "Geriatrie"
  }
}
```
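A minimal Python sketch of this change, assuming each codeable concept entry is already loaded as a dict. The names `build_display` and `add_display` are hypothetical, and falling back to the original text when a translation is missing is an assumption that the specification above does not state.

```python
LANGUAGES = ("en-US", "de-DE")


def build_display(original: str, translations: dict) -> dict:
    """Return the display object: the original text plus one entry per language."""
    display = {"original": original}
    for lang in LANGUAGES:
        # Assumption: reuse the original text if no translation is available.
        display[lang] = translations.get(lang, original)
    return display


def add_display(entry: dict, translations: dict) -> dict:
    """Return a copy of a codeable concept entry with a top-level display object."""
    result = dict(entry)
    result["display"] = build_display(entry["termcode"]["display"], translations)
    return result
```

The input entry is left untouched, so the same function can be applied to cached entries without side effects.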

For the ontology the translations should be added to each object as follows:

Here the "name" attribute is removed and the display attribute with the translations added instead.

**ontology example changes**

IS:

```json
{
  "name": "1,1-Dimethoxy-(9Z)octadecene (DMA 18:1)/Oleate (C18:1w9) [Mass Ratio] in Fibroblast",
  "availability": 0,
  "terminology": "http://loinc.org",
  "termcode": "74620-6",
  "selectable": true,
  "context": {
    "system": "fdpg.mii.cds",
    "code": "Laboruntersuchung",
    "display": "Laboruntersuchung",
    "version": "1.0.0"
  },
  "termcodes": [
    {
      "system": "http://loinc.org",
      "code": "74620-6",
      "display": "1,1-Dimethoxy-(9Z)octadecene (DMA 18:1)/Oleate (C18:1w9) [Mass Ratio] in Fibroblast",
      "version": "2.78"
    }
  ],
  "criteria_sets": [],
  "translations": [],
  "parents": [],
  "children": [],
  "related_terms": [],
  "kds_module": "Labor"
}
```

SHOULD:

```json
{
  "availability": 0,
  "terminology": "http://loinc.org",
  "termcode": "74620-6",
  "selectable": true,
  "context": {
    "system": "fdpg.mii.cds",
    "code": "Laboruntersuchung",
    "display": "Laboruntersuchung",
    "version": "1.0.0"
  },
  "termcodes": [
    {
      "system": "http://loinc.org",
      "code": "74620-6",
      "display": "1,1-Dimethoxy-(9Z)octadecene (DMA 18:1)/Oleate (C18:1w9) [Mass Ratio] in Fibroblast",
      "version": "2.78"
    }
  ],
  "criteria_sets": [],
  "display": {
    "original": "1,1-Dimethoxy-(9Z)octadecene (DMA 18:1)/Oleate (C18:1w9) [Mass Ratio] in Fibroblas",
    "en-US": "1,1-Dimethoxy-(9Z)octadecene (DMA 18:1)/Oleate (C18:1w9) [Mass Ratio] in Fibroblast",
    "de-DE": "1,1-Dimethoxy-(9Z)octadecene (DMA 18:1)/Oleate (C18:1w9) [Mass Verhältnis] in Fibroblas"
  },
  "parents": [],
  "children": [],
  "related_terms": [],
  "kds_module": "Labor"
}
```
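A rough Python sketch of the ontology entry conversion, assuming the entry is a dict. The function name `convert_ontology_entry` is hypothetical. Dropping the empty "translations" array is an assumption read off the SHOULD example, since the text only mentions removing "name". The fallback to the original text is likewise an assumption.

```python
def convert_ontology_entry(entry: dict, translations: dict) -> dict:
    """Return a copy of an ontology entry with "name" replaced by "display"."""
    original = entry["name"]
    # Assumption: "translations" is dropped as well, as in the SHOULD example.
    result = {k: v for k, v in entry.items() if k not in ("name", "translations")}
    result["display"] = {
        "original": original,
        # Assumption: fall back to the original text if a language is missing.
        "en-US": translations.get("en-US", original),
        "de-DE": translations.get("de-DE", original),
    }
    return result
```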

Downloading the translations from the terminology server

The translations for a given code of a code system can be downloaded from the terminology server as follows:

Where translation is part of the system

example call for sct:

https://onto-server-base-url/fhir/ValueSet/$expand?url=https%3A%2F%2Fwww.medizininformatik-initiative.de%2Ffhir%2Fcore%2Fmodul-diagnose%2FValueSet%2Fdiagnoses-sct&displayLanguage=de&force-system-version=http%3A%2F%2Fsnomed.info%2Fsct%7Chttp%3A%2F%2Fsnomed.info%2Fsct%2F11000274103&includeDesignations=true

example call for loinc:

https://onto-server-base-url/fhir/ValueSet/$expand?url=https%3A%2F%2Fwww.medizininformatik-initiative.de%2Ffhir%2Fext%2Fmodul-icu%2FValueSet%2FCode-Monitoring-und-Vitaldaten-LOINC&includeDesignations=true&designation=urn%3Aietf%3Abcp%3A47%7Cde-DE
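The example calls above can be assembled programmatically. The following is a small sketch using only the Python standard library. The function name `build_expand_url` is hypothetical, and `onto-server-base-url` is the same placeholder host used in the examples.

```python
from urllib.parse import urlencode


def build_expand_url(base_url: str, value_set_url: str, params: dict) -> str:
    """Build a ValueSet/$expand URL with percent-encoded query parameters."""
    query = {"url": value_set_url, **params}
    # urlencode percent-encodes ':', '/' and '|' in the parameter values.
    return f"{base_url.rstrip('/')}/ValueSet/$expand?{urlencode(query)}"


loinc_url = build_expand_url(
    "https://onto-server-base-url/fhir",
    "https://www.medizininformatik-initiative.de/fhir/ext/modul-icu/ValueSet/Code-Monitoring-und-Vitaldaten-LOINC",
    {"includeDesignations": "true", "designation": "urn:ietf:bcp:47|de-DE"},
)
```

Passing the parameters as a dict keeps names such as `force-system-version`, which are not valid Python identifiers, usable without special handling.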

where translation is added using a FDPG translation supplement:

Note that the supplement URL has to be known in order to add the supplement. @jpwiedekopf suggested that there will be a registry document on the Ontoserver, which can be retrieved as follows:

https://onto-server-base-url/fhir/CodeSystem/fdpg-supplement-registry

@paulolaup TODO - check how to correctly load a supplement

https://onto-server-base-url/fhir/ValueSet/$expand?url=https://www.medizininformatik-initiative.de/fhir/core/modul-person/ValueSet/Vitalstatus&useSupplement=https://example.org/fhir/CodeSystem/KDS/Person/Vitalstatus/translations|1.0.0
juliangruendner commented 3 weeks ago

@paulolaup, @Frontman50 - keep in mind that some code systems (like sct) need to have the version set explicitly in order to expand the designations

e.g.:

https://onto-server-base-url/fhir/ValueSet/$expand?url=https%3A%2F%2Fwww.medizininformatik-initiative.de%2Ffhir%2Fcore%2Fmodul-diagnose%2FValueSet%2Fdiagnoses-sct&displayLanguage=de&force-system-version=http%3A%2F%2Fsnomed.info%2Fsct%7Chttp%3A%2F%2Fsnomed.info%2Fsct%2F11000274103&includeDesignations=true

paulolaup commented 2 weeks ago

Yes. The question is whether we need some initial configuration in the ontology generator, loaded by a designation resolver, so that the resolver can determine whether a version is required to resolve the value sets for a code system it encounters. Or does the FDPG designation supplement concept map provide this information for us?

paulolaup commented 2 weeks ago

Current implementation plan:

Implement a separate TermcodeDesignationResolver class that handles the designation resolution logic:

  1. Download and cache the supplement registry CodeSystem resource from the specified terminology server. The system URL of this CodeSystem resource should ideally come from externalized configuration (a config file etc.).
  2. For every term code encountered during Elasticsearch file generation, check whether the corresponding value set expansion has already been retrieved.
    1. If yes, look up the designations for the coding.
    2. If no, call ValueSet $expand using the mapping in the supplement registry, cache the result, and then look up the designations.
  3. Generate the display entry.
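The caching part of this plan could look roughly like the sketch below. Only the class name TermcodeDesignationResolver comes from the plan. The `expand` callable, the `resolve` method, and `index_expansion` are hypothetical. The `expand` callable stands in for the registry lookup plus the HTTP call, which are not modeled here. The sketch reads designations from `expansion.contains[].designation[]` of a FHIR ValueSet expansion and does not descend into nested `contains`.

```python
from typing import Callable, Dict, Tuple

Designations = Dict[str, str]  # language tag -> translated display


def index_expansion(value_set: dict) -> Dict[Tuple[str, str], Designations]:
    """Index an expanded ValueSet by (system, code)."""
    index: Dict[Tuple[str, str], Designations] = {}
    for concept in value_set.get("expansion", {}).get("contains", []):
        by_lang = {
            d["language"]: d["value"]
            for d in concept.get("designation", [])
            if "language" in d and "value" in d
        }
        index[(concept["system"], concept["code"])] = by_lang
    return index


class TermcodeDesignationResolver:
    def __init__(self, expand: Callable[[str], dict]):
        # `expand` takes a ValueSet URL and returns the expanded ValueSet (assumption).
        self._expand = expand
        self._cache: Dict[str, Dict[Tuple[str, str], Designations]] = {}

    def resolve(self, value_set_url: str, system: str, code: str, original: str) -> dict:
        """Return the display entry for one coding, expanding each value set once."""
        if value_set_url not in self._cache:
            self._cache[value_set_url] = index_expansion(self._expand(value_set_url))
        designations = self._cache[value_set_url].get((system, code), {})
        # Assumption: fall back to the original display if no designation exists.
        return {
            "original": original,
            "en-US": designations.get("en-US", original),
            "de-DE": designations.get("de-DE", original),
        }
```

Injecting `expand` keeps the resolver testable without a terminology server and leaves the choice of HTTP client open.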

Open questions:

  1. Since both the supplement registry (as per the proposal) and the operation call refer to ValueSet resources, we will likely have to resolve the correct ValueSet instance using the information provided during the generation process. @paulolaup
  2. See comment above @paulolaup
jpwiedekopf commented 2 weeks ago

Correction: you do not have to specify the version of SNOMED CT explicitly. The full version of the current German Edition is: http://snomed.info/sct/11000274103/version/20240515. The example specified by @juliangruendner provides only the edition URI http://snomed.info/sct/11000274103, which works correctly by "just" using the most current version of the German Edition indexed by the server (which I'll keep current when new versions are released so you can take advantage of new translations ASAP).

juliangruendner commented 1 week ago

@Frontman50 , @paulolaup

Addition to specification above:

The current implementation should be extended to populate the translations from two different sources.

  1. The initial map for code systems and their translations should be populated based on the value sets from code systems where translations exist.

For this, a mapping file should describe for each code system how to resolve the language-specific information.

The information should be configurable via a JSON file:

{
    "code_system_translations":{
        "http://snomed.info/sct": {
            "parameters":[
                {
                    "name": "version",
                    "valueUri": "http://snomed.info/sct/11000274103"
                },
                {
                    "name": "property",
                    "valueString": "designation"
                },
                {
                    "name": "displayLanguage",
                    "valueUri": "de"
                }
            ]
        },
        "http://loinc.org": {
            "parameters":[
                {
                    "name": "property",
                    "valueString": "property"
                }
            ]
        }
    }
}

Based on this information all codes should be looked up

https://ontoserver.mii-termserv.de/fhir/CodeSystem/$lookup?system=http://loinc.org&code=4544-3&property=lang.de-DE

but in batches as follows:

https://documenter.getpostman.com/view/145584/SWTD6wPM#97fb3c6e-2241-46d4-b4d1-382f46a98933

Note that this will have to be done for all entries in all ui_trees and all entries in all value sets.

For each ui_tree and each value_set, check whether the code system is in the JSON config above. If yes, create the lookups in batches. If no, skip the entry.
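A possible sketch of the batching step, assuming the linked example uses a standard FHIR batch Bundle whose entries are GET requests against CodeSystem/$lookup. The function name `build_lookup_batches`, the batch size, and the choice to pass every configured parameter as a query parameter are assumptions.

```python
from urllib.parse import urlencode


def build_lookup_batches(codings, config, batch_size=100):
    """Return FHIR batch Bundles of CodeSystem/$lookup requests.

    `codings` is an iterable of (system, code) pairs. Codings whose code
    system is not in the config are skipped.
    """
    entries = []
    for system, code in codings:
        system_config = config["code_system_translations"].get(system)
        if system_config is None:
            continue  # code system not configured: skip entry
        query = [("system", system), ("code", code)]
        for param in system_config["parameters"]:
            # Take whichever value[x] key the parameter carries (valueUri, valueString, ...).
            value = next(v for k, v in param.items() if k.startswith("value"))
            query.append((param["name"], value))
        url = "CodeSystem/$lookup?" + urlencode(query)
        entries.append({"request": {"method": "GET", "url": url}})
    return [
        {"resourceType": "Bundle", "type": "batch", "entry": entries[i:i + batch_size]}
        for i in range(0, len(entries), batch_size)
    ]
```

Each returned Bundle would then be POSTed to the FHIR base URL of the terminology server, and the response entries come back in request order.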

  2. Add translations from supplements

Similar to the current implementation, the translations from all supplements should be added to the resolver.

Additionally, there should be an option to download new translation supplements (update_translation_supplements). When enabled, it first looks up all code systems in the FDPG supplement registry at https://ontoserver-base-url/fhir/CodeSystem/fdpg-supplement-registry

and then downloads all FDPG supplements and writes them to a folder in the repository on the local file system, which is listed in .gitignore.

  3. Additional algorithm information

The algorithm should consider the following:

a. Supplement translations are weaker than translations that are directly part of a code system (code system translations). A supplement translation should therefore only be added if no code system translation is available.
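The precedence rule can be expressed as a small merge, sketched here in Python. The function name `merge_translations` and the language-to-text dict shape are assumptions. Treating an empty string as "no translation available" is also an assumption.

```python
def merge_translations(code_system: dict, supplement: dict) -> dict:
    """Merge translations so that code system translations take precedence."""
    merged = dict(supplement)
    # Code system translations overwrite supplement entries for the same language.
    merged.update({lang: text for lang, text in code_system.items() if text})
    return merged
```

For example, `merge_translations({"de-DE": "A"}, {"de-DE": "B", "en-US": "C"})` keeps "A" for de-DE and takes "C" for en-US from the supplement.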

paulolaup commented 1 week ago

I'd propose to use a proper Parameters resource as the value for each code system URL.

Where should the config go? Should it be located in the example directory similar to the translations?