morph-kgc / morph-kgc

Powerful RDF Knowledge Graph Generation with RML Mappings
https://morph-kgc.readthedocs.io
Apache License 2.0

Two logical sources with rr:sqlQuery for the same table #245

Closed 00ade closed 6 months ago

00ade commented 6 months ago

Hi! I have a table "outputevents" with 4 columns: row_id, subject_id, hadm_id, charttime.

row_id  subject_id  hadm_id  charttime
1       2           -        11/02/2024
3       4           5        12/02/2024

The hadm_id column can be null. What I want to do is the following:

This means that if hadm_id is null, I want to consider the ID for the patient. But if the admission code is not null, this code is sufficient.

I've tried to achieve this by creating two logicalSources with two sqlQuery statements.

@prefix rr: <http://www.w3.org/ns/r2rml#>.
@prefix rml: <http://w3id.org/rml/>.
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>.
@prefix mimic: <http://mimic-translation-project.org/>.
@prefix xsd: <http://www.w3.org/2001/XMLSchema#>.

@base <http://mimic-translation-project.org/> .

<T1> a rml:AssertedTriplesMap;
    rml:logicalSource [
        rr:tableName "adelina.outputevents3";
        rr:sqlQuery """
        SELECT *
        FROM adelina.outputevents3
        WHERE hadm_id is null
        """
    ];

    rml:subjectMap [
        rr:template "http://mimic-translation-project.org/res/outputEvent{row_id}"
    ];

    rr:predicateObjectMap [
        rr:predicate mimic:hasPatient;
        rml:objectMap [
            rr:template "http://mimic-translation-project.org/res/patient{subject_id}"
        ]
    ];

    rr:predicateObjectMap [
        rr:predicate mimic:chartTime;
        rml:objectMap [
            rml:reference "charttime";
            rr:datatype xsd:dateTime
        ]
    ].

<T2> a rml:AssertedTriplesMap;
    rml:logicalSource [
        rr:tableName "adelina.outputevents3";
        rr:sqlQuery """
        SELECT *
        FROM adelina.outputevents3
        WHERE hadm_id is not null
        """
    ];

    rml:subjectMap [
        rr:template "http://mimic-translation-project.org/res/outputEvent{row_id}"
    ];

    rr:predicateObjectMap [
        rr:predicate mimic:refAdmission;
        rml:objectMap [
            rr:template "http://mimic-translation-project.org/res/admission{hadm_id}"
        ]
    ];

    rr:predicateObjectMap [
        rr:predicate mimic:chartTime;
        rml:objectMap [
            rml:reference "charttime";
            rr:datatype xsd:dateTime
        ]
    ]. 

For the T1 triples, everything is fine. -> Output triple when hadm_id is null:

@prefix ns1: <http://mimic-translation-project.org/> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

<http://mimic-translation-project.org/res/outputEvent---> ns1:chartTime "2107-10-03T05:30:00"^^xsd:dateTime ;
    ns1:hasPatient <http://mimic-translation-project.org/res/patient----> .

The problem is with the T2 part. The output triples also include the subject_id (via hasPatient), even though it is not specified in the T2 mapping. -> Output triple when hadm_id is not null:

<http://mimic-translation-project.org/res/outputEvent---> ns1:chartTime "2122-07-01T04:00:00"^^xsd:dateTime ;
    ns1:hasPatient <http://mimic-translation-project.org/res/patient----> ;
    ns1:refAdmission <http://mimic-translation-project.org/res/admission----> .

Could you explain to me why the predicate hasPatient and its object are also added in the second case?

Additionally, is it possible to change the prefix name that is automatically created in the output? This line is added to the result: "@prefix ns1: <http://mimic-translation-project.org/> .", but I would like to change ns1.

Thank you for your time.

arenas-guerrero-julian commented 6 months ago

A logical source has an associated query or table, not both. In your case you want to use a query so just remove the table (both for T1 and T2):

rml:logicalSource [
        rr:sqlQuery """
        SELECT *
        FROM adelina.outputevents3
        WHERE hadm_id is not null
        """
    ];
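
As a rough illustration of what the two queries are meant to return, here is a minimal Python sketch using an in-memory SQLite database. The table name `outputevents` (without the `adelina` schema), the column types, and the two sample rows taken from the question are assumptions for the demo; this is not part of Morph-KGC.

```python
import sqlite3

# Assumption: an in-memory SQLite table stands in for adelina.outputevents3.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE outputevents ("
    "row_id INTEGER, subject_id INTEGER, hadm_id INTEGER, charttime TEXT)"
)
conn.executemany(
    "INSERT INTO outputevents VALUES (?, ?, ?, ?)",
    [
        (1, 2, None, "11/02/2024"),  # hadm_id is null: should go to T1
        (3, 4, 5, "12/02/2024"),     # hadm_id is set: should go to T2
    ],
)

# Rows the T1 query selects (these get the hasPatient triple).
without_admission = conn.execute(
    "SELECT row_id, subject_id FROM outputevents WHERE hadm_id IS NULL"
).fetchall()

# Rows the T2 query selects (these get the refAdmission triple).
with_admission = conn.execute(
    "SELECT row_id, hadm_id FROM outputevents WHERE hadm_id IS NOT NULL"
).fetchall()

print(without_admission)
print(with_admission)
```

When each logical source uses only its query, every row should reach exactly one of the two triples maps.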

Regarding your latter question on the prefix, I assume you are using the materialize method of Morph-KGC, hence you obtain an RDFlib graph. You have to check the RDFlib API to see how to change the prefix. Another option is to use materialize_oxigraph and check what you get. Also, if you are OK with the N-Triples format (which has no prefixes), you may execute Morph-KGC via the command line.

00ade commented 6 months ago

Thank you very much for your help! I deleted the "tablename" and it works.

Yes, I'm using the materialize method of Morph-KGC. I will read the RDFlib API. Thank you!