morph-kgc / morph-kgc

Powerful RDF Knowledge Graph Generation with RML Mappings
https://morph-kgc.readthedocs.io
Apache License 2.0

Use of UDFs when fetching data from Oracle for building KG #239

Closed IshanDindorkar closed 2 months ago

IshanDindorkar commented 2 months ago

Hello Team,

Thank you for your work. I recently started using the Morph-KGC library for a use case focused on building a knowledge graph from a relational database (Oracle). We are exploring whether UDFs (Python-based user-defined functions) can be used to process incoming data from the database before the KG is generated from it. To do this, we followed the steps in the official documentation and tried the same mapping shown there. The only difference is that the example's data source is a CSV file, while in our case we fetch data from an Oracle database. After spending some time on it, we found this test for the UDF functionality and tried to replicate its mappings and config file. It works fine with CSV as input, but unfortunately the UDF does not work as expected when the data source is an Oracle table.

We also noticed that there are many examples at this location in the repository in which a database is used in the mappings file to fetch data and build a KG, but not a single one of those tests uses a UDF. Is that just a coincidence, or is there another reason for it? Could you please advise?

Thank you very much for your support. Appreciate it.

arenas-guerrero-julian commented 2 months ago

Hi @IshanDindorkar,

The UDFs should work with any data source. Do you get any specific error?

Regarding your second question, there is no UDF example in that location because it only contains the R2RML test cases, which do not include UDFs.

IshanDindorkar commented 2 months ago

Hi @arenas-guerrero-julian,

Thank you very much for your prompt response. I am not getting an error as such, but the function does not get executed on the column of the database table. For example, I am fetching two columns from table A, col1 and col2, and with the help of a UDF I am trying to convert the values of col2 to lower case. When I save the KG, the values of col2 have not been converted to lower case. For this I am using a mapping file like the one shown below:

@prefix rr: <http://www.w3.org/ns/r2rml#> .
@prefix rml: <http://semweb.mmlab.be/ns/rml#> .
@prefix ql: <http://semweb.mmlab.be/ns/ql#> .
@prefix ex: <http://example.com/> .
@prefix grel: <http://users.ugent.be/~bjdmeest/function/grel.ttl#> .

@base <http://example.com/base/> .

<#TM1>
    rml:logicalSource [
        rml:query "SELECT col1, col2 FROM A WHERE ROWNUM < 10" ;
    ] ;

    rr:subjectMap [
        rr:template "http://example.com/{col1}" ;
        rr:class ex:col1 ;
    ] ;

     rr:predicateObjectMap [
        rr:predicate ex:col1;
        rr:objectMap [
            rr:column "col1" ;
        ] ;
    ] ;

    rr:predicateObjectMap [
        rr:predicate ex:col2 ;
        rr:objectMap [
            rr:column "col2" ;
            rr:functionExecution <#Execution> ;
        ] ;
    ] .

<#Execution>
    rml:function ex:toLowerCase ;
    rml:input [
        rml:parameter grel:valueParam ;
        rml:inputValueMap [
            rml:reference "col2" ;
        ]
    ] .

The UDF looks like this

@udf(
    fun_id='http://example.com/toLowerCase',
    text='http://users.ugent.be/~bjdmeest/function/grel.ttl#valueParam')
def to_lower_case(text):
    return text.lower()
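
Before involving the database, the function body itself can be checked in isolation. The sketch below is only a local test harness: the `udf` decorator and the `registry` dictionary defined here are hypothetical stand-ins written for this example, not the decorator that Morph-KGC provides when it loads the UDF file, and the sample value is made up.

```python
# Hypothetical stand-in for the decorator that Morph-KGC supplies when it
# loads a UDF file. It only records the function and its parameter mapping
# so that the function body can be called directly in a local test.
registry = {}


def udf(fun_id, **params):
    def wrapper(func):
        registry[fun_id] = {"function": func, "parameters": params}
        return func
    return wrapper


@udf(
    fun_id='http://example.com/toLowerCase',
    text='http://users.ugent.be/~bjdmeest/function/grel.ttl#valueParam')
def to_lower_case(text):
    return text.lower()


# Call the function with a made-up, all-uppercase value such as one row
# of col2 might contain.
result = to_lower_case("ACME CORP")
print(result)
```

If this behaves as intended, the function itself is fine and the cause is more likely in the mapping or the configuration.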

Could you please advise what I am missing here and help me in fixing the issue.

Thank you very much for your support. Appreciate it.

arenas-guerrero-julian commented 2 months ago

I think that the problem is that you are mixing R2RML and RML. For instance, you seem to be using RML but you employ rr:column, which is R2RML. The correct property is rml:reference. Similarly with RML-FNML, you are using rr:functionExecution but the correct property is rml:functionExecution.

Also, use the latest prefixes in the mapping:

@prefix rml: <http://w3id.org/rml/> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix fno: <https://w3id.org/function/ontology#> .
@prefix morph-kgc: <https://github.com/morph-kgc/morph-kgc/function/built-in.ttl#> .
@prefix grel: <http://users.ugent.be/~bjdmeest/function/grel.ttl#> .
@prefix idlab-fn: <http://example.com/idlab/function/> .
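
For illustration, here is a sketch of how the mapping from the question might look once this advice is applied: every `rr:` term is replaced by its `rml:` counterpart under the new `http://w3id.org/rml/` prefix, and `rr:column` becomes `rml:reference`. Dropping the column reference from the `col2` object map, so that the function execution alone supplies the value, is an assumption made here and is not stated in the thread; the mapping has not been run against Oracle. It is wrapped in Python, the language of the UDF, and the `mapping.ttl` file name is hypothetical.

```python
from pathlib import Path
import tempfile

# Sketch of the corrected mapping: only RML terms, no R2RML (rr:) terms.
CORRECTED_MAPPING = """\
@prefix rml: <http://w3id.org/rml/> .
@prefix ex: <http://example.com/> .
@prefix grel: <http://users.ugent.be/~bjdmeest/function/grel.ttl#> .

@base <http://example.com/base/> .

<#TM1>
    rml:logicalSource [
        rml:query "SELECT col1, col2 FROM A WHERE ROWNUM < 10" ;
    ] ;

    rml:subjectMap [
        rml:template "http://example.com/{col1}" ;
        rml:class ex:col1 ;
    ] ;

    rml:predicateObjectMap [
        rml:predicate ex:col1 ;
        rml:objectMap [
            rml:reference "col1" ;
        ] ;
    ] ;

    rml:predicateObjectMap [
        rml:predicate ex:col2 ;
        rml:objectMap [
            rml:functionExecution <#Execution> ;
        ] ;
    ] .

<#Execution>
    rml:function ex:toLowerCase ;
    rml:input [
        rml:parameter grel:valueParam ;
        rml:inputValueMap [
            rml:reference "col2" ;
        ]
    ] .
"""


def write_mapping(directory):
    """Write the mapping to a (hypothetically named) mapping.ttl file."""
    path = Path(directory) / "mapping.ttl"
    path.write_text(CORRECTED_MAPPING, encoding="utf-8")
    return path


with tempfile.TemporaryDirectory() as tmp:
    mapping_path = write_mapping(tmp)
    text = mapping_path.read_text(encoding="utf-8")
    # List any legacy R2RML terms that are still present in the file.
    legacy_terms = [t for t in ("rr:column", "rr:functionExecution") if t in text]
    print(legacy_terms)
```

The final check only confirms that the two properties named above no longer appear in the file; it does not validate the mapping against Morph-KGC.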

Some additional advice:

IshanDindorkar commented 2 months ago

@arenas-guerrero-julian Thank you so much for your great response and for pointing us in the right direction. Really appreciate it :) We fixed the prefixes and replaced the properties as you suggested, and that resolved the issue. We will keep in mind to use YARRRML as you advised. Regarding the use of the SELECT query for lowercasing, it makes complete sense. We were actually experimenting with UDFs and decided to start with a very basic operation, transforming text to lower case, since our data is all in upper case. Our ultimate goal is to implement much more complex functionality in the Python-based UDFs for pre-processing our enterprise data before converting it into a KG :) Thank you once again and have a great start to the week!
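
The point about lowercasing in the SELECT query can be sketched as follows. This is an illustration only: an in-memory SQLite database stands in for Oracle, and the rows of table `A` are made up. `LOWER` is a standard SQL function that Oracle also provides, so a query of the same shape could be placed in `rml:query`, leaving UDFs for logic that SQL cannot express easily.

```python
import sqlite3

# In-memory SQLite database standing in for the Oracle source (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE A (col1 TEXT, col2 TEXT)")
conn.executemany(
    "INSERT INTO A VALUES (?, ?)",
    [("1", "ALPHA"), ("2", "BETA")],  # made-up, all-uppercase sample rows
)

# Lowercase col2 in the query itself, keeping the column name the mapping
# refers to by aliasing the expression back to col2.
rows = conn.execute(
    "SELECT col1, LOWER(col2) AS col2 FROM A ORDER BY col1"
).fetchall()
print(rows)
conn.close()
```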