morph-kgc / morph-kgc

Powerful RDF Knowledge Graph Generation with RML Mappings
https://morph-kgc.readthedocs.io
Apache License 2.0

condition block not working #275

Closed Crispae closed 3 months ago

Crispae commented 3 months ago

What Happens?

I am trying to add an object for the predicate dc:identifier. To do this, I am adding a condition block that checks the main id of the KeyEvent triples map against the id attribute of tags such as <key-event-reference id="81d03269-a0fd-46f0-995e-0667daf16156" aop-wiki-id="65536"/>; if the two are equal, the aop-wiki-id value should be used as the dc:identifier.

The data I am using is XML. For the key event triples map, I am using two sources (a single file with different iterators): one iterates over key-event and the other over key-event-reference.

The issue is that no identifier is assigned, even though the morph-kgc run completes successfully.

To Reproduce

mapping:

prefixes:
  rml: "http://semweb.mmlab.be/ns/rml#"
  rr: "http://www.w3.org/ns/r2rml#"
  ql: "http://semweb.mmlab.be/ns/ql#"
  rdf: "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  rdfs: "http://www.w3.org/2000/01/rdf-schema#"
  schema: "http://schema.org/"
  dbo: "http://dbpedia.org/ontology/"
  aop: "http://aop.org/ontology/"
  dc: "http://purl.org/dc/elements/1.1/"

sources:

  source5:
    access: "source_4.xml"
    referenceFormulation: "xpath"
    iterator: "/data/key-event"

  source6:
    access: "source_4.xml"
    referenceFormulation: "xpath"
    iterator: "/data/vendor-specific"

mappings:

    KeyEventMapping:
        sources:
          - source5 ## provide 
          - source6 ## provide key-event-reference
        subjects:
          - value: "http://www.example.com/keyEvent/$(@id)"
        predicateobjects:
          - [rdf:type, aop:keyEvent]
          - [dc:title, $(title)]
          - predicates: dc:identifier
            objects: $(key-event-reference/@aop-wiki-id)
            condition:
              function: equal
              parameters:
                    - [str1, $(@id)]
                    - [str2, $(@id)]

XML

<?xml version="1.0" encoding="UTF-8"?>
<data>

<key-event id="81d03269-a0fd-46f0-995e-0667daf16156">
    <title>Non-coding RNA expression profile alteration</title>
    <short-name>Non-coding RNA expression,alteration</short-name>
  </key-event>

  <key-event id="9fe19e2d-88c2-4787-a32d-651d003d5bce">
    <title>Chronic obstructive pulmonary disease</title>
    <short-name>Chronic obstructive pulmonary disease</short-name>
  </key-event>

<vendor-specific id="393b0e04-47bc-4006-996e-eb8b57949ef6" name="AopWiki" version="2024-07-01 10:01:07 +0000">
    <key-event-reference id="81d03269-a0fd-46f0-995e-0667daf16156" aop-wiki-id="65536"/>
    <key-event-reference id="9fe19e2d-88c2-4787-a32d-651d003d5bce" aop-wiki-id="65537"/>
</vendor-specific>
</data>
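For reference, the lookup described above can be sketched in plain Python, independent of Morph-KGC. This is only an illustration of the intended result, not of how any RML engine evaluates the mapping: it uses the standard-library `xml.etree.ElementTree`, a trimmed copy of the XML sample, and a hypothetical helper named `expected_identifiers`.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the XML sample from this report.
XML = """<data>
  <key-event id="81d03269-a0fd-46f0-995e-0667daf16156">
    <title>Non-coding RNA expression profile alteration</title>
  </key-event>
  <key-event id="9fe19e2d-88c2-4787-a32d-651d003d5bce">
    <title>Chronic obstructive pulmonary disease</title>
  </key-event>
  <vendor-specific id="393b0e04-47bc-4006-996e-eb8b57949ef6" name="AopWiki">
    <key-event-reference id="81d03269-a0fd-46f0-995e-0667daf16156" aop-wiki-id="65536"/>
    <key-event-reference id="9fe19e2d-88c2-4787-a32d-651d003d5bce" aop-wiki-id="65537"/>
  </vendor-specific>
</data>"""


def expected_identifiers(xml_text):
    """Map each key-event id to the aop-wiki-id of the reference with the same id."""
    root = ET.fromstring(xml_text)
    # Index the references by their id attribute.
    wiki_ids = {
        ref.get("id"): ref.get("aop-wiki-id")
        for ref in root.findall("./vendor-specific/key-event-reference")
    }
    # Keep only key events that have a matching reference.
    return {
        key_event.get("id"): wiki_ids[key_event.get("id")]
        for key_event in root.findall("./key-event")
        if key_event.get("id") in wiki_ids
    }


identifiers = expected_identifiers(XML)
for key_event_id, wiki_id in identifiers.items():
    print(f'<http://www.example.com/keyEvent/{key_event_id}> dc:identifier "{wiki_id}" .')
```

Each key event should end up with exactly one dc:identifier, taken from the reference that shares its id.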


Crispae commented 3 months ago

prefixes:
  rml: "http://semweb.mmlab.be/ns/rml#"
  rr: "http://www.w3.org/ns/r2rml#"
  ql: "http://semweb.mmlab.be/ns/ql#"
  rdf: "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  rdfs: "http://www.w3.org/2000/01/rdf-schema#"
  schema: "http://schema.org/"
  dbo: "http://dbpedia.org/ontology/"
  aop: "http://aop.org/ontology/"
  dc: "http://purl.org/dc/elements/1.1/"

sources:
  source5:
    access: "source_4.xml"
    referenceFormulation: "xpath"
    iterator: "/data/key-event"

  source6:
    access: "source_4.xml"
    referenceFormulation: "xpath"
    iterator: "/data/vendor-specific/key-event-reference"

mappings:
  KeyEventMapping:
    sources:
      - source5
      - source6
    subjects:
      - value: "http://www.example.com/keyEvent/$(@id)"
    predicateobjects:
      - [rdf:type, aop:keyEvent]
      - [dc:title, $(title)]

      - predicates: dc:identifier
        objects: $(@aop-wiki-id)
        condition:
          function: idlab-fn:equal
          parameters:
            - [grel:valueParameter, $(@id)]
            - [grel:valueParameter2, $(@id)]  

This mapping works fine on Matey, but on morph-kgc it gives the following error:


KeyError                                  Traceback (most recent call last)
Cell In[41], line 2
      1 ## Materializing the graph
----> 2 graph = morph_kgc.materialize(config)

File c:\Users\saurav\anaconda3\envs\biobricks\lib\site-packages\morph_kgc\__init__.py:65, in materialize(config, python_source)
     64 def materialize(config, python_source=None):
---> 65     triples = materialize_set(config, python_source)
     67     graph = Graph()
     68     if triples:

File c:\Users\saurav\anaconda3\envs\biobricks\lib\site-packages\morph_kgc\__init__.py:57, in materialize_set(config, python_source)
     55     triples = set()
     56     for mapping_group in mapping_groups:
---> 57         triples.update(_materialize_mapping_group_to_set(mapping_group, rml_df, fnml_df, config, python_source))
     59 logging.info(f'Number of triples generated in total: {len(triples)}.')
     61 return triples

File c:\Users\saurav\anaconda3\envs\biobricks\lib\site-packages\morph_kgc\materializer.py:334, in _materialize_mapping_group_to_set(mapping_group_df, rml_df, fnml_df, config, python_source)
    332 triples = set()
    333 for i, rml_rule in mapping_group_df.iterrows():
--> 334     data = _materialize_rml_rule(rml_rule, rml_df, fnml_df, config, python_source=python_source)
    335     triples.update(set(data['triple']))
    337 return triples
...
--> 210         data_value.append(e.attrib[attribute])
    211     data_record.append(data_value)
    212 data_records.append(data_record)

KeyError: 'aop-wiki-id'
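
The last frame fails on `e.attrib[attribute]`. One plausible reading, not confirmed in this thread, is that the `$(@aop-wiki-id)` reference is also looked up on the elements selected by source5 (`/data/key-event`), which carry no such attribute. The sketch below reproduces that pattern with the standard-library `xml.etree.ElementTree`. It is not Morph-KGC's own code, and `strict_lookup` and `lenient_lookup` are hypothetical helper names used only for illustration.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the XML sample from this report.
XML = """<data>
  <key-event id="81d03269-a0fd-46f0-995e-0667daf16156">
    <title>Non-coding RNA expression profile alteration</title>
  </key-event>
  <vendor-specific id="393b0e04-47bc-4006-996e-eb8b57949ef6" name="AopWiki">
    <key-event-reference id="81d03269-a0fd-46f0-995e-0667daf16156" aop-wiki-id="65536"/>
  </vendor-specific>
</data>"""

root = ET.fromstring(XML)
key_events = root.findall("./key-event")
references = root.findall("./vendor-specific/key-event-reference")


def strict_lookup(elements, attribute):
    # Same access pattern as the failing frame: raises KeyError if the attribute is absent.
    return [e.attrib[attribute] for e in elements]


def lenient_lookup(elements, attribute):
    # Element.get returns None instead of raising when the attribute is absent.
    return [e.get(attribute) for e in elements]


# key-event-reference elements carry the attribute, so the strict lookup succeeds.
reference_values = strict_lookup(references, "aop-wiki-id")

# key-event elements do not carry it, so the strict lookup raises KeyError.
try:
    strict_lookup(key_events, "aop-wiki-id")
    raised = False
except KeyError as err:
    raised = True
    print(f"KeyError: {err}")

lenient_values = lenient_lookup(key_events, "aop-wiki-id")
```

Under that assumption, the error comes from combining two sources with different attributes in one mapping rather than from the condition block itself.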

arenas-guerrero-julian commented 3 months ago

Hi @Crispae,

The result that I get with Matey for your second mapping is:

@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix ma: <http://www.w3.org/ns/ma-ont#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix rr: <http://www.w3.org/ns/r2rml#> .
@prefix dc: <http://purl.org/dc/elements/1.1/> .
@prefix schema: <http://schema.org/> .
@prefix rml: <http://semweb.mmlab.be/ns/rml#> .
@prefix ql: <http://semweb.mmlab.be/ns/ql#> .
@prefix dbo: <http://dbpedia.org/ontology/> .
@prefix aop: <http://aop.org/ontology/> .

<http://www.example.com/keyEvent/81d03269-a0fd-46f0-995e-0667daf16156> rdf:type "http://aop.org/ontology/keyEvent" ;
    dc:title "Non-coding RNA expression profile alteration" .

<http://www.example.com/keyEvent/9fe19e2d-88c2-4787-a32d-651d003d5bce> rdf:type "http://aop.org/ontology/keyEvent" ;
    dc:title "Chronic obstructive pulmonary disease" .

This does not seem to correspond to the output that you want, since there are no triples with the dc:identifier predicate.

Crispae commented 3 months ago

prefixes:
  rml: "http://semweb.mmlab.be/ns/rml#"
  rr: "http://www.w3.org/ns/r2rml#"
  ql: "http://semweb.mmlab.be/ns/ql#"
  rdf: "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  rdfs: "http://www.w3.org/2000/01/rdf-schema#"
  schema: "http://schema.org/"
  dbo: "http://dbpedia.org/ontology/"
  aop: "http://aop.org/ontology/"
  dc: "http://purl.org/dc/elements/1.1/"
  grel: "http://users.ugent.be/~bjdmeest/function/grel.ttl#"
  idlab-fn: "http://example.com/idlab/function/"

sources:
  source5:
    access: "source_4.xml"
    referenceFormulation: "xpath"
    iterator: "/data/key-event"

  source6:
    access: "source_4.xml"
    referenceFormulation: "xpath"
    iterator: "/data/vendor-specific/key-event-reference"

mappings:
  KeyEventMapping:
    sources:
      - source5
      - source6
    subjects:
      - value: "http://www.example.com/keyEvent/$(@id)"
    predicateobjects:
      - [rdf:type, aop:keyEvent]
      - [dc:title, $(title)]

      - predicates: dc:identifier
        objects: $(@aop-wiki-id)
        condition:
          function: idlab-fn:equal
          parameters:
            - [grel:valueParameter, $(@id)]
            - [grel:valueParameter2, $(@id)]  

This is the updated config; it should work.

arenas-guerrero-julian commented 3 months ago

Right, the output that I get is:

@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix ma: <http://www.w3.org/ns/ma-ont#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix rr: <http://www.w3.org/ns/r2rml#> .
@prefix dc: <http://purl.org/dc/elements/1.1/> .
@prefix schema: <http://schema.org/> .
@prefix rml: <http://semweb.mmlab.be/ns/rml#> .
@prefix ql: <http://semweb.mmlab.be/ns/ql#> .
@prefix dbo: <http://dbpedia.org/ontology/> .
@prefix aop: <http://aop.org/ontology/> .
@prefix grel: <http://users.ugent.be/~bjdmeest/function/grel.ttl#> .
@prefix idlab-fn: <http://example.com/idlab/function/> .

<http://www.example.com/keyEvent/81d03269-a0fd-46f0-995e-0667daf16156> rdf:type "http://aop.org/ontology/keyEvent" ;
    dc:title "Non-coding RNA expression profile alteration" ;
    dc:identifier "65536" .

<http://www.example.com/keyEvent/9fe19e2d-88c2-4787-a32d-651d003d5bce> rdf:type "http://aop.org/ontology/keyEvent" ;
    dc:title "Chronic obstructive pulmonary disease" ;
    dc:identifier "65537" .

This seems correct. The type of condition corresponds to the one defined in Chapter 11 of the YARRRML specification. Supporting it is a to-do that is already reported in #267; we will keep track of it there and can use this example for testing.