morph-kgc / morph-kgc

Powerful RDF Knowledge Graph Generation with RML Mappings
https://morph-kgc.readthedocs.io
Apache License 2.0

YARRRML - Exception while using grel:string_indexOf function #259

Closed: david-martinez-garcia closed this issue 3 months ago

david-martinez-garcia commented 4 months ago

Hello,

I am working on a case study where I need to do some processing to generate RDF subjects. More precisely, I have the following JSON data source:

{
    "people": [
        {
            "name": "John_Doe"
        },
        {
            "name": "Jane_Smith"
        },
        {
            "name": "Sara_Bladinck"
        }
    ]
}

I have to generate the subjects from the value of the name element, using only the part that comes before the _ character. The expected output is the following:

ex:John rdf:type ex:Person .
ex:Jane rdf:type ex:Person .
ex:Sara rdf:type ex:Person .

I have written the following mappings file:

prefixes:
 ex: "http://example.com/"
 grel: "http://users.ugent.be/~bjdmeest/function/grel.ttl#"

mappings:
  person:
    sources:
      - ['people.json~jsonpath', '$.people[*]']
    s:
      function: grel:array_join
      parameters:
        - parameter: grel:p_array_a
          value: "http://example.com/"
        - parameter: grel:p_array_a
          value:
            function: grel:string_substring
            parameters:
              - [grel:valueParameter, $(name)]
              - [grel:param_int_i_from, "0"]
              - parameter: grel:param_int_i_opt_to
                value:
                  function: grel:string_indexOf
                  parameters:
                    - [grel:valueParameter, $(name)]
                    - [grel:string_sub, "_"]
    po:
      - [a, ex:Person]
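For readers less familiar with nested FnO calls, the composition the mapping above is aiming for can be sketched in plain Python. This is only an illustration of the intended semantics: the helper names below are hypothetical stand-ins for grel:string_indexOf, grel:string_substring and grel:array_join, not the code of Matey or morph-kgc.

```python
import json

BASE = "http://example.com/"

# Hypothetical stand-ins for the three GREL functions used in the mapping.
def string_index_of(value: str, sub: str) -> int:
    """Index of the first occurrence of sub in value, or -1 if absent."""
    return value.find(sub)

def string_substring(value: str, start: int, end: int) -> str:
    """Characters of value from start (inclusive) to end (exclusive)."""
    return value[start:end]

def array_join(*parts: str) -> str:
    """Concatenate the parts without a separator."""
    return "".join(parts)

def subject_for(name: str) -> str:
    # Innermost call first: find "_", cut the name there, prepend the base IRI.
    # Caveat: if "_" is absent, the index is -1 and the slice drops the last character.
    cut = string_index_of(name, "_")
    return array_join(BASE, string_substring(name, 0, cut))

data = json.loads(
    '{"people": [{"name": "John_Doe"}, {"name": "Jane_Smith"}, {"name": "Sara_Bladinck"}]}'
)
subjects = [subject_for(person["name"]) for person in data["people"]]

for subject in subjects:
    print(f"<{subject}> a <{BASE}Person> .")
# <http://example.com/John> a <http://example.com/Person> .
# <http://example.com/Jane> a <http://example.com/Person> .
# <http://example.com/Sara> a <http://example.com/Person> .
```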

With Matey this works as expected, but with Morph-KGC it fails. I am getting these exceptions:

multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
  File "/usr/lib/python3.10/multiprocessing/pool.py", line 125, in worker
    result = (True, func(*args, **kwds))
  File "/usr/lib/python3.10/multiprocessing/pool.py", line 51, in starmapstar
    return list(itertools.starmap(args[0], args[1]))
  File "/home/user/.local/lib/python3.10/site-packages/morph_kgc/materializer.py", line 428, in _materialize_mapping_group_to_file
    data = _materialize_rml_rule(rml_rule, rml_df, fnml_df, config)
  File "/home/user/.local/lib/python3.10/site-packages/morph_kgc/materializer.py", line 391, in _materialize_rml_rule
    data = _materialize_rml_rule_terms(data, rml_rule, fnml_df, config)
  File "/home/user/.local/lib/python3.10/site-packages/morph_kgc/materializer.py", line 274, in _materialize_rml_rule_terms
    results_df = _materialize_fnml_execution(results_df, rml_rule['subject_map_value'], fnml_df, config, 'subject', termtype=rml_rule['subject_termtype'])
  File "/home/user/.local/lib/python3.10/site-packages/morph_kgc/materializer.py", line 190, in _materialize_fnml_execution
    results_df = execute_fnml(results_df, fnml_df, fnml_execution, config)
  File "/home/user/.local/lib/python3.10/site-packages/morph_kgc/fnml/fnml_executer.py", line 76, in execute_fnml
    data = execute_fnml(data, fnml_df, execution_rule['value_map_value'], config)
  File "/home/user/.local/lib/python3.10/site-packages/morph_kgc/fnml/fnml_executer.py", line 76, in execute_fnml
    data = execute_fnml(data, fnml_df, execution_rule['value_map_value'], config)
  File "/home/user/.local/lib/python3.10/site-packages/morph_kgc/fnml/fnml_executer.py", line 89, in execute_fnml
    function = udf_dict[function_id]['function']
KeyError: 'http://users.ugent.be/~bjdmeest/function/grel.ttl#string_indexOf'
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/home/user/.local/lib/python3.10/site-packages/morph_kgc/__main__.py", line 47, in <module>
    num_triples = sum(pool.starmap(_materialize_mapping_group_to_file,
  File "/usr/lib/python3.10/multiprocessing/pool.py", line 375, in starmap
    return self._map_async(func, iterable, starmapstar, chunksize).get()
  File "/usr/lib/python3.10/multiprocessing/pool.py", line 774, in get
    raise self._value
KeyError: 'http://users.ugent.be/~bjdmeest/function/grel.ttl#string_indexOf'

Do you have any idea what might be wrong? Thanks a lot.

arenas-guerrero-julian commented 4 months ago

Hi @david-martinez-garcia ,

grel:string_indexOf is not implemented in morph-kgc. Here you can check the list of built-in functions.

Could you implement grel:string_indexOf and contribute a PR?
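To illustrate why the traceback ends in a KeyError, here is a minimal, self-contained sketch. It assumes only what the traceback shows: functions are looked up by their full IRI in a dictionary whose entries have a 'function' key. The registry name, the register decorator and the Python parameter names are hypothetical and are not morph-kgc's actual API; the parameter IRIs are the ones used in the mapping above.

```python
GREL = "http://users.ugent.be/~bjdmeest/function/grel.ttl#"

# Hypothetical registry, shaped like the udf_dict[function_id]['function']
# lookup that appears in the traceback.
function_registry = {}

def register(fun_id, **params):
    """Hypothetical decorator: store a Python function under its function IRI."""
    def wrapper(func):
        function_registry[fun_id] = {"function": func, "parameters": params}
        return func
    return wrapper

@register(
    fun_id=GREL + "string_indexOf",
    value=GREL + "valueParameter",
    sub=GREL + "string_sub",
)
def string_index_of(value, sub):
    # str.find returns -1 when sub does not occur in value.
    return str(value).find(str(sub))

def lookup(function_id):
    # Raises KeyError when no implementation is registered for the IRI.
    return function_registry[function_id]["function"]

index_of = lookup(GREL + "string_indexOf")
result = index_of("John_Doe", "_")  # 4

try:
    # Placeholder IRI that was never registered, to reproduce the failure mode.
    lookup(GREL + "not_registered_example")
    missing = False
except KeyError:
    missing = True
```

In this sketch, declaring a function in grel.ttl is not enough for it to run: the engine can only execute IRIs that have a registered Python implementation, so any other IRI fails at lookup time.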

david-martinez-garcia commented 4 months ago

Sure @arenas-guerrero-julian, I'll have a look at it. I also see there are other functions that are not implemented yet. Being inexperienced in the RML-FNML world, I thought that all these GREL functions were "there" for anybody to use. I didn't know they must be implemented - I guess the list of functions defined in the grel.ttl file serves as an interface. Thanks for your help.

arenas-guerrero-julian commented 4 months ago

Thanks. You are right, ideally all GREL functions would be implemented as built-in functions, but currently we support only a subset of them. I think that, taking the GREL functions that are already implemented as a reference, adding new ones is simple 🙂.