herminiogg / ShExML

A heterogeneous data mapping language based on Shape Expressions
http://shexml.herminiogarcia.com
MIT License

Using a blanknode or UUID in ShExML #144

Closed: andrawaag closed this issue 1 year ago

andrawaag commented 1 year ago

In ShExML one has to add an identifier when linking shapes as shown in https://shexml.herminiogarcia.com/spec/#linking-shapes.

However, I want to use blank nodes, or be able to use an identifier that is built by generating a UUID when the RDF is generated. Is this possible?

herminiogg commented 1 year ago

Hi Andra,

Sorry for not describing this possibility in the specification. I have added features over time, and not all of them are fully described there yet. It is indeed possible to use blank nodes in ShExML by using the prefixed notation _: in the subject generation expression. Taking the example from the section of the specification you refer to, you can adapt it as shown below:

:Films :[films.id] {
    :name [films.name] ;
    :year :[films.year] ;
    :country [films.country] ;
    :director [films.directors] ;
    :cast @:Actor ;
}

:Actor _:[films.actors.id] {
    :name [films.actors.name] ;
}

You can also see it working in this playground example. For the object and the subject to be linked, the identifiers have to match during the RDF generation phase, even though they are then transformed into blank nodes. If you want to generate artificial ids, you can use the autoincrement ids explained here: https://shexml.herminiogarcia.com/spec/#autoincrement-ids. If you need something more flexible, you can also implement your own id generation strategy using the custom functions extension system.

I hope this answers your question. Do not hesitate to open more issues should you have further ones. I will leave this issue open until I include this information in the spec.

Best, Herminio

andrawaag commented 1 year ago

I have tried writing a helper function to generate a UUID for IDs, but can't get it to work. I used the example from the documentation and wrote the following helper function:


class Helper {

    def generateUUID(): UUID = {
      UUID.randomUUID()
    }
}

However, this leads to the following error message: java.lang.NullPointerException: Cannot invoke "org.antlr.v4.runtime.tree.ParseTree.accept(org.antlr.v4.runtime.tree.ParseTreeVisitor)" because "tree" is null

herminiogg commented 1 year ago

Hi Andra,

Most probably you have just missed something in the mapping rules. I have adapted an example from the test suite to include your function, as you can see below:

PREFIX : <http://example.com/>
PREFIX dbr: <http://dbpedia.org/resource/>
PREFIX schema: <http://schema.org/>
SOURCE films_xml_file <https://shexml.herminiogarcia.com/files/films.xml>
SOURCE films_json_file <https://shexml.herminiogarcia.com/files/films.json>
FUNCTIONS helper <scala: file:///C:/Users/Herminio/Downloads/testShExML/functions.scala>
ITERATOR film_xml <xpath: //film> {
    FIELD id <@id>
    FIELD name <name>
    FIELD year <year>
    FIELD country <country>
    FIELD directors <crew/directors/director>
    FIELD screenwritters <crew//screenwritter>
    FIELD music <crew/music>
    FIELD photography <crew/photography>
}
ITERATOR film_json <jsonpath: $.films[*]> {
    PUSHED_FIELD id <id>
    FIELD name <name>
    FIELD year <year>
    FIELD country <country>
    FIELD directors <crew.director>
    FIELD screenwritters <crew.screenwritter>
    FIELD music <crew.music>
    FIELD photography <crew.cinematography>
}
EXPRESSION films <films_xml_file.film_xml UNION films_json_file.film_json>

:Films :[helper.generateUUID(films.id)] {
    :name [helper.allCapitals(films.name)] ;
    :year [helper.addOne(films.year)] ;
    :countryOfOrigin dbr:[films.country] ;
    :director dbr:[films.directors] ;
    :screenwritter dbr:[films.screenwritters] ;
    :screenwritterName [helper.getName(films.screenwritters)] ;
    :titleYear [helper.nameAndYear(films.name, films.year)] ;
    :musicBy dbr:[films.music] ;
    :cinematographer dbr:[films.photography] ;
}

The code for the functions.scala file is as follows:

class Helper {

  def allCapitals(input: String): String = {
    input.toUpperCase
  }

  def addOne(number: Int): Int = {
    number + 1
  }

  def getName(str: String): String = {
    str.trim.split(" ", 2)(0)
  }

  def nameAndYear(name: String, year: Int): String = {
    name + year.toString
  }

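  // The id parameter is a dummy argument: it is never used, but the function
  // cannot currently be executed without at least one argument.
  // java.util.UUID is written with its full package name because nothing is imported.
  def generateUUID(id: String): java.util.UUID = {
    java.util.UUID.randomUUID()
  }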
}

As you can see, there are two very specific things in this example. First, the UUID class has to be referenced by its fully qualified name, as it is not imported anywhere else. Second, due to an error in my design (or something I did not anticipate during the design), we need to pass a dummy argument to the function, even though it is not used at all. If you run this in the CLI you will get something like this:

:0fa9a120-63b4-4f48-80de-bf8151784695
        :cinematographer    dbr:Hoyte_van_Hoytema ;
        :countryOfOrigin    dbr:USA ;
        :director           dbr:Christopher_Nolan ;
        :musicBy            dbr:Hans_Zimmer ;
        :name               "INTERSTELLAR" ;
        :screenwritter      dbr:Christopher_Nolan , dbr:Jonathan_Nolan ;
        :screenwritterName  "Christopher" , "Jonathan" ;
        :titleYear          "Interstellar2014" ;
        :year               2015 .

:f9afeb71-23a0-4e92-94a7-1f55acd6e3b4
        :countryOfOrigin    dbr:USA ;
        :director           dbr:Christopher_Nolan ;
        :musicBy            dbr:Hans_Zimmer ;
        :name               "INCEPTION" ;
        :screenwritter      dbr:Christopher_Nolan ;
        :screenwritterName  "Christopher" ;
        :titleYear          "Inception2010" ;
        :year               2011 .

:27ab1f54-ec5c-486a-8fbf-32a12fec1869
        :cinematographer    dbr:Hoyte_van_Hoytema ;
        :countryOfOrigin    dbr:USA ;
        :director           dbr:Christopher_Nolan ;
        :musicBy            dbr:Hans_Zimmer ;
        :name               "DUNKIRK" ;
        :screenwritter      dbr:Christopher_Nolan ;
        :screenwritterName  "Christopher" ;
        :titleYear          "Dunkirk2017" ;
        :year               2018 .

:7b1aeacd-8d8f-4062-a162-f010dc4273cc
        :countryOfOrigin    dbr:USA ;
        :director           dbr:Christopher_Nolan ;
        :musicBy            dbr:David_Julyan ;
        :name               "THE PRESTIGE" ;
        :screenwritter      dbr:Jonathan_Nolan , dbr:Christopher_Nolan ;
        :screenwritterName  "Christopher" , "Jonathan" ;
        :titleYear          "The Prestige2006" ;
        :year               2007 .
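
Since a dummy argument has to be passed anyway, one option is to actually use it. The sketch below is not part of the ShExML test suite; it assumes a hypothetical function name, generateStableUUID, and swaps java.util.UUID.randomUUID() for the standard java.util.UUID.nameUUIDFromBytes, which derives a name-based (version 3) UUID from its input. The same id then always yields the same UUID, which may help when identifiers have to match across shapes. Whether this fits a given mapping is an assumption to check against your own data.

```scala
class Helper {

  // Hypothetical variant of generateUUID: instead of ignoring the argument,
  // derive a name-based (version 3) UUID from it, so that equal ids always
  // map to equal UUIDs. Fully qualified names are used because nothing is
  // imported, as in the functions.scala file above.
  def generateStableUUID(id: String): java.util.UUID = {
    java.util.UUID.nameUUIDFromBytes(
      id.getBytes(java.nio.charset.StandardCharsets.UTF_8)
    )
  }
}
```

Assuming the function is registered in the same way as the others, it would be called in the subject expression as helper.generateStableUUID(films.id) in place of helper.generateUUID(films.id).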

I will keep this issue open to examine why functions without arguments cannot be executed with the current implementation.

Best, Herminio

andrawaag commented 1 year ago

Works like a charm