opencaesar / owl-tools

A set of analysis tools for OWL
Apache License 2.0

Owl query issue #42

Closed: joegregoryphd closed this issue 1 year ago

joegregoryphd commented 1 year ago

Description

When running the 'owlQuery' task against a valid set of vocabularies and descriptions, valid SPARQL queries run but produce an empty JSON file in the 'results' folder.

Steps to Reproduce

Steps to reproduce the behavior:
1. Produce a valid vocabulary and description (and bundles).
2. Produce a valid SPARQL query and save it in the /oml/sparql folder.
3. Build the project, start Fuseki, and run the 'owlQuery' task. This may work on the first attempt (i.e., a set of results is produced as a JSON file in the 'results' folder).
4. Delete the JSON file in the 'results' folder and run again. This time, the resulting JSON file is empty.

Expected Behavior

The resulting JSON file should contain the complete set of results corresponding to the query.

Additional Context

I discussed this with Maged on Friday 17th Feb, and we figured out the issue. Although the Fuseki server had been stopped and restarted, the 'buildLog' remained in the project, so the 'owlLoad' task assumed that the descriptions had already been loaded into the Fuseki server and did not load them again. As a result, 'owlQuery' ran the queries against an empty dataset. Deleting the 'buildLog' before each run resolved this issue.

melaasar commented 1 year ago

After debugging this with you, I think that before you ran owlQuery the second time, you ran stopFuseki and startFuseki, i.e., you had a brand new instance of Fuseki that did not have the dataset loaded in it. In this case, both owlQuery and its dependency owlLoad were determined to be UP-TO-DATE and did not rerun.

The root issue here is that a) the owlLoad task is unaware that the Fuseki server was restarted since the last run of the task, and b) the default configuration for Fuseki is to have an in-memory (not persistent) dataset. Specifically, you will find the following in the .fuseki.ttl file (at the project root):

## In memory TDB with union graph.
<#dataset> rdf:type   tdb:DatasetTDB ;
  tdb:location "--mem--" ; ## <----------------------------- Not persistent
  # Query timeout on this dataset (1s, 1000 milliseconds)
  ja:context [ ja:cxtName "arq:queryTimeout" ; ja:cxtValue "1000" ] ;
  # Make the default graph be the union of all named graphs.
  tdb:unionDefaultGraph true .

This means that when Fuseki restarts, it loses its previous state, including the dataset that owlLoad loaded into it.

A simple fix is to configure owlLoad to rerun when it detects that the startFuseki task has started a new server. You can do this by adding the following inputs.files line at the beginning of the owlLoad task:

/*
 * A task to load an OWL catalog to a Fuseki dataset endpoint
 */
task owlLoad(type:io.opencaesar.owl.load.OwlLoadTask, group:"oml", dependsOn: owlReason) {
    inputs.files(startFuseki.outputFolderPath) // rerun when Fuseki (with memory dataset) restarts
    catalogPath = file('build/owl/catalog.xml')
    ....
}

I will create a patch to Rosetta that adds the inputs.files line above by default for OML projects newly created from the wizard.

Please test this and feel free to reopen the ticket if it does not address the issue.
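
For comparison with the inputs.files line in the owlLoad task above, a blunter alternative is to tell Gradle never to treat owlLoad as up to date. This is only a sketch, not the fix proposed in this thread: it assumes a Groovy-DSL build.gradle in which the owlLoad task is already defined as shown above, and it uses Gradle's standard outputs.upToDateWhen hook rather than anything specific to owl-tools.

```groovy
// Assumption: placed in build.gradle after the owlLoad task definition.
// Gradle will consider owlLoad out of date on every invocation, so the
// catalog is reloaded into Fuseki whether or not the server was restarted.
owlLoad.outputs.upToDateWhen { false }
```

The trade-off is that the dataset is reloaded on every build, including builds where Fuseki kept running, whereas the inputs.files approach only triggers a reload after startFuseki has run a new server. Whether owlQuery then reruns as well depends on how its own inputs and outputs are declared, which is not shown in this thread. For a one-off reload without editing the build script, Gradle's --rerun-tasks command-line option forces the requested tasks and their dependencies to run again.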