johanwk / elot

Emacs Literate Ontology Tool
GNU General Public License v3.0

run SPARQL using Robot #27

Closed VladimirAlexiev closed 6 months ago

VladimirAlexiev commented 7 months ago

From #23

> how to invoke ROBOT, including for running sparql queries on ontology files. The challenge is to make something that feels natural to use. sparql-mode is made for querying endpoints -- it doesn't support calling ROBOT, ARQ, or other programs to query files. Maybe we could do something to add that functionality. If that can be made to work, then ELOT users will only need to install ROBOT and rdfpuml for a very powerful set of ontology tools. (What does work is to use a #+call in the org buffer that invokes sparql with robot, but I think it's not very elegant.)

@johanwk Can you give a sample Robot command? I'm familiar with ARQ but not Robot.

VladimirAlexiev commented 7 months ago

Found the documentation: https://robot.obolibrary.org/query Posted an issue https://github.com/ljos/sparql-mode/issues/75, but many issues in that repo have stayed unanswered. So we may have to fork it.
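For reference, the `query` command described on that page takes an input ontology and, after `--query`, a query file followed by an output file, as in `robot query --input ABC.omn --query myquery.sparql result.tsv`. Below is a minimal Emacs Lisp sketch that builds such a command line; the function name `my-robot-query-command` is hypothetical, and it assumes a `robot` wrapper script is on PATH (ELOT itself calls `java -jar robot.jar`).

```elisp
(require 'subr-x)

(defun my-robot-query-command (input query-file output &optional format)
  "Return a shell command string that runs ROBOT's query command.
INPUT is the ontology file, QUERY-FILE holds the SPARQL query and
OUTPUT receives the results.  FORMAT, if non-nil, is passed to
ROBOT's --format option (e.g. \"tsv\").
Hypothetical helper: assumes a `robot' wrapper script on PATH."
  (string-join
   (append (list "robot" "query" "--input" (shell-quote-argument input))
           (when format (list "--format" (shell-quote-argument format)))
           ;; --query takes two values: the query file and the output file.
           (list "--query" (shell-quote-argument query-file)
                 (shell-quote-argument output)))
   " "))
```

Usage: `(shell-command (my-robot-query-command "ABC.omn" "myquery.sparql" "result.tsv" "tsv"))`. Quoting each argument keeps file names with spaces intact.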

It's also important to figure out how to do it from Org mode. That issue mentions the Org header arguments for specifying the SPARQL endpoint and the return format; we need something similar for querying a file.

johanwk commented 7 months ago

What I have working so far looks like this (I moved this comment here; I accidentally posted it in the sparql-mode issue ;) ):

#+name: myquery
#+begin_src sparql 
select * { ?x a owl:Class; rdfs:label ?z } 
limit 3
#+end_src

#+call: robot-sparql-select(omnfile="ABC.omn",query="myquery")

#+RESULTS:
| ?x                           | ?z         |
|------------------------------+------------|
| <http://example.org/MyClass> | "My class" |

But this isn't so great, since the user has to press C-c C-c on the #+call line instead of on the sparql block.

Could we apply advice around the sparql-mode function, and maybe use file:/// URLs for files, so that the changes needed in sparql-mode would be as small as possible?
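A minimal sketch of the advice idea, assuming the Babel backend is `org-babel-execute:sparql` from `ob-sparql` (shipped with sparql-mode) and that the `:url` header arrives in the usual PARAMS alist. All `my-` names are hypothetical; `my-sparql-run-on-file` is only a placeholder where a ROBOT call would go, and percent-encoded characters in file URLs are not decoded.

```elisp
(defun my-sparql-file-url-to-path (url)
  "Return the local path of URL if it is a file:// URL, else nil.
Percent-encoded characters are not decoded in this sketch."
  (when (and (stringp url) (string-prefix-p "file://" url))
    (substring url (length "file://"))))

(defun my-sparql-run-on-file (file _body _params)
  "Placeholder for running a query on FILE with ROBOT (hypothetical)."
  (user-error "Running queries on %s is not implemented in this sketch" file))

(defun my-sparql-execute-advice (orig-fun body params)
  "Around advice for `org-babel-execute:sparql'.
Queries whose :url header is a file:// URL go to
`my-sparql-run-on-file'; all others are passed to ORIG-FUN unchanged."
  (let ((file (my-sparql-file-url-to-path (cdr (assq :url params)))))
    (if file
        (my-sparql-run-on-file file body params)
      (funcall orig-fun body params))))

;; Install the advice once ob-sparql (part of sparql-mode) is loaded.
(with-eval-after-load 'ob-sparql
  (advice-add 'org-babel-execute:sparql :around #'my-sparql-execute-advice))
```

The appeal of `:around` advice is that endpoint queries keep going through sparql-mode untouched, so the local change stays small.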

johanwk commented 7 months ago

I've introduced a function, elot-robot-command, which simply calls ROBOT with a command string.

This is used in a hook: after the user tangles to OMN, ROBOT automatically converts the result to Turtle. That's obviously important, because it lets queries be executed on the ontology.

It's in elot-defs.org. There's a lot going on here...

  (defgroup elot 
    nil
    "Customization group for ELOT")
  (defcustom elot-robot-jar-path (expand-file-name "~/bin/robot.jar")
    "Path to the robot.jar file."
    :group 'elot
    :version "29.2"
    :type 'string)
  (defvar elot-robot-command-str
    ;; Computed once at load time: re-evaluate this if
    ;; `elot-robot-jar-path' is customized afterwards.
    (concat "java -jar " (shell-quote-argument elot-robot-jar-path)))
  (defun elot-robot-command (cmd)
    "Run ROBOT with the argument string CMD."
    (shell-command (concat elot-robot-command-str " " cmd)))
  (defun elot-robot-omn-to-ttl (omnfile)
    "Call ROBOT to make a Turtle file from OMNFILE."
    (cond
     ((not (file-exists-p elot-robot-jar-path))
      (message "ROBOT not found, not converting to Turtle"))
     ((not (file-exists-p omnfile))
      (message "%s not found, nothing for ROBOT to convert" omnfile))
     (t (shell-command
         (concat elot-robot-command-str
                 " convert --strict --verbose"
                 " --input " (shell-quote-argument omnfile)
                 " --output " (shell-quote-argument
                               (concat (file-name-sans-extension omnfile)
                                       ".ttl")))))))
  (defun elot-tangled-omn-to-ttl ()
    "After tangling to OMN, call ROBOT to convert to Turtle."
    (let ((omnfile (buffer-file-name)))  ;; runs in the tangled file's buffer
      ;; Only act on buffers visiting a file whose name ends in ".omn".
      (when (and omnfile (string-match-p "\\.omn\\'" omnfile))
        (elot-robot-omn-to-ttl omnfile))))

The hook gets activated in elot-defaults.el:

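    ;; The third argument of `add-hook' is DEPTH, not LOCAL: a non-nil
    ;; symbol here appends to the global hook.  That suits this case,
    ;; since the hook runs in each tangled file's buffer, and
    ;; `elot-tangled-omn-to-ttl' ignores files that are not .omn.
    (add-hook 'org-babel-post-tangle-hook
              #'elot-tangled-omn-to-ttl
              'append)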
johanwk commented 7 months ago

To read ROBOT's output into the Org document, I added the following. The main point is to use with-temp-buffer. ROBOT doesn't write results to standard output, only to a file, so one can easily end up with a lot of files; it's better to put them in a temporary folder, out of sight. Tangled from elot-defs.org:

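  (defun elot-tsv-to-table (filename)
    "Read the TSV file FILENAME and return it as an Org table.
The first line becomes the header row, followed by an `hline'."
    (let* ((lines (with-temp-buffer
                   (insert-file-contents filename)
                   (split-string (buffer-string) "\n")))
           ;; A trailing newline leaves an empty last element; drop it.
           (lines (if (equal (car (last lines)) "")
                      (butlast lines)
                    lines))
           (header (split-string (car lines) "\t"))
           (body (mapcar
                  (lambda (line) (split-string line "\t"))
                  (cdr lines))))
      (cons header (cons 'hline body))))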

The org-babel block to use with #+call: for executing a query looks like this; it is contained in elot-lob.org. Note: the previous commit had a typo ("csv" instead of "tsv").

#+name: robot-sparql-select
#+begin_src emacs-lisp :var omnfile="pizza.omn" query="myquery"
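  ;; Write the named query block, with the shared prefixes prepended via
  ;; noweb, to a temporary .sparql file; run ROBOT on OMNFILE; and read
  ;; the TSV result back as an Org table.
  (let* ((query-file
          (expand-file-name (concat query ".sparql")
                            (org-babel-temp-directory)))
         (result-file (concat (file-name-sans-extension omnfile) ".tsv"))
         (qryblock (org-babel-lob--src-info query))
         (qrytext (cadr qryblock)))
    ;; Element 1 of the src-info list is the block body: prepend the
    ;; prefix declarations as a noweb reference.
    (setcar (nthcdr 1 qryblock)
            (concat "<<sparql-prefixes()>>\n" qrytext))
    (let ((prefixedquery
           (org-babel-expand-noweb-references qryblock)))
      (with-temp-file query-file (insert prefixedquery)))
    (elot-robot-command
     (concat "query --input " (shell-quote-argument omnfile)
             " --format TSV"
             " --query " (shell-quote-argument query-file)
             " " (shell-quote-argument result-file)))
    (elot-tsv-to-table result-file))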
#+end_src
johanwk commented 7 months ago

... but elot-tsv-to-table is not a good solution. It would be much better to have sparql-mode handle the results of SPARQL queries.

johanwk commented 7 months ago

Update: by amending org-babel-execute:sparql to call ROBOT, this works out. Commit coming soon.

johanwk commented 7 months ago

Commit b37ca47 adds function elot-robot-execute-query and a patched version of org-babel-execute:sparql that will call ROBOT if the :url header argument to a sparql block looks like a filename.
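What "looks like a filename" means is not spelled out here. One plausible heuristic, sketched below under that assumption (it is not necessarily what b37ca47 does), is: the `:url` value is not an http(s) URL and names an existing file. The function name is hypothetical.

```elisp
(defun my-sparql-url-is-local-file-p (url)
  "Non-nil if URL names an existing local file rather than an endpoint.
Hypothetical heuristic: anything that is not an http(s) URL and exists
on disk (relative names resolve against `default-directory')."
  (and (stringp url)
       (not (string-match-p "\\`https?://" url))
       (file-exists-p (expand-file-name url))))
```

Checking for existence, rather than only for a file extension, avoids sending a mistyped file name to ROBOT; the trade-off is that a missing file silently falls back to the endpoint path.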

This works with both select and construct queries.

  1. By default, results are returned as CSV values. Here is an example query block, with the header :url "ABC.omn" pointing to a local OMN file:
    
    #+begin_src sparql :url "ABC.omn"
    select * { ?x a owl:Class; rdfs:label ?z } 
    limit 1
    #+end_src

    #+RESULTS:
    | x                             | z           |
    |-------------------------------+-------------|
    | http://example.org/OtherClass | Other Class |

  2. For `construct` queries, add `:wrap "src ttl" :format ttl` to the header. The `:wrap` puts the output in a Turtle block.

    #+begin_src sparql :url "ABC.omn" :wrap "src ttl" :format ttl
    construct {?x ex:r ?z} { ?x a owl:Class; rdfs:label ?z } limit 1
    #+end_src

    #+RESULTS:
    #+begin_src ttl
    @prefix pav: <http://purl.org/pav/> .
    @prefix owl: <http://www.w3.org/2002/07/owl#> .
    @prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
    @prefix skos: <http://www.w3.org/2004/02/skos/core#> .
    @prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
    @prefix ex: <http://example.org/> .
    @prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
    @prefix xml: <http://www.w3.org/XML/1998/namespace> .
    @prefix dcterms: <http://purl.org/dc/terms/> .
    @prefix iof-av: <https://spec.industrialontologies.org/ontology/core/meta/AnnotationVocabulary/> .
    @prefix prov: <http://www.w3.org/ns/prov#> .
    @prefix exo: <http://example.org/ont/tralala/> .
    @prefix foaf: <http://xmlns.com/foaf/0.1/> .
    @prefix dc: <http://purl.org/dc/elements/1.1/> .

    ex:OtherClass ex:r "Other Class" .
    #+end_src
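Handling both cases means turning the `:format` header into a ROBOT `--format` value and a matching output file. A minimal sketch, assuming the header arrives in the Babel PARAMS alist (as a string or symbol) and that CSV is the default, as described above; the function name is hypothetical.

```elisp
(defun my-robot-format-and-output (params base)
  "Return (FORMAT . OUTPUT-FILE) for a ROBOT query, from babel PARAMS.
FORMAT comes from the :format header (default \"csv\"), lower-cased;
OUTPUT-FILE is BASE with its extension replaced by FORMAT.
Hypothetical helper."
  (let* ((raw (cdr (assq :format params)))
         ;; Header values may be strings or symbols; normalize to a string.
         (fmt (downcase (if raw (format "%s" raw) "csv"))))
    (cons fmt (concat (file-name-sans-extension base) "." fmt))))
```

For the construct example, `(my-robot-format-and-output '((:format . "ttl")) "ABC.omn")` gives `("ttl" . "ABC.ttl")`.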

johanwk commented 7 months ago

Note that ROBOT provides various options for executing queries that aren't supported by this initial commit b37ca47.

ROBOT options that may be added later:

johanwk commented 7 months ago

Tested on MacOS: it works.

johanwk commented 7 months ago

See https://github.com/ljos/sparql-mode/issues/75 on how sparql-mode might be improved, so local ELOT hacks can be dropped.

johanwk commented 6 months ago

Closing, since this works OK