openlink / virtuoso-opensource

Virtuoso is a high-performance and scalable Multi-Model RDBMS, Data Integration Middleware, Linked Data Deployment, and HTTP Application Server Platform
https://vos.openlinksw.com

Recursive sponging stops at depth 1 #559

Open arnikz opened 8 years ago

arnikz commented 8 years ago

Hi,

I'm using Virtuoso v7.2.2 to "sponge" the content of a RESTful Web service that returns (related) RDF graphs from several endpoints:

baseURL/path1 -> returns RDF graph1, which contains a link (predicate) to baseURL/path2
baseURL/path2 -> returns RDF graph2, which contains a link to baseURL/path3
etc.

Here is the code to fetch the RDF graphs, which follows the links via the specified predicate (rdfs:seeAlso); however, the procedure seems to stop at depth 1 as only the first two RDF graphs end up in the quad-store.

SPARQL
define get:soft "soft"
define get:method "GET"
define input:grab-depth 3
define input:grab-seealso <rdfs:seeAlso>
SELECT * FROM <baseURL/path1> WHERE { ?s ?p ?o }

I've also tried the input:grab-follow-predicate pragma (a synonym for input:grab-seealso), but with no success. Am I missing something? Thanks.
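For reference, the behaviour expected from input:grab-depth can be modelled as a breadth-first crawl that follows the link predicate a bounded number of hops. This is a minimal Python sketch, not Virtuoso's implementation: the LINKS table standing in for the Web service (including a hypothetical baseURL/path4) and the assumption that depth counts link hops from the start document are both illustrative.

```python
from collections import deque

# Hypothetical stand-in for the Web service: each URL maps to the
# rdfs:seeAlso targets found in the graph it returns.
LINKS = {
    "baseURL/path1": ["baseURL/path2"],
    "baseURL/path2": ["baseURL/path3"],
    "baseURL/path3": ["baseURL/path4"],  # path4 is hypothetical
    "baseURL/path4": [],
}


def crawl(start, max_depth, links=LINKS):
    """Breadth-first crawl following links up to max_depth hops from start.

    Returns the URLs whose graphs would be loaded, in load order.
    """
    loaded = [start]
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        url, depth = queue.popleft()
        if depth >= max_depth:
            continue  # do not follow links beyond the depth limit
        for target in links.get(url, []):
            if target not in seen:  # avoid reloading graphs on cycles
                seen.add(target)
                loaded.append(target)
                queue.append((target, depth + 1))
    return loaded


print(crawl("baseURL/path1", 3))
# ['baseURL/path1', 'baseURL/path2', 'baseURL/path3', 'baseURL/path4']
print(crawl("baseURL/path1", 1))
# ['baseURL/path1', 'baseURL/path2']
```

Under that assumption, the two graphs actually loaded (path1 and path2) correspond to a depth limit of 1 rather than 3.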

HughWilliams commented 8 years ago

@arnikz: I assume you have installed the Sponger Cartridges, i.e., the rdf_mappers_dav.vad package?

arnikz commented 8 years ago

@HughWilliams: Hi, thanks. Actually, I've got the Sponger Cartridges installed:

SQL> vad_list_packages ();
...
cartridges  Linked Data Cartridges  1.99_git745  2016-02-25 18:12  2016-04-08 16:01

but I can't find a package named rdf_mappers_dav.vad. Is it different from cartridges_dav.vad? Note that my Web service already outputs serialized RDF graphs, so AFAIK there is no need to map the entries to ontologies etc. What do you suggest?

HughWilliams commented 8 years ago

@arnikz: rdf_mappers_dav.vad is the original, and still the open-source, name for the Sponger cartridges. There is some divergence in the commercial builds, where the package is now called cartridges_dav.vad to differentiate it. So where did you obtain it from?

I believe the two are to be merged or made compatible, but I don't think that has been done yet.

You can compile rdf_mappers_dav.vad yourself if you build Virtuoso from source with the --enable-rdfmappers-vad configure option; alternatively, it is available prebuilt for download from:

http://virtuoso.openlinksw.com/dataspace/doc/dav/wiki/Main/VOSDownload#Other Virtuoso-related Packages

arnikz commented 8 years ago

@HughWilliams: I got the cartridges_dav.vad package from http://virtuoso.openlinksw.com/download/#Release72VADPackages (Linked Open Data Transformation Middleware ("Sponger")). I tried to install the prebuilt rdf_mappers_dav.vad, but got the following error (also after removing the transaction file and restarting the server, and after uninstalling cartridges_dav.vad):

00000 Errors detected
00000 Installation of "RDF Mappers" was unsuccessful.
      The installation of this VAD package has failed.
      Please delete the transaction file /usr/local/bin/virtuoso/7/share/virtuoso/vad//usr/local/bin/virtuoso/7/var/lib/virtuoso/db/virtuoso.trx
      and then restart your database server.
      Note: Your database will be in its pre VAD installation state after you restart.
00000 FATAL

I'll build the package myself and try it out as you suggested.

HughWilliams commented 8 years ago

@arnikz: Note that you should probably also uninstall the cartridges_dav.vad package first, or start with an empty database to confirm that the rdf_mappers_dav.vad package itself is good.

arnikz commented 8 years ago

@HughWilliams: I've compiled and installed the rdf_mappers_dav.vad package from source (and also uninstalled cartridges_dav.vad), and started with an empty database.

SQL> vad_list_packages();
...
rdf_mappers  RDF Mappers  1.34.74  ...

The "sponging" problem remains the same: only the first two RDF graphs from my example are fetched, and the process does not continue further down the hierarchy (i.e., depth > 1; baseURL/path3...).

SQL> SPARQL SELECT DISTINCT ?g { GRAPH ?g { ?s ?p ?o. FILTER REGEX(?g, 'baseURL') } };
...
baseURL/path1
baseURL/path2

2 Rows. -- 662 msec.

HughWilliams commented 8 years ago

@arnikz: I have assigned this issue to my colleague @mjovanovik, who works on the Sponger and can help determine the cause of the issue.

arnikz commented 8 years ago

@HughWilliams @mjovanovik: I was wondering which source files (e.g., those related to the SPARQL processor) I could use for debugging.

mjovanovik commented 8 years ago

@arnikz: Hi, quickly looking at the code in your original message, I see that you are using:

define input:grab-seealso <rdfs:seeAlso>

when it should be

define input:grab-seealso rdfs:seeAlso

or

define input:grab-seealso <http://www.w3.org/2000/01/rdf-schema#seeAlso>

You see, rdfs:seeAlso is a QName (prefixed name), so no angle brackets are necessary. I noticed that the documentation has an example using define input:grab-seealso <foaf:maker>, but I suspect it is wrong.

Can you try this out as a first step?
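The distinction matters because, in SPARQL syntax, text inside angle brackets is an IRI reference taken literally, while a prefixed name is expanded through the declared prefixes. The following simplified Python model of that rule is an illustration only, not Virtuoso's parser, and its one-entry prefix table is an assumption. It shows that <rdfs:seeAlso> denotes the literal IRI rdfs:seeAlso, most likely not the predicate intended.

```python
# Assumed prefix table for illustration; a real SPARQL processor uses
# the prefixes declared in (or predefined for) the query.
PREFIXES = {"rdfs": "http://www.w3.org/2000/01/rdf-schema#"}


def resolve_term(term, prefixes=PREFIXES):
    """Resolve a SPARQL IRI term to the IRI it denotes (simplified model)."""
    if term.startswith("<") and term.endswith(">"):
        # IRI reference: the bracketed text is used as-is, no prefix expansion
        return term[1:-1]
    # Prefixed name: expand prefix:local via the table (KeyError if undeclared)
    prefix, _, local = term.partition(":")
    return prefixes[prefix] + local


print(resolve_term("rdfs:seeAlso"))
# http://www.w3.org/2000/01/rdf-schema#seeAlso
print(resolve_term("<rdfs:seeAlso>"))
# rdfs:seeAlso
```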

arnikz commented 8 years ago

Hi @mjovanovik, thanks for looking into this. I've tried what you suggested: removing the <> got the traversal further down the tree, but no further than level 2! I've also tried different values for the input:grab-depth pragma, but they had no effect on the traversal depth.

mjovanovik commented 8 years ago

Hi @arnikz, can I look at the specific RDF content you are working with? I suspect there might be issues in the RDF data that stop the process from continuing to the next levels.

arnikz commented 8 years ago

Hi @mjovanovik: Yes, sure. Is it OK with you if I send you the link privately, as it can't be shared publicly (yet)? Thanks.

mjovanovik commented 8 years ago

Hi @arnikz, yes, absolutely. My email is my-GitHub-username [at] openlinksw [dot] com.