openlink / virtuoso-opensource

Virtuoso is a high-performance and scalable Multi-Model RDBMS, Data Integration Middleware, Linked Data Deployment, and HTTP Application Server Platform
https://vos.openlinksw.com

Can't retrieve a new inserted entry's label using bif:contains on Virtuoso 7 #307

Open fernandoferreira-me opened 9 years ago

fernandoferreira-me commented 9 years ago

I've just inserted a node on a DBPedia graph with this query:

INSERT DATA {
    GRAPH <http://dbpedia.org> {         
     <http://my.semantics/resources/California%20Assn.%20Of%20Realtors> a dbpedia-owl:Organisation;
                                                                        rdfs:label  'California Assn. Of Realtors'@en .
}}

And it works fine. When I try:

SELECT ?label
WHERE {
<http://semantics.twist.systems/resources/California%20Assn.%20Of%20Realtors> rdfs:label ?label . }

The result is retrieved: California Assn. Of Realtors.

However, whenever I try something like:

select ?label
where
{<http://semantics.twist.systems/resources/California%20Assn.%20Of%20Realtors> rdfs:label ?label .
 ?label bif:contains 'California' .}

Nothing is returned.

I guess it is somehow related to the way Virtuoso reindexes the triples. I've even tried running DB.DBA.VT_INC_INDEX_DB_DBA_RDF_OBJ (); as suggested in https://github.com/openlink/virtuoso-opensource/issues/107, but that didn't help.

HughWilliams commented 9 years ago

@fguimara: The subject value inserted, http://my.semantics/resources/California%20Assn.%20Of%20Realtors , is different from the subject value used in the queries, http://semantics.twist.systems/resources/California%20Assn.%20Of%20Realtors , so neither query returned any results. But when I change the queries to use the correct subject value for the data inserted, both return results:

SQL> SPARQL PREFIX dbpedia-owl: <http://dbpedia.org/ontology/> INSERT DATA { GRAPH <http://dbpedia.org> { <http://my.semantics/resources/California%20Assn.%20Of%20Realtors> a dbpedia-owl:Organisation; rdfs:label 'California Assn. Of Realtors'@en . }};

Done. -- 6 msec.
SQL> SPARQL PREFIX dbpedia-owl: <http://dbpedia.org/ontology/> SELECT ?label FROM <http://dbpedia.org> WHERE { <http://my.semantics/resources/California%20Assn.%20Of%20Realtors> rdfs:label ?label . };
label
VARCHAR

California Assn. Of Realtors

1 Rows. -- 0 msec.
SQL> SPARQL PREFIX dbpedia-owl: <http://dbpedia.org/ontology/> SELECT ?label FROM <http://dbpedia.org> WHERE { <http://my.semantics/resources/California%20Assn.%20Of%20Realtors> rdfs:label ?label . ?label bif:contains 'California'. };
label
VARCHAR

California Assn. Of Realtors

1 Rows. -- 0 msec.
SQL> status('');
REPORT
VARCHAR

OpenLink Virtuoso VDB Server
Version 07.10.3211-pthreads for Linux as of Jan 22 2015
Registered to OpenLink Virtuoso (Internal Use)
(Personal Edition, 500 connections)
Started on: 2015-01-22 22:38 GMT+1

Note I am using a build from the latest git develop/7 branch ...

fernandoferreira-me commented 9 years ago

OK, first of all, sorry for copying and pasting the commands incorrectly. I never actually used different URIs, so that was not the problem. This is what happens when I try exactly the same thing you did:

SQL> SPARQL PREFIX dbpedia-owl: <http://dbpedia.org/ontology/> INSERT DATA { GRAPH <http://dbpedia.org> { <http://my.semantics/resources/California%20Assn.%20Of%20Realtors> a dbpedia-owl:Organisation; rdfs:label 'California Assn. Of Realtors'@en . }};
Connected to OpenLink Virtuoso
Driver: 07.00.3203 OpenLink Virtuoso ODBC Driver

Done. -- 389 msec.
SQL> SPARQL PREFIX dbpedia-owl: <http://dbpedia.org/ontology/> SELECT ?label FROM      <http://dbpedia.org> WHERE { <http://my.semantics/resources/California%20Assn.%20Of%20Realtors> rdfs:label ?label . };
label
LONG VARCHAR

California Assn. Of Realtors

1 Rows. -- 275 msec.

SQL> SPARQL PREFIX dbpedia-owl: <http://dbpedia.org/ontology/> SELECT ?label FROM <http://dbpedia.org> WHERE { <http://my.semantics/resources/California%20Assn.%20Of%20Realtors> rdfs:label ?label . ?label bif:contains 'California'. };
label
LONG VARCHAR

0 Rows. -- 331 msec.

My version is:

OpenLink Virtuoso Server Version 07.00.3203-pthreads for Linux as of Jan 9 2015 Started on: 2015-01-21 15:44 GMT-2

Is that a bug?

HughWilliams commented 9 years ago

@fguimara: You are running a 07.00.3203 build from back in 2013 and I am running the latest 07.10.3211 develop/7 branch build from a few days ago. So I would suggest you update your build to develop/7 at:

https://github.com/openlink/virtuoso-opensource

lauramoraes commented 9 years ago

@HughWilliams

I work with @fguimara and here is step by step what we did, from installation to query. It still doesn't work...

virtuoso@virtuoso:~$ git clone https://github.com/openlink/virtuoso-opensource.git
virtuoso@virtuoso:~/virtuoso-opensource$ git branch
* develop/7
virtuoso@virtuoso:~/virtuoso-opensource$ ./autogen.sh
virtuoso@virtuoso:~/virtuoso-opensource$ export CFLAGS="-O2 -m64"
virtuoso@virtuoso:~/virtuoso-opensource$ ./configure --with-layout=debian --enable-dbpedia-vad --enable-rdfmappers-vad --prefix=/opt/virtuoso
virtuoso@virtuoso:~/virtuoso-opensource$ make -j3
virtuoso@virtuoso:~/virtuoso-opensource$ sudo make install
virtuoso@virtuoso:~/virtuoso-opensource$ cd /opt/virtuoso/var/lib/virtuoso/db
virtuoso@virtuoso:/opt/virtuoso/var/lib/virtuoso/db$ sudo ../../../../bin/virtuoso-t -f &
[1] 15498
virtuoso@virtuoso:/opt/virtuoso/var/lib/virtuoso/db$ ../../../../bin/isql
SQL> ld_add('/opt/virtuoso/var/lib/virtuoso/dbpedia/dbpedia_2014.owl', 'http://dbpedia.org/ontology/');
Connected to OpenLink Virtuoso
Driver: 07.10.3211 OpenLink Virtuoso ODBC Driver

Done. -- 1 msec.
SQL> select ll_file, ll_graph, ll_state from db.dba.load_list;
ll_file                                                                           ll_graph                                                                          ll_state
VARCHAR NOT NULL                                                                  VARCHAR                                                                           INTEGER
_______________________________________________________________________________

/opt/virtuoso/var/lib/virtuoso/dbpedia/dbpedia_2014.owl                           http://dbpedia.org/ontology                                                       0

1 Rows. -- 2 msec.
SQL> checkpoint;
23:48:00 Checkpoint started
23:48:00 Checkpoint finished, log reused

Done. -- 61 msec.
SQL> commit WORK;

Done. -- 1 msec.
SQL> checkpoint;
23:48:10 Checkpoint started
23:48:10 Checkpoint finished, log reused

Done. -- 71 msec.

SQL> SPARQL PREFIX dbpedia-owl: <http://dbpedia.org/ontology/> INSERT DATA { GRAPH <http://dbpedia.org> { <http://my.semantics/resources/California_Assn._Of_Realtors> a dbpedia-owl:Organisation; rdfs:label 'California Assn. Of Realtors'@en . }};

Done. -- 7 msec.
SQL> SPARQL PREFIX dbpedia-owl: <http://dbpedia.org/ontology/> SELECT ?label FROM <http://dbpedia.org> WHERE { <http://my.semantics/resources/California_Assn._Of_Realtors> rdfs:label ?label . };
label
LONG VARCHAR
_______________________________________________________________________________

California Assn. Of Realtors

1 Rows. -- 11 msec.
SQL> SPARQL PREFIX dbpedia-owl: <http://dbpedia.org/ontology/> SELECT ?label FROM <http://dbpedia.org> WHERE { <http://my.semantics/resources/California_Assn._Of_Realtors> rdfs:label ?label . ?label bif:contains 'California'. };
label
LONG VARCHAR
_______________________________________________________________________________

0 Rows. -- 5 msec.
SQL> status('');
REPORT
VARCHAR
_______________________________________________________________________________

OpenLink Virtuoso  Server
Version 07.10.3211-pthreads for Linux as of Jan 27 2015 
Started on: 2015-01-27 23:38 GMT-2

HughWilliams commented 9 years ago

@fguimara @lauramoraes : Your use of the bif:contains function searches for an exact match for "California" and nothing else. To search for a string starting with the word California, you need to use the wildcard character, with the word enclosed in double and single quotes, i.e. "'California*'", as follows:

SPARQL PREFIX dbpedia-owl: <http://dbpedia.org/ontology/> SELECT ?label FROM <http://dbpedia.org> WHERE { <http://my.semantics/resources/California_Assn._Of_Realtors> rdfs:label ?label . ?label bif:contains "'California*'". };

The documentation includes such an example at:

http://docs.openlinksw.com/virtuoso/sparqlextensions.html#rdfsparqlrulefulltext
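
The two layers of quoting are easy to get wrong, so here is a minimal Python sketch of how the pattern text is assembled. The helper name contains_literal is hypothetical and not part of Virtuoso; it only builds the text that follows bif:contains, and it assumes a single plain word with no quotes or asterisks of its own.

```python
def contains_literal(word: str, prefix_match: bool = False) -> str:
    """Build the SPARQL literal to place after bif:contains (hypothetical helper)."""
    if not word or any(ch in word for ch in "'\"*"):
        raise ValueError("this sketch only handles a plain word without quotes or '*'")
    term = word + "*" if prefix_match else word
    # Inner single quotes are part of the free-text expression;
    # outer double quotes delimit the SPARQL string literal.
    return '"' + "'" + term + "'" + '"'


pattern = contains_literal("California", prefix_match=True)
print("?label bif:contains " + pattern + " .")
```

The printed triple pattern has the same shape as the one in the query above. Passing 'California*' with only single quotes leaves the asterisk outside any quoted string inside the free-text expression.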

lauramoraes commented 9 years ago

Hi @HughWilliams , I am not sure I understand your last comment. In your example it clearly works WITHOUT the wild card. Actually SPARQL returns an error if I try using a wild card:

SQL> SPARQL PREFIX dbpedia-owl: <http://dbpedia.org/ontology> SELECT ?label FROM <http://dbpedia.org> WHERE { <http://my.semantics/resources/California_Assn._Of_Realtors> rdfs:label ?label . ?label bif:contains 'California*'. };

*** Error 37000: [Virtuoso Driver][Virtuoso Server]XM029: Free-text expression, line 0: Invalid character in free-text search expression, it may not appear outside quoted string at *
at line 12 of Top-Level:
SPARQL PREFIX dbpedia-owl: <http://dbpedia.org/ontology> SELECT ?label FROM <http://dbpedia.org> WHERE { <http://my.semantics/resources/California_Assn._Of_Realtors> rdfs:label ?label . ?label bif:contains 'California*'. }

lauramoraes commented 9 years ago

After some more tests, we discovered that the problem is at indexing time.

If I add another triple and try to retrieve it with bif:contains, nothing is returned:

SPARQL PREFIX dbpedia-owl: <http://dbpedia.org/ontology> INSERT DATA { GRAPH <http://dbpedia.org> { <http://my.semantics/resources/Welcome_to_Miami> a dbpedia-owl:Organisation; rdfs:label 'Welcome to Miami'@en . }};

SPARQL PREFIX dbpedia-owl: <http://dbpedia.org/ontology> SELECT ?label FROM <http://dbpedia.org> WHERE { <http://my.semantics/resources/Welcome_to_Miami> rdfs:label ?label . ?label bif:contains 'Miami'. };

label
LONG VARCHAR
_______________________________________________________________________________

0 Rows. -- 3 msec.

However, I discovered that the unindexed triple is placed in a queue:

select * from db.dba.vtlog_db_dba_rdf_obj;
VTLOG_RO_ID          SNAPTIME             DMLTYPE  VT_GZ_WORDUMP                                                                     VT_OFFBAND_DATA
INTEGER NOT NULL     TIMESTAMP            VARCHAR  LONG VARBINARY                                                                    LONG VARCHAR
_______________________________________________________________________________

1052                 2015.1.29 15:14.28 678751000  I        NULL                                                                              NULL
1053                 2015.1.29 16:8.47 457195000  I        NULL                                                                              NULL

2 Rows. -- 3 msec.

According to http://shadok.enst.fr:8890/doc/html/fn_vt_batch_update.html, if batch update mode is set to OFF, the triples are supposed to be indexed at insertion time and no longer enter the queue. So I indexed what was left in the queue and then set batch update mode to OFF.

DB.DBA.VT_INC_INDEX_DB_DBA_RDF_OBJ ();
Done. -- 8 msec.

select * from db.dba.vtlog_db_dba_rdf_obj;
VTLOG_RO_ID          SNAPTIME             DMLTYPE  VT_GZ_WORDUMP                                                                     VT_OFFBAND_DATA
INTEGER NOT NULL     TIMESTAMP            VARCHAR  LONG VARBINARY                                                                    LONG VARCHAR
_______________________________________________________________________________

0 Rows. -- 2 msec.

SPARQL PREFIX dbpedia-owl: <http://dbpedia.org/ontology> SELECT ?label FROM <http://dbpedia.org> WHERE { <http://my.semantics/resources/Welcome_to_Miami> rdfs:label ?label . ?label bif:contains 'Miami'. };
label
LONG VARCHAR
_______________________________________________________________________________

Welcome to Miami

 DB.DBA.VT_BATCH_UPDATE ('DB.DBA.RDF_OBJ', 'OFF', null);
Done. -- 4 msec.

However, when I make a new insertion, the tuples keep going to the queue until I index them again:

SPARQL PREFIX dbpedia-owl: <http://dbpedia.org/ontology> INSERT DATA { GRAPH <http://dbpedia.org> { <http://my.semantics/resources/Welcome_to_Florida> a dbpedia-owl:Organisation; rdfs:label 'Welcome to Florida'@en . }};

select * from db.dba.vtlog_db_dba_rdf_obj;
VTLOG_RO_ID          SNAPTIME             DMLTYPE  VT_GZ_WORDUMP                                                                     VT_OFFBAND_DATA
INTEGER NOT NULL     TIMESTAMP            VARCHAR  LONG VARBINARY                                                                    LONG VARCHAR
_______________________________________________________________________________

1054                 2015.1.29 16:15.19 852771000  I        NULL                                                                              NULL

Why isn't the VT_BATCH_UPDATE function working? I have already added the rule for indexing all graphs and predicates:

ROFR_G                                                                            ROFR_P                                                                            ROFR_REASON
VARCHAR NOT NULL                                                                  VARCHAR NOT NULL                                                                  VARCHAR NOT NULL
_______________________________________________________________________________

                                                                                                                                                                    ALL

HughWilliams commented 9 years ago

@fguimara: When executing the query, you do not have the double and single quotes around the string being searched for, as in my example, i.e. bif:contains "'California*'"

susannamartinelli commented 9 years ago

Hi @lauramoraes and @HughWilliams, I've tried your example too, but first I set batch update mode to 'OFF' with the VT_BATCH_UPDATE function (fn_vt_batch_update).

However, even with

?label bif:contains "'Welcome*'" 

in the FILTER, the result is always 0 rows. I need to run

DB.DBA.VT_INC_INDEX_DB_DBA_RDF_OBJ ();

after inserting a new triple; only then do I get the results.

I'm running Virtuoso Version: 07.20.3212 on Mac OS X 10.10

HughWilliams commented 9 years ago

@susannamartinelli: When you start an empty Virtuoso database there should be a default scheduler event for the 'VT_INC_INDEX_DB_DBA_RDF_OBJ()' procedure to be invoked every minute to update the FT index:

SQL> select * from SYS_SCHEDULED_EVENT where SE_NAME='VT_INC_INDEX_DB_DBA_RDF_OBJ()';
SE_NAME                        SE_START                     SE_SQL                                      SE_LAST_COMPLETED             SE_INTERVAL  SE_LAST_ERROR  SE_ENABLE_NOTIFY  SE_NOTIFY  SE_NOTIFICATION_SENT
VARCHAR NOT NULL               TIMESTAMP                    VARCHAR                                     TIMESTAMP                     INTEGER      LONG VARCHAR   INTEGER           VARCHAR    INTEGER

VT_INC_INDEX_DB_DBA_RDF_OBJ()  2015.3.21 0:36.25 359249000  "DB"."DBA"."VT_INC_INDEX_DB_DBA_RDF_OBJ"()  2015.3.21 20:48.9 401591000   1            NULL           0                 NULL       0

1 Rows. -- 1 msec.
SQL>

Does this not exist in your instance ?

susannamartinelli commented 9 years ago

Hi @HughWilliams, thank you for your help. Actually, when I run your query I get this: [image: screenshot 2015-03-22 at 14.13.42]

HughWilliams commented 9 years ago

@susannamartinelli: What is the version of Virtuoso being used ? Please provide the output of running the command:

./virtuoso-t -?

Running the query via the isql command-line tool or the Conductor isql UI, I get the expected 1 row returned, without any errors. It does not make sense that you would get the 1 row returned followed by an error.

demeiyan commented 6 years ago

What is the IRI of the bif: prefix when using SPARQL?