openlink / virtuoso-opensource

Virtuoso is a high-performance and scalable Multi-Model RDBMS, Data Integration Middleware, Linked Data Deployment, and HTTP Application Server Platform
https://vos.openlinksw.com

importing broken URIs in NT won't complain and SPARQL endpoint will happily serve them #502

Open joernhees opened 8 years ago

joernhees commented 8 years ago

If you load an invalid NT file containing spaces in URIs with rdf_loader_run, the loader won't complain, and the SPARQL endpoint will happily return the invalid URIs.

For example, try loading a file test.nt that looks like this:

<http://test 123> <http://testp> "great" .

No error will be logged, so this query returns nothing:

select ll_error from DB.DBA.LOAD_LIST where ll_error is not NULL;

Afterwards, run a SPARQL query like

select ?u where { ?u <http://testp> "great". }

to get back the invalid URIRef <http://test 123>.
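
Until the loader rejects such input, a file can be checked before it is handed to rdf_loader_run. The following is a minimal sketch, not part of Virtuoso: it applies the IRIREF rule from the W3C N-Triples grammar to every `<...>` token outside string literals. The function name `find_invalid_iris` is hypothetical, and the check covers only the characters the grammar forbids, not full IRI syntax.

```python
import re

# IRIREF body per the W3C N-Triples grammar:
#   '<' ([^#x00-#x20<>"{}|^`\] | UCHAR)* '>'
# where UCHAR is a \uXXXX or \UXXXXXXXX escape.
_IRIREF_BODY = re.compile(
    r'(?:[^\x00-\x20<>"{}|^`\\]|\\u[0-9A-Fa-f]{4}|\\U[0-9A-Fa-f]{8})*'
)
# Quoted string literals, so that '<' and '>' inside them are ignored.
_LITERAL = re.compile(r'"(?:[^"\\]|\\.)*"')
# Candidate IRI tokens: everything between '<' and the next '>'.
_ANGLE_TOKEN = re.compile(r'<([^>]*)>')


def find_invalid_iris(lines):
    """Yield (line_number, iri) for each IRI that violates the IRIREF rule."""
    for number, line in enumerate(lines, start=1):
        if line.lstrip().startswith('#'):
            continue  # comment line
        stripped = _LITERAL.sub('""', line)
        for match in _ANGLE_TOKEN.finditer(stripped):
            iri = match.group(1)
            if not _IRIREF_BODY.fullmatch(iri):
                yield number, iri


if __name__ == '__main__':
    sample = ['<http://test 123> <http://testp> "great" .']
    for number, iri in find_invalid_iris(sample):
        print(f'line {number}: invalid IRI <{iri}>')
```

Run against the test.nt line above, this flags `http://test 123` because of the space (U+0020) and accepts `http://testp`.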

HughWilliams commented 8 years ago

@joernhees: Which Virtuoso release/build are you using? When I attempt the load with the latest git develop/7 build, I do get an error:

$ cat test.ttl
<http://test 123> <http://testp> "great"
$

SQL> delete from load_list;

Done. -- 1 msec.
SQL> ld_dir ('.', 'test.ttl', 'http://testspaceuri');

Done. -- 1 msec.
SQL> select * from load_list;
ll_file                                                                           ll_graph                                                                          ll_state    ll_started           ll_done              ll_host     ll_work_time  ll_error
VARCHAR NOT NULL                                                                  VARCHAR                                                                           INTEGER     TIMESTAMP            TIMESTAMP            INTEGER     INTEGER     VARCHAR
_______________________________________________________________________________

./test.ttl                                                                        http://testspaceuri                                                               0           NULL                 NULL                 NULL        NULL        NULL

1 Rows. -- 0 msec.
SQL> trace_on();

Done. -- 0 msec.
SQL> rdf_loader_run();

Done. -- 4 msec.
SQL> trace_off();

Done. -- 1 msec.
SQL> select * from load_list;
ll_file                                                                           ll_graph                                                                          ll_state    ll_started           ll_done              ll_host     ll_work_time  ll_error
VARCHAR NOT NULL                                                                  VARCHAR                                                                           INTEGER     TIMESTAMP            TIMESTAMP            INTEGER     INTEGER     VARCHAR
_______________________________________________________________________________

./test.ttl                                                                        http://testspaceuri                                                               2           2015.11.27 21:32.45 20273000  2015.11.27 21:32.45 22789000  0           NULL        37000 SP029: TURTLE RDF loader, line 2: syntax error processed pending to here.

1 Rows. -- 1 msec.
SQL> status('');
REPORT
VARCHAR
_______________________________________________________________________________

OpenLink Virtuoso  Server
Version 07.20.3215-pthreads for Darwin as of Nov 15 2015 
Started on: 2015-11-27 21:23 GMT+0

joernhees commented 8 years ago

Sorry for not mentioning this. That was on 7.2.1, as released here:

Version 07.20.3214-pthreads for Linux as of Nov 11 2015

joernhees commented 8 years ago

ah, is it possible you missed the "." to terminate the triple and it's just complaining about that?

HughWilliams commented 8 years ago

Ah, indeed that was it. After adding the missing ".", the bulk loader loads the triple without reporting an error:

SQL> ld_dir ('.', 'test.ttl', 'http://testspaceuri');

Done. -- 8 msec.
SQL> select * from load_list;
ll_file                                                                           ll_graph                                                                          ll_state    ll_started           ll_done              ll_host     ll_work_time  ll_error
VARCHAR NOT NULL                                                                  VARCHAR                                                                           INTEGER     TIMESTAMP            TIMESTAMP            INTEGER     INTEGER     VARCHAR
_______________________________________________________________________________

./test.ttl                                                                        http://testspaceuri                                                               0           NULL                 NULL                 NULL        NULL        NULL

1 Rows. -- 0 msec.
SQL> rdf_loader_run();

Done. -- 6 msec.
SQL> select * from load_list;
ll_file                                                                           ll_graph                                                                          ll_state    ll_started           ll_done              ll_host     ll_work_time  ll_error
VARCHAR NOT NULL                                                                  VARCHAR                                                                           INTEGER     TIMESTAMP            TIMESTAMP            INTEGER     INTEGER     VARCHAR
_______________________________________________________________________________

./test.ttl                                                                        http://testspaceuri                                                               2           2015.11.28 19:44.21 14077000  2015.11.28 19:44.21 16778000  0           NULL        NULL

1 Rows. -- 1 msec.
SQL> sparql select * from <http://testspaceuri> where {?s ?p ?o};
s                                                                                 p                                                                                 o
LONG VARCHAR                                                                      LONG VARCHAR                                                                      LONG VARCHAR
_______________________________________________________________________________

http://test 123                                                                   http://testp                                                                      great

1 Rows. -- 1 msec.
SQL>

I have reported this to development to look into ...
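
In the meantime, data that has already been loaded could be audited with a query along the following lines. This is only a sketch: the helper name `suspect_iri_query` is hypothetical, the query uses the standard SPARQL 1.1 functions isIRI, STR and CONTAINS, and it has not been run against the builds discussed in this thread. It checks subjects for spaces only; predicates, objects and other illegal characters would need further filters.

```python
def suspect_iri_query(graph_iri):
    """Return SPARQL text listing subject IRIs in graph_iri that contain a space."""
    # Refuse to build a query around a graph IRI that is itself malformed.
    if any(ch in graph_iri for ch in ' <>"'):
        raise ValueError('graph IRI itself contains an illegal character')
    return (
        'SELECT DISTINCT ?s\n'
        f'FROM <{graph_iri}>\n'
        'WHERE { ?s ?p ?o . FILTER(isIRI(?s) && CONTAINS(STR(?s), " ")) }'
    )


if __name__ == '__main__':
    # Graph name taken from the ld_dir call above.
    print(suspect_iri_query('http://testspaceuri'))
```

The printed text can be pasted into isql after the `sparql` keyword, as in the session above, or sent to the SPARQL endpoint.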

joernhees commented 8 years ago

:+1: thanks