rdfhdt / hdt-cpp

HDT C++ Library and Tools

rdf2hdt fails to handle "<>" from input RDF/Turtle #281

Open donpellegrino opened 6 months ago

donpellegrino commented 6 months ago

The W3C SPARQL 1.0 Test Suite includes a test input in RDF Turtle that uses the empty relative IRI "<>" on line 11 of https://github.com/w3c/rdf-tests/blob/main/sparql/sparql10/i18n/normalization-01.ttl.

The rdf2hdt tool fails to handle the "<>" syntax.

user@computer:~/test/hdt-cpp-1.3.3/libhdt/tools$ ./rdf2hdt -f ttl -p -v ~/test/rdf-tests/sparql/sparql10/i18n/normalization-01.ttl normalization-01.hdt
Detected RDF input format: ttl
Catch exception load: ERROR: Could not convert triple to IDS!
 http://www.w3.org/2000/01/rdf-schema#comment "Normalized and non-normalized IRIs"
0 1 9
ERROR: ERROR: Could not convert triple to IDS!
 http://www.w3.org/2000/01/rdf-schema#comment "Normalized and non-normalized IRIs"
0 1 9

Note that the hdt-java implementation does process the file. Since no base URI is specified, it falls back to a file:// URI built from the input path and uses that URI as the subject identified with "<>" in the Turtle.

user@computer:~/test/hdt-cpp-1.3.3/libhdt/tools$ rdf2hdt.sh ~/test/rdf-tests/sparql/sparql10/i18n/normalization-01.ttl normalization-01.hdt
[WARN] base uri not specified, using 'file:///home/user/test/rdf-tests/sparql/sparql10/i18n/normalization-01.ttl'
[INFO] Converting /home/user/test/rdf-tests/sparql/sparql10/i18n/normalization-01.ttl to normalization-01.hdt as TURTLE
File converted in ..... 449 ms 523 us
Total Triples ......... 9
Different subjects .... 4
Different predicates .. 5
Different objects ..... 9
Common Subject/Object . 0
HDT saved to file in .. 2 ms 11 us
user@dunx4:~/test/hdt-cpp-1.3.3/libhdt/tools$ ./hdtSearch normalization-01.hdt
Predicate Bitmap in 40 us
Count predicates in 5 us
Count Objects in 5 us Max was: 1
Bitmap in 4 us
Bitmap bits: 9 Ones: 9
Object references in 9 us
Sort lists in 8 us
Index generated in 110 us
>> ? ? ?
_:@0 http://www.w3.org/2001/sw/DataAccess/tests/data/i18n/normalization.ttl#resumé "Alice's normalized resumé"
_:@0 http://xmlns.com/foaf/0.1/name "Alice"
_:@1 http://www.w3.org/2001/sw/DataAccess/tests/data/i18n/normalization.ttl#resumé "Bob's non-normalized resumé"
_:@1 http://xmlns.com/foaf/0.1/name "Bob"
_:@2 http://www.w3.org/2001/sw/DataAccess/tests/data/i18n/normalization.ttl#resumé "Eve's non-normalized resumé"
_:@2 http://www.w3.org/2001/sw/DataAccess/tests/data/i18n/normalization.ttl#resumé "Eve's normalized resumé"
_:@2 http://xmlns.com/foaf/0.1/name "Eve"
file:///home/user/test/rdf-tests/sparql/sparql10/i18n/normalization-01.ttl http://www.w3.org/2000/01/rdf-schema#comment "Normalized and non-normalized IRIs"
file:///home/user/test/rdf-tests/sparql/sparql10/i18n/normalization-01.ttl http://www.w3.org/2002/07/owl#versionInfo "$Id: normalization-01.ttl,v 1.1 2005/10/25 09:38:08 aseaborne Exp $"
9 results in 172 us
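
For reference, the behaviour seen in the hdt-java output above (a default base URI derived from the input path, with "<>" resolving to that base) can be sketched in a few lines of Python. This is only an illustration of standard relative-IRI resolution, not code from hdt-java or hdt-cpp; the helper names `default_base_uri` and `resolve_iri` are hypothetical.

```python
from urllib.parse import quote, urljoin


def default_base_uri(absolute_path: str) -> str:
    """Build a file:// URI from an absolute POSIX path (hypothetical helper)."""
    # Percent-encode anything that is not allowed in a URI path; "/" is kept.
    return "file://" + quote(absolute_path)


def resolve_iri(base_uri: str, reference: str) -> str:
    """Resolve a Turtle IRI reference against a base URI (hypothetical helper)."""
    # An empty reference, which is what "<>" contains, resolves to the base itself.
    return urljoin(base_uri, reference)


path = "/home/user/test/rdf-tests/sparql/sparql10/i18n/normalization-01.ttl"
base = default_base_uri(path)
subject = resolve_iri(base, "")  # "<>" in the Turtle input
print(subject)
# prints file:///home/user/test/rdf-tests/sparql/sparql10/i18n/normalization-01.ttl
```

The printed URI matches the subject of the last two triples in the hdtSearch listing, which suggests that rdf2hdt would need some base URI (explicit or derived from the file name) before "<>" can be turned into a dictionary entry.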