sublima / sublima

SUBject tool for LIbraries, Museums and Archives

Broken UTF-8 characters in indexed literals #1

Open ewinge opened 8 years ago

ewinge commented 8 years ago

Sublima indexes all string literals under the predicate sub:literals. The indexing breaks non-ASCII (UTF-8) characters, for example:

De nordiske Juristmøder. Litteratur. Nordic countries. Literature. Nordic. Københavns Universitet. Norden. Rettskilder generelt. De nordiske juristmøter Artikler og debatter fra de nordiske juristmøtene siden 1948. Juraportal.dk driftet av det juridiske fakultetsbibliotek i København.. The Nordic countries. 

http://juridisk.net:8890/sparql?default-graph-uri=&query=Select+%3Fp+%3Fo%0D%0AWhere{%0D%0A%3Chttp%3A%2F%2Fjura.ku.dk%2Fnjm%2F%3E+%3Fp+%3Fo%0D%0A}&format=text%2Fhtml&timeout=0&debug=on

SPARQL query:

select ?p ?o
where {
    <http://jura.ku.dk/njm/> ?p ?o
}

ewinge commented 8 years ago

Possibly caused by this issue in Virtuoso: https://github.com/openlink/virtuoso-opensource/issues/17

ewinge commented 8 years ago

Examples:

  1. Broken RDF/XML
  2. Compare HTML
  3. Describe query works

In the last example, literals and externalliterals are broken because of this bug, while the title is correct.
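
For readers unfamiliar with this kind of breakage, the following is a minimal Python sketch of one common way UTF-8 text gets corrupted: the UTF-8 bytes are decoded with a single-byte encoding such as Latin-1. Whether this is the mechanism behind the Sublima/Virtuoso behaviour reported here is an assumption, not something the issue confirms. The function names `mojibake` and `repair` are hypothetical and exist only for this illustration.

```python
# Hypothetical illustration only: this is NOT code from Sublima or Virtuoso.
# Assumption: the corruption comes from decoding UTF-8 bytes as Latin-1.


def mojibake(text: str) -> str:
    """Encode text as UTF-8, then (wrongly) decode the bytes as Latin-1."""
    return text.encode("utf-8").decode("latin-1")


def repair(broken: str) -> str:
    """Undo the wrong decoding. Only works if no bytes were lost on the way."""
    return broken.encode("latin-1").decode("utf-8")


original = "De nordiske Juristmøder"
broken = mojibake(original)

print(broken)                      # De nordiske JuristmÃ¸der
print(repair(broken) == original)  # True
```

Each non-ASCII character such as "ø" occupies two bytes in UTF-8, so it shows up as two characters after the wrong decoding. If the indexed literals show this pattern, the stored bytes are probably still intact and only the decoding step is wrong.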