mmisw / mmiorr

Unmaintained old MMI ORR system (v2) -- New development at https://github.com/mmisw/orr

search for 'dataset' produces some literals as subjects #350

Closed: graybeal closed this issue 9 years ago

graybeal commented 9 years ago

The search for the term 'dataset' across all ontology content in ORR (URL: https://mmisw.org/orr/#st/dataset) produces a few odd results. In some cases the subject is a literal, like the text between quotes below:

"http://inspire-registry.jrc.ec.europa.eu/registers/GLOSSARY/items/159

1. Resource type: 1.1. Spatial data set series (series) 1.2. Spatial data set (dataset) 1.3. Spatial data services (services)"

(Predicate = "http://purl.org/dc/terms/description", Object="The value domain of this metadata element is defined in Part D.1.")

I haven't had a chance to check the originals in these cases to see what might be wrong; just recording this data for posterity.

carueda commented 9 years ago

Good catch. Does seem like a bug.

Using AllegroGraph (AG) WebView, I just ran this query (line feeds inserted for clarity):

select ?o where {
  <http://inspire-registry.jrc.ec.europa.eu/registers/GLOSSARY/items/159>
  <http://purl.org/dc/terms/description>
  ?o
}

and got

"\u03A3\u03C5\u03C3\u03C4\u03AE\u03BC\u03B1\u03C4\u03B1 \u03B3\u03B9\u03B1 
\u03BC\u03BF\u03BD\u03BF\u03C3\u03AE\u03BC\u03B1\u03BD\u03C4\u03B7 
\u03B1\u03BD\u03B1\u03C6\u03BF\u03C1\u03AC \u03C7\u03C9\u03C1\u03B9\u03BA\u03CE\u03BD 
\u03C0\u03BB\u03B7\u03C1\u03BF\u03C6\u03BF\u03C1\u03B9\u03CE\u03BD 
\u03C3\u03C4\u03BF\u03BD \u03C7\u03CE\u03C1\u03BF, \u03C9\u03C2 
\u03C3\u03CD\u03BD\u03BF\u03BB\u03BF 
\u03C3\u03C5\u03BD\u03C4\u03B5\u03C4\u03B1\u03B3\u03BC\u03AD\u03BD\u03C9\u03BD (x,y,z) 
\u03AE/\u03BA\u03B1\u03B9 \u03B3\u03B5\u03C9\u03B3\u03C1\u03B1\u03C6\u03B9\u03BA\u03CC 
\u03C0\u03BB\u03AC\u03C4\u03BF\u03C2 \u03BA\u03B1\u03B9 \u03BC\u03AE\u03BA\u03BF\u03C2 
\u03BA\u03B1\u03B9 \u03CD\u03C8\u03BF\u03C2, \u03BC\u03B5 \u03B2\u03AC\u03C3\u03B7 
\u03B3\u03B5\u03C9\u03B4\u03B1\u03B9\u03C4\u03B9\u03BA\u03CC 
\u03BF\u03C1\u03B9\u03B6\u03CC\u03BD\u03C4\u03B9\u03BF \u03BA\u03B1\u03B9 
\u03BA\u03B1\u03C4\u03B1\u03BA\u03CC\u03C1\u03C5\u03C6\u03BF 
\u03C3\u03CD\u03C3\u03C4\u03B7\u03BC\u03B1 
\u03B1\u03BD\u03B1\u03C6\u03BF\u03C1\u03AC\u03C2 (datum)."@el

"The value domain of this metadata element is defined in Part D.1.\n\n1. Resource type:\n1.1. Spatial 
data set series (series)\n1.2. Spatial data set (dataset)\n1.3. Spatial data services (services)"@en

Somehow this is causing the parsing of the response to misbehave on the ORR side ... need to investigate.
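The Greek literal above is displayed with \uXXXX escapes, while the JSON result further down carries the same text unescaped, so the escaping itself looks harmless. As a quick sanity check (a sketch, assuming these are ordinary \uXXXX code-point escapes, which is what they appear to be), Python's unicode_escape codec turns the first two escaped words back into the Greek text:

```python
# First two words of the escaped Greek literal from the WebView output above.
escaped = r"\u03A3\u03C5\u03C3\u03C4\u03AE\u03BC\u03B1\u03C4\u03B1 \u03B3\u03B9\u03B1"

# The escaped text is pure ASCII; the "unicode_escape" codec interprets
# each \uXXXX sequence as the corresponding Unicode code point.
decoded = escaped.encode("ascii").decode("unicode_escape")

# These are the same words that open the "el" value in the JSON result.
assert decoded == "Συστήματα για"
```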

carueda commented 9 years ago

Here's a screenshot of the above query in WebView

[Screenshot: the above query and its results in AG WebView]

carueda commented 9 years ago

And here is the result in json format:

{
  "head" : {
    "vars" : ["o"]
  },
  "results" : {
    "bindings" : [ 
      {
         "o":{"type":"literal","xml:lang":"el", "value":"Συστήματα για μονοσήμαντη αναφορά χωρικών πληροφοριών στον χώρο, ως σύνολο συντεταγμένων (x,y,z) ή/και γεωγραφικό πλάτος και μήκος και ύψος, με βάση γεωδαιτικό οριζόντιο και κατακόρυφο σύστημα αναφοράς (datum)."}
      },
      {
         "o":{"type":"literal","xml:lang":"en", "value":"The value domain of this metadata element is defined in Part D.1.\n\n1. Resource type:\n1.1. Spatial data set series (series)\n1.2. Spatial data set (dataset)\n1.3. Spatial data services (services)"}
      }
    ]
  }
}

So it seems the two-valued result (one object per language) triggers the misbehavior. This is probably a regression caused by an AG upgrade: I think the query result was previously generated as separate triples, rather than by aggregating the multiple values for the same subject-predicate pair into a single response. Anyway, this is just a very preliminary analysis.
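For reference, in the standard SPARQL 1.1 JSON results layout shown above, each entry of results.bindings is a separate solution, so the two language-tagged descriptions should come out as two values for the same subject-predicate pair rather than being merged into one. A minimal Python sketch of a client that handles this, using the result above; objects_by_lang is a hypothetical helper for illustration, not ORR's actual parsing code (the fix itself is not shown in this thread):

```python
import json
from collections import defaultdict

# SPARQL JSON result from the comment above. A raw string keeps the JSON "\n"
# escapes intact so that json.loads (not Python) decodes them.
RESULT = r'''{
  "head": {"vars": ["o"]},
  "results": {
    "bindings": [
      {"o": {"type": "literal", "xml:lang": "el",
             "value": "Συστήματα για μονοσήμαντη αναφορά χωρικών πληροφοριών στον χώρο, ως σύνολο συντεταγμένων (x,y,z) ή/και γεωγραφικό πλάτος και μήκος και ύψος, με βάση γεωδαιτικό οριζόντιο και κατακόρυφο σύστημα αναφοράς (datum)."}},
      {"o": {"type": "literal", "xml:lang": "en",
             "value": "The value domain of this metadata element is defined in Part D.1.\n\n1. Resource type:\n1.1. Spatial data set series (series)\n1.2. Spatial data set (dataset)\n1.3. Spatial data services (services)"}}
    ]
  }
}'''


def objects_by_lang(result_json: str, var: str = "o") -> dict:
    """Return {language tag: [values]} for every binding of `var`.

    Each entry in results.bindings is an independent solution row, so two
    values for the same subject-predicate pair simply yield two entries.
    """
    data = json.loads(result_json)
    by_lang = defaultdict(list)
    for binding in data["results"]["bindings"]:
        term = binding.get(var)
        if term is None:  # variable left unbound in this solution
            continue
        # Literals without a language tag are grouped under "".
        by_lang[term.get("xml:lang", "")].append(term["value"])
    return dict(by_lang)


by_lang = objects_by_lang(RESULT)
print(sorted(by_lang))                   # ['el', 'en']
print(by_lang["en"][0].splitlines()[2])  # 1. Resource type:
```

The point of the sketch is that nothing about a second binding for the same pair is special: iterating over results.bindings row by row is enough, and the embedded "\n\n" in the English value stays inside that one value.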

carueda commented 9 years ago

BTW, YASGUI (which I'm planning to use for the next version of the query UI) seems to do better at decoding the results of this same query:

[Screenshot: results of the same query in YASGUI]

carueda commented 9 years ago

Fixed.

The "search terms" logic now correctly parses the result returned from the triplestore. In particular, results for queries like https://mmisw.org/orr/#st/dataset are now displayed properly, reflecting the triples actually received.
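One way a client can keep each received triple on its own display row is sketched below in Python. It assumes a line- or tab-oriented display, which the thread does not confirm for ORR: each SPARQL JSON term is rendered in an N-Triples-like form, and newlines inside literals are escaped so a multi-line value such as the English description cannot spill into the next row's subject column. render_term and render_rows are hypothetical names, not ORR's code.

```python
import json


def render_term(term: dict) -> str:
    """Render one SPARQL JSON term in an N-Triples-like form."""
    kind, value = term["type"], term["value"]
    if kind == "uri":
        return f"<{value}>"
    if kind == "bnode":
        return f"_:{value}"
    # Anything else is treated as a literal. Escape backslashes, quotes and
    # line breaks so the rendered cell always stays on a single line.
    escaped = (value.replace("\\", "\\\\").replace('"', '\\"')
               .replace("\n", "\\n").replace("\r", "\\r"))
    if "xml:lang" in term:
        return f'"{escaped}"@{term["xml:lang"]}'
    if "datatype" in term:
        return f'"{escaped}"^^<{term["datatype"]}>'
    return f'"{escaped}"'


def render_rows(result_json: str) -> list:
    """One tab-separated display row per solution in results.bindings."""
    data = json.loads(result_json)
    variables = data["head"]["vars"]
    return ["\t".join(render_term(b[v]) if v in b else "" for v in variables)
            for b in data["results"]["bindings"]]


# Hypothetical ?s ?p ?o result built from the triple discussed in this issue.
SAMPLE = r'''{"head": {"vars": ["s", "p", "o"]},
 "results": {"bindings": [{
   "s": {"type": "uri", "value": "http://inspire-registry.jrc.ec.europa.eu/registers/GLOSSARY/items/159"},
   "p": {"type": "uri", "value": "http://purl.org/dc/terms/description"},
   "o": {"type": "literal", "xml:lang": "en", "value": "The value domain of this metadata element is defined in Part D.1.\n\n1. Resource type:\n1.1. Spatial data set series (series)\n1.2. Spatial data set (dataset)\n1.3. Spatial data services (services)"}}]}}'''

for row in render_rows(SAMPLE):
    print(row)
```

With the escaping in place, the subject cell is always the URI alone, and the description stays in the object column as a single "...(services)"@en literal.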