lindas-uc / lindas-issues

Share our issues and questions about http://lindas-data.ch/

Lindas replies with NaN instead of 0 for the same Query in about 50% of the cases !!! #5

Open l00mi opened 8 years ago

l00mi commented 8 years ago

Please run the query below multiple times to see the effect with the NaN values. This is an incorrect answer from the store, which is extremely bad.

Example query: curl 'http://test.lindas-data.ch/sparql' -H 'Origin: http://yasgui.org' -H 'Accept-Encoding: gzip, deflate' -H 'Accept-Language: en-US,en;q=0.8' -H 'User-Agent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Ubuntu Chromium/45.0.2454.101 Chrome/45.0.2454.101 Safari/537.36' -H 'Content-Type: application/x-www-form-urlencoded; charset=UTF-8' -H 'Accept: application/sparql-results+json' -H 'Referer: http://yasgui.org/' -H 'Connection: keep-alive' --data $'query=PREFIX+rdf%3A+%3Chttp%3A%2F%2Fwww.w3.org%2F1999%2F02%2F22-rdf-syntax-ns%23%3E%0APREFIX+rdfs%3A+%3Chttp%3A%2F%2Fwww.w3.org%2F2000%2F01%2Frdf-schema%23%3E%0APREFIX+u28%3A+%3Chttp%3A%2F%2Fenvironment.data.admin.ch%2Fubd%2F28%2F%3E%0ASELECT+*+WHERE%0A%7B%0A%7B%0ASELECT+%3Fstation+%3Fpollutant+%3Faggregation+%3Fyear++%3Fmeasurement+%3Funit+WHERE+%7B%0A++++%3Fm+a+%3Chttp%3A%2F%2Fexample.org%2FMeasurement%3E+.%0A++++%3Fm+%3Chttp%3A%2F%2Fexample.org%2Fmeasurement%3E+%3Fmeasurement.%0A++++%0A++++%3Fm+%3Chttp%3A%2F%2Fexample.org%2Faggregation%3E+%3Fa.%0A++++%3Fa++rdfs%3Alabel+%3Faggregation.%0A++++FILTER+(lang(%3Faggregation)+%3D+\'de\')%0A%0A++++%3Fm+%3Chttp%3A%2F%2Fexample.org%2Fstation%3E+%3Fs.%0A++++%3Fs+rdfs%3Alabel+%3Fstation.%0A%0A++++%3Fm+%3Chttp%3A%2F%2Fexample.org%2Funit%3E+%3Fu.%0A++++%3Fu+rdfs%3Alabel+%3Funit.%0A++++FILTER+(lang(%3Funit)+%3D+\'de\')%0A%0A++++%3Fm+%3Chttp%3A%2F%2Fexample.org%2Fpollutant%3E+%3Fp.%0A++++%3Fp+rdfs%3Alabel+%3Fpollutant.%0A++++FILTER+(lang(%3Fpollutant)+%3D+\'de\')%0A%0A++++%3Fm+%3Chttp%3A%2F%2Fexample.org%2Fyear%3E+%3Fy%0A++++BIND(SUBSTR(xsd%3Astring(%3Fy)%2C46)+AS+%3Fyear)%0A%7DORDER+BY+ASC(%3Fmeasurement)%0A%7D%0A%7D%0ALIMIT+1000%0A' --compressed

l00mi commented 8 years ago

Any update on this? This error does actively hinder application development.

retog commented 8 years ago

I understand that. I had a talk with the company running Lindas; unfortunately they don't look into the issues here, so the issues are manually transferred to their JIRA.

If it's urgent I suggest you follow the contact information provided on the lindas site: http://lindas-data.ch/#/contact

The issue looks similar to: https://github.com/openlink/virtuoso-opensource/issues/354 - maybe an update of virtuoso solves the problem, or maybe another triple store has to be used till this is fixed. Anyway the official way to ask for a working SPARQL endpoint is via the information provided on the site.

ktk commented 8 years ago

It might also be a good idea to use the production server instead; I will upload the dataset there as well.

We seem to have 07.10.3211 installed on both servers.

martin-voigt commented 8 years ago

Virtuoso updated for the test system.

Virtuoso Open Source Edition (Column Store) (multi threaded) Version 7.2.3-dev.3216-pthreads as of Feb 25 2016 Compiled for Linux (x86_64-unknown-linux-gnu)

Please double-check the queries again. The example query from above now returns results without NaN on test.lindas-data.ch.

martin-voigt commented 8 years ago

The upgrade is now also done for the production system.

ktk commented 8 years ago

Unfortunately it still happens. [Screenshot: 2016-02-26 at 10:10:57]

Confirmed by BAFU as well

martin-voigt commented 8 years ago

Can you provide an example query? The one from above doesn't show the problem for me. Which system are you using?

ktk commented 8 years ago

I always trigger it via the sparql-table-viewer:

http://cpvrlab.github.io/sparql-table-viewer/

Earlier I filtered on Biel-Bienne as the station and got it immediately. Now I can't trigger it, so it seems to be a bit tricky to reproduce.

ktk commented 8 years ago

Ah, you can see the SPARQL query either above or in the developer console.

retog commented 8 years ago

@martin-voigt a couple of minutes ago I tried the query (i.e. executing the curl command as described in @l00mi's original description) around 10 times, and the value was always correctly "0.0", so I had already started to think that the issue was fixed. Now I tried again and it was always NaN. So any attempt to reproduce the error should take into account that it is quite volatile.
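Since the error comes and goes, it may help to count NaN values over many runs rather than eyeballing single responses. The following is a minimal sketch, not something used in this thread: it assumes the endpoint returns application/sparql-results+json and that the bad value shows up as the string "NaN" (in any casing) for the `?measurement` variable. The function names `count_nan`, `run_once` and `repro` are made up for illustration.

```python
import json
import urllib.parse
import urllib.request

# Test endpoint taken from the original report.
ENDPOINT = "http://test.lindas-data.ch/sparql"


def count_nan(results, var="measurement"):
    """Count bindings whose value for `var` is a NaN literal.

    `results` is a parsed SPARQL JSON result document. The check is
    case-insensitive (assumption: the store may emit "NaN" or "NAN").
    """
    bindings = results.get("results", {}).get("bindings", [])
    return sum(
        1
        for b in bindings
        if b.get(var, {}).get("value", "").strip().lower() == "nan"
    )


def run_once(query, endpoint=ENDPOINT):
    """POST the query once and return the parsed JSON result document."""
    data = urllib.parse.urlencode({"query": query}).encode("utf-8")
    req = urllib.request.Request(
        endpoint,
        data=data,
        headers={"Accept": "application/sparql-results+json"},
    )
    with urllib.request.urlopen(req, timeout=60) as resp:
        return json.load(resp)


def repro(query, runs=20):
    """Run the query `runs` times and return the NaN count of each run."""
    return [count_nan(run_once(query)) for _ in range(runs)]
```

Calling `repro(query)` with the decoded query text from the original report gives one NaN count per run, which makes the volatility visible as a list of numbers.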

martin-voigt commented 8 years ago

I tried

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX u28: <http://environment.data.admin.ch/ubd/28/>
SELECT * WHERE
{
{
SELECT ?stationValue ?measurementValue ?yearValue ?aggregationValue ?unitValue ?pollutantValue  WHERE {
?m <http://example.org/station> ?station.
OPTIONAL { 
  ?station rdfs:label ?stationUserLang.
  FILTER(LANGMATCHES(lang(?stationUserLang), 'en') || lang(?stationUserLang) = '')
}
OPTIONAL { 
  ?station rdfs:label ?stationDefaultLang.
  FILTER(LANGMATCHES(lang(?stationDefaultLang), 'de') || lang(?stationDefaultLang) = '')
}
BIND(COALESCE(?stationUserLang, ?stationDefaultLang) AS ?stationValue)

?m <http://example.org/measurement> ?measurementValue.

?m <http://example.org/year> ?year.
OPTIONAL { 
  ?year rdfs:label ?yearUserLang.
  FILTER(LANGMATCHES(lang(?yearUserLang), 'en') || lang(?yearUserLang) = '')
}
OPTIONAL { 
  ?year rdfs:label ?yearDefaultLang.
  FILTER(LANGMATCHES(lang(?yearDefaultLang), 'de') || lang(?yearDefaultLang) = '')
}
BIND(COALESCE(?yearUserLang, ?yearDefaultLang) AS ?yearValue)

?m <http://example.org/aggregation> ?aggregation.
OPTIONAL { 
  ?aggregation rdfs:label ?aggregationUserLang.
  FILTER(LANGMATCHES(lang(?aggregationUserLang), 'en') || lang(?aggregationUserLang) = '')
}
OPTIONAL { 
  ?aggregation rdfs:label ?aggregationDefaultLang.
  FILTER(LANGMATCHES(lang(?aggregationDefaultLang), 'de') || lang(?aggregationDefaultLang) = '')
}
BIND(COALESCE(?aggregationUserLang, ?aggregationDefaultLang) AS ?aggregationValue)

?m <http://example.org/unit> ?unit.
OPTIONAL { 
  ?unit rdfs:label ?unitUserLang.
  FILTER(LANGMATCHES(lang(?unitUserLang), 'en') || lang(?unitUserLang) = '')
}
OPTIONAL { 
  ?unit rdfs:label ?unitDefaultLang.
  FILTER(LANGMATCHES(lang(?unitDefaultLang), 'de') || lang(?unitDefaultLang) = '')
}
BIND(COALESCE(?unitUserLang, ?unitDefaultLang) AS ?unitValue)

?m <http://example.org/pollutant> ?pollutant.
OPTIONAL { 
  ?pollutant rdfs:label ?pollutantUserLang.
  FILTER(LANGMATCHES(lang(?pollutantUserLang), 'en') || lang(?pollutantUserLang) = '')
}
OPTIONAL { 
  ?pollutant rdfs:label ?pollutantDefaultLang.
  FILTER(LANGMATCHES(lang(?pollutantDefaultLang), 'de') || lang(?pollutantDefaultLang) = '')
}
BIND(COALESCE(?pollutantUserLang, ?pollutantDefaultLang) AS ?pollutantValue)
}}}

and I think the problem can be narrowed down to the cases where the response format is application/json. It works for me for CSV, Turtle, N-Triples and RDF/XML. Would it be possible to use another format? In the meantime I will contact OpenLink.
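If switching the response format is an option, one way to limit changes in the application could be a small adapter that turns a CSV response into the same shape as the JSON results. This is only a sketch under assumptions: it expects the standard SPARQL CSV layout (a header row with the variable names, unbound values as empty cells), and because CSV carries no datatype or language information, every value is marked as a plain literal. The name `csv_to_bindings` is hypothetical.

```python
import csv
import io


def csv_to_bindings(text):
    """Convert a SPARQL CSV result body into a JSON-results-like dict.

    Unbound (empty) cells are omitted from the binding, as they would be
    in the JSON format. Datatypes and language tags are not recoverable
    from CSV, so every value is emitted with type "literal".
    """
    reader = csv.DictReader(io.StringIO(text))
    variables = list(reader.fieldnames or [])
    bindings = [
        {
            var: {"type": "literal", "value": row[var]}
            for var in variables
            if row.get(var)  # skip empty or missing cells
        }
        for row in reader
    ]
    return {"head": {"vars": variables}, "results": {"bindings": bindings}}
```

The request itself would only need `Accept: text/csv` instead of the JSON media type; whether the NaN values are really absent in that format would still have to be verified against the endpoint.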

martin-voigt commented 8 years ago

@l00mi have you tried other response formats?

l00mi commented 8 years ago

@martin-voigt We have not yet tried other response formats, mainly because it would require a major rewrite of the application for the current use case. (Also, this proof-of-concept implementation is finished and is used to present the capabilities of Lindas and Linked Data.)

Today I could not reproduce the error, but I have seen it happen after the Virtuoso update. Might it be connected with heavy load?

Do you plan to introduce any kind of caching (such as Varnish or similar)? The same query should be served from a cache as long as nothing in the store has changed. (This would significantly speed up things like the filters in http://cpvrlab.github.io/sparql-table-viewer/)
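To illustrate what I mean by caching, here is a rough in-memory sketch. It is not Varnish and not anything Lindas runs; the class name `QueryCache`, the `fetch` callable and the TTL value are all assumptions for the example. Since a client cannot see whether the store has changed, the sketch simply expires entries after a fixed time.

```python
import hashlib
import time


class QueryCache:
    """Tiny time-based cache for SPARQL responses, keyed by query text.

    `fetch(query, accept)` is any callable that performs the real request
    and returns the response body. Entries expire after `ttl_seconds`.
    """

    def __init__(self, fetch, ttl_seconds=300.0, clock=time.monotonic):
        self._fetch = fetch
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}  # key -> (stored_at, body)

    def get(self, query, accept="application/sparql-results+json"):
        # The response depends on both the query and the requested format.
        key = hashlib.sha256(f"{accept}\n{query}".encode("utf-8")).hexdigest()
        now = self._clock()
        hit = self._entries.get(key)
        if hit is not None and now - hit[0] < self._ttl:
            return hit[1]
        body = self._fetch(query, accept)
        self._entries[key] = (now, body)
        return body
```

One caveat: as long as the NaN problem exists, a cache like this would also keep serving a NaN response for the whole TTL if that happens to be the one that gets stored.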