mff-uk / odcs

ODCleanStore

SPARQL Transformer working strange #888

Closed: jakubklimek closed this issue 10 years ago

jakubklimek commented 10 years ago

OK, this is a bit tricky to explain. Look at the gov.cz pipeline on ODCS. The extractor creates two data units: DU1 from the list of organizations and DU2 from the organization details. Both are XML, and each is transformed by a different XSLT. The output then goes to the Geocoders, which create an s:GeoCoordinates for each s:PostalAddress and connect it to the s:PostalAddress using s:geo. That is not correct, because s:geo goes from s:Place, not from s:PostalAddress. On the other hand, for the Geocoder to be universal, it needs to process only s:PostalAddress and connect the coordinates somehow.

So this is what the SPARQL transformer should correct. Its input is the geocoder output plus the non-geocoded data, which should be merged (both point to the same input). The query in the SPARQL transformer (the query itself is OK) should then reconnect s:geo to whatever is connected to the PostalAddress via s:address (I assume it is s:Place, but I don't use that in the query).
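To make the intended rewrite concrete, here is a minimal sketch that emulates it in plain Python over a set of (subject, predicate, object) string tuples instead of a real triple store. The actual query is not shown in this issue, so the SPARQL in the leading comment, the function name `reconnect_geo`, and the `ex:` placeholder IRIs are assumptions. The logic only follows the description above: every s:geo on an s:PostalAddress is moved to whatever node points at that address via s:address, without requiring that node to be an s:Place.

```python
# Hypothetical SPARQL Update this emulates (the real query is not shown in the issue):
#   PREFIX s: <http://schema.org/>
#   DELETE { ?addr s:geo ?coords }
#   INSERT { ?x s:geo ?coords }
#   WHERE  { ?x s:address ?addr . ?addr a s:PostalAddress ; s:geo ?coords }

S = "http://schema.org/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

Triple = tuple[str, str, str]


def reconnect_geo(graph: set[Triple]) -> set[Triple]:
    """Move s:geo from each s:PostalAddress to the node(s) pointing at it via s:address."""
    addresses = {s for (s, p, o) in graph if p == RDF_TYPE and o == S + "PostalAddress"}
    # Evaluate the WHERE pattern once, before the graph is modified.
    solutions = [
        (x, addr, coords)
        for (x, p1, addr) in graph
        if p1 == S + "address" and addr in addresses
        for (a, p2, coords) in graph
        if a == addr and p2 == S + "geo"
    ]
    # Instantiate both templates from the same bindings.
    to_delete = {(addr, S + "geo", coords) for (_, addr, coords) in solutions}
    to_insert = {(x, S + "geo", coords) for (x, _, coords) in solutions}
    # SPARQL 1.1 Update applies the deletions before the insertions.
    return (graph - to_delete) | to_insert


# Usage: one organization whose address was geocoded (placeholder IRIs)
g = {
    ("ex:org1", S + "address", "ex:addr1"),
    ("ex:addr1", RDF_TYPE, S + "PostalAddress"),
    ("ex:addr1", S + "geo", "ex:geo1"),
    ("ex:geo1", RDF_TYPE, S + "GeoCoordinates"),
}
out = reconnect_geo(g)
print(sorted(t for t in out if t[1] == S + "geo"))
```

Because the pattern is evaluated once and both templates are filled from the same bindings, an address loses its s:geo only in the same step that produces the replacement triple on the referencing node.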

The issue is that this does not work. What is even weirder is that it actually deletes the original s:geo properties but does not insert the new ones. In addition, when I originally suspected this issue, the transformer in the "list" branch did nothing (it did not change the data at all), but this did not happen the second time.

jakubklimek commented 10 years ago

OK, see the SPARQL TRANSFORMER TEST pipeline. It takes the source data, which contains s:geo properties from s:PostalAddress to s:GeoCoordinates. In the target graph, these properties are missing (deleted and not inserted), which should not be possible, because a SPARQL DELETE/INSERT operation is supposed to be atomic according to the specification.
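The symptom can be checked mechanically. The sketch below, again plain Python over (subject, predicate, object) tuples with a hypothetical function name `lost_geo_links` and placeholder `ex:` IRIs, lists every s:geo link that is present in the source but missing from the target and was not re-attached to any node that references the address via s:address. For a query shaped like the one described in the first comment, a DELETE/INSERT that ran to completion should leave this list empty. Fetching the triples from the actual source and target graphs is not shown.

```python
S = "http://schema.org/"

Triple = tuple[str, str, str]


def lost_geo_links(source: set[Triple], target: set[Triple]) -> list[tuple[str, str]]:
    """Return (address, coordinates) pairs whose s:geo link disappeared from
    the target without being re-attached to a node referencing the address."""
    lost = []
    for (addr, p, coords) in source:
        if p != S + "geo" or (addr, p, coords) in target:
            continue  # not a geo link, or the link was left untouched
        # Nodes that point at this address via s:address in the source data
        referrers = {x for (x, q, o) in source if q == S + "address" and o == addr}
        if not any((x, S + "geo", coords) in target for x in referrers):
            lost.append((addr, coords))
    return sorted(lost)


source = {("ex:org1", S + "address", "ex:addr1"), ("ex:addr1", S + "geo", "ex:geo1")}
broken = {("ex:org1", S + "address", "ex:addr1")}  # geo deleted, nothing inserted
fixed = {("ex:org1", S + "address", "ex:addr1"), ("ex:org1", S + "geo", "ex:geo1")}
print(lost_geo_links(source, broken))  # [('ex:addr1', 'ex:geo1')]
print(lost_geo_links(source, fixed))   # []
```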

Source: http://internal.opendata.cz:8890/sparql?default-graph-uri=http%3A%2F%2Flinked.opendata.cz%2Fresource%2Fdataset%2Fseznam.gov.cz%2Fovm%2Flist%2Fnotransform&query=prefix+s%3A+%3Chttp%3A%2F%2Fschema.org%2F%3E%0D%0Aselect+*+where+%7B%3Fs+s%3Ageo+%3Fo%7D+limit+100&format=text%2Fhtml&timeout=0&debug=on

Target: http://internal.opendata.cz:8891/sparql?default-graph-uri=http%3A%2F%2Flinked.opendata.cz%2Fresource%2Fdataset%2Fseznam.gov.cz%2Flist%2Fsparql-transformer-test&query=prefix+s%3A+%3Chttp%3A%2F%2Fschema.org%2F%3E%0D%0Aselect+*+where+%7B%3Fs+s%3Ageo+%3Fo%7D+limit+100&format=text%2Fhtml&timeout=0&debug=on
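Both links run the same URL-encoded query against different endpoints and graphs. Here is a small standard-library sketch (the helper names `check_url` and `decode` are hypothetical) that decodes such a link and builds the equivalent one for another graph. `default-graph-uri` and `query` are the standard SPARQL 1.1 Protocol parameters, while `format`, `timeout` and `debug` are simply copied from the links above and appear to be specific to the Virtuoso endpoint. `urlencode` percent-encodes `*` as `%2A`, so a generated URL is not byte-identical to the originals, but it decodes to the same parameters.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# The query embedded in both links, decoded
QUERY = "prefix s: <http://schema.org/>\r\nselect * where {?s s:geo ?o} limit 100"


def check_url(endpoint: str, graph: str, query: str = QUERY) -> str:
    """Build a GET URL that runs `query` against a single default graph."""
    params = {
        "default-graph-uri": graph,
        "query": query,
        # Endpoint-specific options copied from the original links
        "format": "text/html",
        "timeout": "0",
        "debug": "on",
    }
    return endpoint + "?" + urlencode(params)


def decode(url: str) -> dict[str, str]:
    """Return the query-string parameters of `url`, one value per name."""
    return {k: v[0] for k, v in parse_qs(urlsplit(url).query).items()}


target = check_url(
    "http://internal.opendata.cz:8891/sparql",
    "http://linked.opendata.cz/resource/dataset/seznam.gov.cz/list/sparql-transformer-test",
)
print(decode(target)["query"])
```

Calling `decode` on either link above returns the readable query and the graph it targets.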

tomas-knap commented 10 years ago

Jirko, I suspect it is related to #890.

tomas-knap commented 10 years ago

Jirko, fix #890, then set up the SPARQL TRANSFORMER TEST pipeline on your machine and verify that it works. Add a test for it.

tomesj commented 10 years ago

The SPARQL TRANSFORMER TEST pipeline ran without problems on my machine, for both Virtuoso and Local, and the output contained the expected data.

I also added the class AddQueryGraphsTest, which tests adding the graphs in the SPARQL transformer case.

So that should take care of everything :-)

tomas-knap commented 10 years ago

Jirko, so what was the problem? Judging by the commit, it looks like you didn't change anything.

tomesj commented 10 years ago

The only problem was that, for the local repository, the graph was previously not added because it had not been needed. That fixed the bug.
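The thread does not show the actual change. As a purely hypothetical illustration of what "adding the graph" to an update can mean in standard SPARQL 1.1 Update, the helper below inserts one `USING <graph>` clause per input graph between the DELETE/INSERT templates and the `WHERE` keyword, so the pattern is matched against the merge of those graphs (e.g. the geocoder output plus the non-geocoded data). The name `add_using_graphs` and the `example.org` graph IRIs are made up; this is not the ODCS API, and the real fix may work differently, for example through the repository's dataset configuration.

```python
import re


def add_using_graphs(update: str, graphs: list[str]) -> str:
    """Hypothetical helper: add one USING <g> clause per input graph.

    SPARQL 1.1 Update places USING clauses between the DELETE/INSERT
    templates and the WHERE keyword; the WHERE pattern is then matched
    against the RDF merge of the listed graphs. This naive version assumes
    the first standalone word WHERE starts the WHERE clause; it is not a
    SPARQL parser.
    """
    match = re.search(r"\bWHERE\b", update, flags=re.IGNORECASE)
    if match is None:
        raise ValueError("update has no WHERE clause")
    using = "".join(f"USING <{g}>\n" for g in graphs)
    return update[: match.start()] + using + update[match.start():]


# Usage with placeholder graph IRIs (not the real ODCS graph names)
q = (
    "PREFIX s: <http://schema.org/>\n"
    "DELETE { ?addr s:geo ?c }\n"
    "INSERT { ?x s:geo ?c }\n"
    "WHERE { ?x s:address ?addr . ?addr s:geo ?c }"
)
print(add_using_graphs(q, ["http://example.org/graph/geocoded", "http://example.org/graph/input"]))
```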