mff-uk / odcs

ODCleanStore

SPARQL Loader drops dataParts #511

Closed jakubklimek closed 10 years ago

jakubklimek commented 11 years ago

SPARQL Loader drops some parts in my pipelines. Please take a look at it (it is on the odcs.xrg.cz:8080/odcleanstore instance, execution monitor, pipelines that end with warnings):

I have now run "PSP.CZ from Cache" in Debug mode, so you can look at the data when it finishes in approximately one hour.

I think the data is OK and there is some problem with your encoding because the data was successfully received from the extractor by ODCS.

[screenshot]

tomas-knap commented 11 years ago

Jirka, please post an update on this; it is urgent.

tomesj commented 11 years ago

I know about it, I tried to look at it yesterday, but so far I have not found a bug that would cause the problem.

Are you sure that the input data for the SPARQL loader are OK? Are there any extra characters, parentheses, or something similar?

I am still not feeling well enough to work on it consistently. I will look at it this evening or tomorrow.

tomesj commented 11 years ago

The problem is not in the load method, but in the graph (the triples in the graph)!

Reasoning: I tested this pipeline using the execution monitor:

Pipeline name: Buyer Profiles Current Year Update
Date and time of run: 19.9.2013 1:00:46
Log page for the named graph containing the data to load: 559/594
Name of the graph to load: http://linked.opendata.cz/resource/odcs/internal/pipeline/exec/28/dpu/20/du/0
Count of triples to load: 655505 (same result with the query select count (*) where {?x ?y ?z})

[screenshot]

Then I used:
SPARQL endpoint: http://odcs.xrg.cz:8899/sparql
Graph name: http://linked.opendata.cz/resource/odcs/internal/pipeline/exec/28/dpu/20/du/0
Count of triples using the query [ select count (*) where {?x ?y ?z} ]: 655505

Then I used the SPARQL extractor to extract these data into a repository with the query: construct {?x ?y ?z } where {?x ?y ?z}

But the RDF repository then contained only 100001 triples! No errors were thrown during parsing. When I then tried to load it, there was no problem: it finished successfully without warnings or errors.

Summary:
Triple count in the repository after extraction from the graph: 100001
Triple count in the original graph: 655505

Result: Some of the data are probably not RDF (maybe metadata, or wrong data, ...). If you load data to a SPARQL endpoint from an RDF repository, the data graph you set must contain only RDF data.

If you set a data graph for the RDF repository (to use its triples), you must make sure that the graph has the same triple count as you get when you extract data from it. Otherwise, run an extractor on this graph first and then load the result :-)
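For reference, a sketch of the same count written in standard SPARQL 1.1 and scoped explicitly to the named graph, so the number does not depend on which default graph the endpoint form has set. `GraphCount` and `countQuery` are hypothetical names, and this is an assumed query form, not what the execution monitor itself runs.

```java
final class GraphCount {

    /** Builds a SPARQL 1.1 query that counts all triples in one named graph. */
    static String countQuery(String graphIri) {
        // SPARQL 1.1 requires an alias (AS ?n) for an aggregate in the SELECT clause.
        return "SELECT (COUNT(*) AS ?n) WHERE { GRAPH <" + graphIri + "> { ?s ?p ?o } }";
    }

    public static void main(String[] args) {
        String graph = "http://linked.opendata.cz/resource/odcs/internal/pipeline/exec/28/dpu/20/du/0";
        // Run this against the endpoint, then compare the result with the number
        // of triples the extractor actually wrote into the repository.
        System.out.println(countQuery(graph));
    }
}
```

Running the same query before and after each pipeline step makes it easy to see where the triple count drops.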

jakubklimek commented 11 years ago

I am a bit lost here. Those numbers look like some default limits, but I still don't completely understand what the problem is and what you are suggesting.


tomesj commented 11 years ago

It is certainly not related to any limit: I have tested with millions of triples (when extracting data from a big file) and everything was fine.

tomas-knap commented 11 years ago

Jirka, can you please explain in two sentences what the core problem is? The data in the graph are invalid, right? But is that the result of the pipeline processing?

tomas-knap commented 11 years ago

Jirka, can you please also specify which triples cause the problem when loading data to the target SPARQL endpoint? Or at least, can you please extend the error message saying that certain parts failed with the list of all triples in those parts?

tomesj commented 11 years ago

The problem is that loading to the SPARQL endpoint uses the repository's data graph as it is set, but that graph does not contain only RDF data (or not all of it is in the correct format). If you use a CONSTRUCT query for all triples, you find that the count is much lower than the triple count of the graph: some of the data are not meant to be uploaded, and they cause the problems.

tomesj commented 11 years ago

To find the triples that cause the problems, I need access (a connection) to your Virtuoso, because when I only extract triples from this graph, only the correct triples are extracted and there is no problem loading them (I have already tested this).

Unfortunately, neither the connection to Virtuoso (http://odcs.xrg.cz:8899/conductor) nor the SPARQL endpoint (http://odcs.xrg.cz:8899/sparql) is working at the moment.

tomas-knap commented 11 years ago

Jirka, please try the conductor/sparql link again; it is working now.

There are only RDF data, and they are correct at the beginning. So there is a problem somewhere in the pipeline, probably in the loader?

tomesj commented 11 years ago

Thank you. I will try to use the data graph and load it to my SPARQL endpoint, then I will post the result :-)

tomesj commented 11 years ago

Please send me the necessary Virtuoso connection details by email. I cannot connect via JDBC.

tomas-knap commented 11 years ago

Well, it should be port 1119 to connect over isql.


tomesj commented 11 years ago

Yes, connection is fine now - thanks :-)

tomesj commented 11 years ago

Here are the problematic parts.

For example, part 1491/6556:

Query:
INSERT {
<http://linked.opendata.cz/resource/business-entity/CZ28982347> <http://purl.org/dc/terms/title> """Sellier&Bellot a.s.""" .
<http://linked.opendata.cz/resource/business-entity/CZ65209737> <http://purl.org/dc/terms/title> """Pavel Svoboda""" .
<http://linked.opendata.cz/resource/business-entity/CZ40316050> <http://purl.org/dc/terms/title> """Pavel Svoboda""" .
<http://linked.opendata.cz/resource/domain/buyer-profiles/profile/cz/http/www.e-zakazky.cz/Profil-Zadavatele/682719e2-6c8f-40ab-a9cd-1b108478603b> <http://purl.org/dc/terms/title> """hotel Rudka s.r.o.""" .
<http://linked.opendata.cz/resource/domain/buyer-profiles/profile/cz/https/uverejnovani.cz/profiles/details/ascari-s-r-o> <http://purl.org/dc/terms/title> """Ascari""" .
<http://linked.opendata.cz/resource/business-entity/CZ00242772> <http://purl.org/dc/terms/title> """Nalžovice Obec""" .
<http://linked.opendata.cz/resource/domain/buyer-profiles/profile/cz/https/www.vhodne-uverejneni.cz/profil/mestys-vladislav> <http://purl.org/dc/terms/title> """Městy Vladislav""" .
<http://linked.opendata.cz/resource/domain/buyer-profiles/profile/cz/https/www.vhodne-uverejneni.cz/profil/sluzby-hb-s-r-o> <http://purl.org/dc/terms/title> """www.vhodneuverejneni.cz""" .
<http://linked.opendata.cz/resource/domain/buyer-profiles/profile/cz/https/www.vhodne-uverejneni.cz/profil/tj-sk-kravare> <http://purl.org/dc/terms/title> """www.vhodneuverejneni.cz""" .
<http://linked.opendata.cz/resource/domain/buyer-profiles/profile/cz/http/www.profesionalove.cz/profil-zadavatele/seznam_zadavatelu.php> <http://purl.org/dc/terms/title> """Obec Chotutice""" .
<http://linked.opendata.cz/resource/business-entity/CZ00235393> <http://purl.org/dc/terms/title> """Obec Chotutice""" .
<http://linked.opendata.cz/resource/domain/buyer-profiles/profile/cz/http/www.profesionalove.cz/profil-zadavatele/seznam_zadavatelu.php> <http://purl.org/dc/terms/title> """Obec Jabkenice""" .
<http://linked.opendata.cz/resource/business-entity/CZ00237949> <http://purl.org/dc/terms/title> """Obec Jabkenice""" .
<http://linked.opendata.cz/resource/business-entity/CZ63221667> <http://purl.org/dc/terms/title> """GENERI BIOTECH s.r.o.""" .
<http://linked.opendata.cz/resource/domain/buyer-profiles/profile/cz/http/generi-biotech.profilzadavatele.cz> <http://purl.org/dc/terms/title> """GENERI BIOTECH s.r.o.""" .
<http://linked.opendata.cz/resource/business-entity/CZ00070173> <http://purl.org/dc/terms/title> """Technické služby města Nových Hradů""" .
<http://linked.opendata.cz/resource/domain/buyer-profiles/profile/cz/http/ts-hrady.profilzadavatele.cz> <http://purl.org/dc/terms/title> """Technické služby města Nových Hradů""" .
<http://linked.opendata.cz/resource/business-entity/CZ43842780> <http://purl.org/dc/terms/title> """Jan Záhorka""" .
<http://linked.opendata.cz/resource/domain/buyer-profiles/profile/cz/http/jan-zahorka.profilzadavatele.cz> <http://purl.org/dc/terms/title> """Jan Záhorka""" .
<http://linked.opendata.cz/resource/business-entity/CZ25281348> <http://purl.org/dc/terms/title> """Novotný 97, spol. s r.o.""" .
<http://linked.opendata.cz/resource/domain/buyer-profiles/profile/cz/https/www.profilyzadavatelu.cz/profil/nov97spos> <http://purl.org/dc/terms/title> """Novotný 97, spol. s r.o.""" .
<http://linked.opendata.cz/resource/business-entity/CZ00234028> <http://purl.org/dc/terms/title> """Obec Zadní Třebaň""" .
<http://linked.opendata.cz/resource/business-entity/CZ26727145> <http://purl.org/dc/terms/title> """BOSÁK BUS, spol. s r.o.""" .
<http://linked.opendata.cz/resource/business-entity/CZ00276464> <http://purl.org/dc/terms/title> """Obec Brněnec""" .
<http://linked.opendata.cz/resource/domain/buyer-profiles/profile/cz/https/brnenec.profilzadavatele.cz> <http://purl.org/dc/terms/title> """Obec Brněnec""" .
<http://linked.opendata.cz/resource/business-entity/CZ00263265> <http://purl.org/dc/terms/title> """Obec Višňová""" .
<http://linked.opendata.cz/resource/domain/buyer-profiles/profile/cz/https/visnova.profilzadavatele.cz> <http://purl.org/dc/terms/title> """Obec Višňová""" .
<http://linked.opendata.cz/resource/business-entity/CZ00672025> <http://purl.org/dc/terms/title> """Obec Jindřichovice pod Smrkem""" .
<http://linked.opendata.cz/resource/domain/buyer-profiles/profile/cz/https/jindrichovice.profilzadavatele.cz> <http://purl.org/dc/terms/title> """Obec Jindřichovice pod Smrkem""" .
<http://linked.opendata.cz/resource/domain/buyer-profiles/profile/cz/https/profil.violette-sro.cz/Contracts.aspx/1088> <http://purl.org/dc/terms/title> """Obec Jindřichovice pod Smrkem""" .
<http://linked.opendata.cz/resource/business-entity/CZ00286311> <http://purl.org/dc/terms/title> """Městys Nová Říše""" .
<http://linked.opendata.cz/resource/business-entity/CZ00109380> <http://purl.org/dc/terms/title> """Zemědělské družstvo "Skalka"""" .
<http://linked.opendata.cz/resource/domain/buyer-profiles/profile/cz/https/www.vhodne-uverejneni.cz/profil/zemedelske-druzstvo-skalka> <http://purl.org/dc/terms/title> """Zemědělské družstvo "Skalka"""" .
<http://linked.opendata.cz/resource/business-entity/CZ00848441> <http://purl.org/dc/terms/title> """Obec Trnávka""" .
<http://linked.opendata.cz/resource/domain/buyer-profiles/profile/cz/https/www.vhodne-uverejneni.cz/profil> <http://purl.org/dc/terms/title> """Obec Trnávka""" .
<http://linked.opendata.cz/resource/domain/buyer-profiles/profile/cz/https/www.vhodne-uverejneni.cz/profil/obec-trnavka> <http://purl.org/dc/terms/title> """Obec Trnávka""" .
<http://linked.opendata.cz/resource/business-entity/CZ69972061> <http://purl.org/dc/terms/title> """Sdružení měst a obcí Plzeňského kraje""" .
<http://linked.opendata.cz/resource/domain/buyer-profiles/profile/cz/https/www.egordion.cz/nabidkaGORDION/profilSMOPlzenskehokraje> <http://purl.org/dc/terms/title> """Sdružení měst a obcí Plzeňského kraje""" .
<http://linked.opendata.cz/resource/business-entity/CZ61781771> <http://purl.org/dc/terms/title> """Vyšší odborná škola, Obchodní akademie, Střední zdravotnická škola a Jazyková škola s právem státní jazykové zkoušky, Klatovy, Plánická 196""" .
<http://linked.opendata.cz/resource/domain/buyer-profiles/profile/cz/https/ezak.cnpk.cz/profile_display_69.html> <http://purl.org/dc/terms/title> """Vyšší odborná škola, Obchodní akademie, Střední zdravotnická škola a Jazyková škola s právem státní jazykové zkoušky, Klatovy, Plánická 196""" .
<http://linked.opendata.cz/resource/business-entity/CZ00599361> <http://purl.org/dc/terms/title> """Obec Dlouhé""" .
<http://linked.opendata.cz/resource/domain/buyer-profiles/profile/cz/http/obecdlouhe.profilzadavatele.cz> <http://purl.org/dc/terms/title> """Obec Dlouhé""" .
<http://linked.opendata.cz/resource/business-entity/CZ00263664> <http://purl.org/dc/terms/title> """Obec Hrobce""" .
<http://linked.opendata.cz/resource/domain/buyer-profiles/profile/cz/http/www.stavebnionline.cz/profil/hrobce> <http://purl.org/dc/terms/title> """Obec Hrobce""" .
<http://linked.opendata.cz/resource/domain/buyer-profiles/profile/cz/http/www.hrobce.cz> <http://purl.org/dc/terms/title> """Obec Hrobce""" .
<http://linked.opendata.cz/resource/business-entity/CZ47723505> <http://purl.org/dc/terms/title> """Základní škola JIH, Mariánské Lázně, Komenského 459, příspěvková organizace""" .
<http://linked.opendata.cz/resource/domain/buyer-profiles/profile/cz/https/www.vhodne-uverejneni.cz/profil/zakladni-skola-jih-marianske-lazne-komenskeho-459-prispevkova-organizace> <http://purl.org/dc/terms/title> """Základní škola JIH, Mariánské Lázně, Komenského 459, příspěvková organizace""" .
<http://linked.opendata.cz/resource/business-entity/CZ28551656> <http://purl.org/dc/terms/title> """infinity - progress o.s.""" .
<http://linked.opendata.cz/resource/domain/buyer-profiles/profile/cz/https/www.vhodne-uverejneni.cz/profil/infinity-progress-o-s> <http://purl.org/dc/terms/title> """infinity - progress o.s.""" .
<http://linked.opendata.cz/resource/domain/buyer-profiles/profile/cz/http/mhvodnany.profilzadavatele.cz> <http://purl.org/dc/terms/title> """Městské hospodářství Vodňany,spol. s r.o.""" .
<http://linked.opendata.cz/resource/domain/buyer-profiles/profile/cz/http/www.mhvodnany.cz> <http://purl.org/dc/terms/title> """Městské hospodářství Vodňany,spol. s r.o.""" .
<http://linked.opendata.cz/resource/business-entity/CZ25183222> <http://purl.org/dc/terms/title> """Městské hospodářství Vodňany, spol. s r.o.""" .
<http://linked.opendata.cz/resource/domain/buyer-profiles/profile/cz/https/www.softender.cz/home/profil/hracholusky> <http://purl.org/dc/terms/title> """Obec Hracholusky - okres Rakovník""" .
<http://linked.opendata.cz/resource/business-entity/CZ25862731> <http://purl.org/dc/terms/title> """NAM system, a.s.""" .
<http://linked.opendata.cz/resource/domain/buyer-profiles/profile/cz/https/www.vhodne-uverejneni.cz/profil/nam-system-a-s> <http://purl.org/dc/terms/title> """NAM system, a.s.""" .
<http://linked.opendata.cz/resource/business-entity/CZ47718579> <http://purl.org/dc/terms/title> """ŠKODA ELECTRIC a.s.""" .
<http://linked.opendata.cz/resource/domain/buyer-profiles/profile/cz/http/www.e-zakazky.cz/Profil-Zadavatele/e424beea-df23-436f-864e-7e7caea3e81f> <http://purl.org/dc/terms/title> """ŠKODA ELECTRIC a.s.""" .
<http://linked.opendata.cz/resource/business-entity/CZ00279633> <http://purl.org/dc/terms/title> """Obec Těchonín""" .
<http://linked.opendata.cz/resource/domain/buyer-profiles/profile/cz/http/www.e-zakazky.cz/Profil-Zadavatele/5a776ff1-af8e-49bc-8a88-e0239c96445c> <http://purl.org/dc/terms/title> """Obec Těchonín""" .
<http://linked.opendata.cz/resource/domain/buyer-profiles/profile/cz/http/www.techonin.cz> <http://purl.org/dc/terms/title> """Obec Těchonín""" .
<http://linked.opendata.cz/resource/domain/buyer-profiles/profile/cz/http/www.e-zakazky.cz/Profil-Zadavatele/76058c42-cdfc-4f5d-91b7-05766d517715> <http://purl.org/dc/terms/title> """ALMA PNEU s.r.o.""" .
<http://linked.opendata.cz/resource/business-entity/CZ25392344> <http://purl.org/dc/terms/title> """ALMA PNEU s.r.o.""" .
<http://linked.opendata.cz/resource/business-entity/CZ01147994> <http://purl.org/dc/terms/title> """Jitka Tomanová""" .
<http://linked.opendata.cz/resource/domain/buyer-profiles/profile/cz/http/www.e-zakazky.cz/Profil-Zadavatele/b31e7355-2858-49d3-9db3-10d7a9c54bdf> <http://purl.org/dc/terms/title> """Jitka Tomanová""" .
<http://linked.opendata.cz/resource/business-entity/CZ64361179> <http://purl.org/dc/terms/title> """Stora Enso Wood Products Planá s.r.o.""" .
<http://linked.opendata.cz/resource/domain/buyer-profiles/profile/cz/http/stora-plana.profilzadavatele.cz> <http://purl.org/dc/terms/title> """Stora Enso Wood Products Planá s.r.o.""" .
<http://linked.opendata.cz/resource/business-entity/CZ25264605> <http://purl.org/dc/terms/title> """Stora Enso Wood Products Ždírec s.r.o.""" .
<http://linked.opendata.cz/resource/domain/buyer-profiles/profile/cz/http/stora-zdirec.profilzadavatele.cz> <http://purl.org/dc/terms/title> """Stora Enso Wood Products Ždírec s.r.o.""" .
<http://linked.opendata.cz/resource/business-entity/CZ00287644> <http://purl.org/dc/terms/title> """Obec Prusinovice""" .
<http://linked.opendata.cz/resource/domain/buyer-profiles/profile/cz/http/obecprusinovice.profilzadavatele.cz> <http://purl.org/dc/terms/title> """Obec Prusinovice""" .
<http://linked.opendata.cz/resource/business-entity/CZ25767101> <http://purl.org/dc/terms/title> """FRIGERA METAL, a.s.""" .
<http://linked.opendata.cz/resource/domain/buyer-profiles/profile/cz/http/frigera-metal.profilzadavatele.cz> <http://purl.org/dc/terms/title> """FRIGERA METAL, a.s.""" .
<http://linked.opendata.cz/resource/business-entity/CZ63991691> <http://purl.org/dc/terms/title> """LOGIT, s.r.o.""" .
<http://linked.opendata.cz/resource/domain/buyer-profiles/profile/cz/http/logit.profilzadavatele.cz> <http://purl.org/dc/terms/title> """LOGIT, s.r.o.""" .
<http://linked.opendata.cz/resource/business-entity/CZ28340957> <http://purl.org/dc/terms/title> """Bergasto s.r.o.""" .
<http://linked.opendata.cz/resource/domain/buyer-profiles/profile/cz/http/bergasto.profilzadavatele.cz> <http://purl.org/dc/terms/title> """Bergasto s.r.o.""" .
<http://linked.opendata.cz/resource/business-entity/CZ00579149> <http://purl.org/dc/terms/title> """Obec Říčky v Orlických horách""" .
<http://linked.opendata.cz/resource/domain/buyer-profiles/profile/cz/https/www.vhodne-uverejneni.cz/profil/obec-ricky-v-orlickych-horach-cp-2> <http://purl.org/dc/terms/title> """Obec Říčky v Orlických horách""" .
<http://linked.opendata.cz/resource/business-entity/CZ64506843> <http://purl.org/dc/terms/title> """GenAgro Říčany a.s.""" .
<http://linked.opendata.cz/resource/domain/buyer-profiles/profile/cz/http/www.genagro.cz/profil> <http://purl.org/dc/terms/title> """GenAgro Říčany a.s.""" .
<http://linked.opendata.cz/resource/business-entity/CZ28621905> <http://purl.org/dc/terms/title> """CRONIMET Ostrava, s.r.o.""" .
<http://linked.opendata.cz/resource/domain/buyer-profiles/profile/cz/http/www.stavebnionline.cz/profil/cronimet> <http://purl.org/dc/terms/title> """CRONIMET Ostrava, s.r.o.""" .
<http://linked.opendata.cz/resource/business-entity/CZ75988691> <http://purl.org/dc/terms/title> """Mgr.Jaroslav Hruška""" .
<http://linked.opendata.cz/resource/domain/buyer-profiles/profile/cz/https/www.vhodne-uverejneni.cz/profil/mgr-jaroslav-hruska> <http://purl.org/dc/terms/title> """Mgr.Jaroslav Hruška""" .
<http://linked.opendata.cz/resource/business-entity/CZ28621981> <http://purl.org/dc/terms/title> """Nadační fond Vincenze Priessnitze""" .
<http://linked.opendata.cz/resource/domain/buyer-profiles/profile/cz/http/www.stavebnionline.cz/profil/NFpriessnitz> <http://purl.org/dc/terms/title> """Nadační fond Vincenze Priessnitze""" .
<http://linked.opendata.cz/resource/domain/buyer-profiles/profile/cz/https/www.vhodne-uverejneni.cz/profil/c-copy-centrum-praha-s-r-o> <http://purl.org/dc/terms/title> """C - COPY Centrum Praha, s.r.o.""" .
<http://linked.opendata.cz/resource/business-entity/CZ25056085> <http://purl.org/dc/terms/title> """C - COPY Centrum Praha, s.r.o.""" .
<http://linked.opendata.cz/resource/domain/buyer-profiles/profile/cz/https/www.kdv.cz/profil.php> <http://purl.org/dc/terms/title> """TEXTIL INVEST s.r.o.""" .
<http://linked.opendata.cz/resource/business-entity/CZ25376977> <http://purl.org/dc/terms/title> """TEXTIL INVEST s.r.o.""" .
<http://linked.opendata.cz/resource/business-entity/CZ01733214> <http://purl.org/dc/terms/title> """TESCAN Brno, s.r.o""" .
<http://linked.opendata.cz/resource/business-entity/CZ45035938> <http://purl.org/dc/terms/title> """Římskokatolická farnost Lukavec""" .
<http://linked.opendata.cz/resource/domain/buyer-profiles/profile/cz/https/ezak.straziste.cz/profile_display_85.html> <http://purl.org/dc/terms/title> """Římskokatolická farnost Lukavec""" .
<http://linked.opendata.cz/resource/business-entity/CZ27631273> <http://purl.org/dc/terms/title> """MOLITAN a.s.""" .
<http://linked.opendata.cz/resource/business-entity/CZ74279351> <http://purl.org/dc/terms/title> """Jaroslav Novotný""" .
<http://linked.opendata.cz/resource/business-entity/CZ25281348> <http://purl.org/dc/terms/title> """Novotný 97, spol. s r.o. spol. s r.o.""" .
<http://linked.opendata.cz/resource/domain/buyer-profiles/profile/cz/https/www.profilzadavatele.cz/profil-zadavatele/Obec-Zadni-Treban_1518> <http://purl.org/dc/terms/title> """Zadní Třebáň - profil zadavatele""" .
<http://linked.opendata.cz/resource/business-entity/CZ26727145> <http://purl.org/dc/terms/title> """BOSÁK BUS, spol. s.r.o.""" .
<http://linked.opendata.cz/resource/domain/buyer-profiles/profile/cz/https/www.egordion.cz/nabidkaGORDION/profilBOSAKBUS> <http://purl.org/dc/terms/title> """BOSÁK BUS, spol. s.r.o.""" .
<http://linked.opendata.cz/resource/business-entity/CZ00286311> <http://purl.org/dc/terms/title> """Nová Říše Městys""" .
}

[screenshot]

But when I try to add these triples through the SPARQL endpoint as an INSERT, I always get an authentication dialog. After I authenticate and submit the data, the same dialog appears again (see the screenshot). This is where the problem is.

tomesj commented 11 years ago

After authenticating many times, I now get the following error: [screenshot]

Explanation (see http://docs.openlinksw.com/virtuoso/errors.html): SR041 22023 Argument 1 of locate is not a wide string

jakubklimek commented 11 years ago

Are you inserting the triples using the HTTP GET method? I can see from the screenshot that the query is actually in the URL itself. If so, try using HTTP POST instead, as it can transfer larger data. You cannot control how much data, say, 100 triples actually amount to, and URL length is limited.
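A minimal Java 11 sketch of the POST approach: the update travels in a form-encoded request body instead of the URL. `SparqlPost` and `updateRequest` are hypothetical names. The `update` form parameter follows the SPARQL 1.1 Protocol; whether this Virtuoso instance expects `update` or `query`, and which authentication it requires, are assumptions to check.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;

final class SparqlPost {

    /**
     * Builds (but does not send) a POST request carrying a SPARQL Update
     * as an application/x-www-form-urlencoded body.
     */
    static HttpRequest updateRequest(String endpoint, String update) {
        // Assumption: the endpoint reads the SPARQL 1.1 Protocol "update" parameter.
        String body = "update=" + URLEncoder.encode(update, StandardCharsets.UTF_8);
        return HttpRequest.newBuilder(URI.create(endpoint))
                .header("Content-Type", "application/x-www-form-urlencoded; charset=UTF-8")
                .POST(HttpRequest.BodyPublishers.ofString(body, StandardCharsets.UTF_8))
                .build();
    }

    public static void main(String[] args) {
        HttpRequest req = updateRequest("http://odcs.xrg.cz:8899/sparql",
                "INSERT DATA { GRAPH <http://example.org/g> { <http://example.org/s> <http://purl.org/dc/terms/title> \"x\" } }");
        // The query is in the body, so the URL stays short regardless of data size.
        System.out.println(req.method() + " " + req.uri());
        // To send it: HttpClient.newHttpClient().send(req, HttpResponse.BodyHandlers.ofString())
    }
}
```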

tomas-knap commented 11 years ago

Jirko, so the status of this bug is that you are outputting the failing parts to the events/error log, so that the pipeline author can debug the problems? Did you find out why the strange characters are there?

tomesj commented 11 years ago

I tried to add the problematic parts using the SPARQL endpoint GUI. You are right: it uses HTTP GET, which causes problems because the URL is too long.

I will try again using my method with HTTP POST (the default method for loading triples), and I will try to find out and describe why these parts don't pass.

jakubklimek commented 11 years ago

OK, in an extreme case we could try to set chunk size to 1 and see what happens - in time.
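Splitting the data into parts of a configurable size can be sketched as below; with a chunk size of 1 each failing part points at exactly one triple. `Chunker` and `chunks` are hypothetical names: the loader's real chunking code and its chunk-size option are not shown in this thread.

```java
import java.util.ArrayList;
import java.util.List;

final class Chunker {

    /** Splits items into consecutive chunks of at most chunkSize elements. */
    static <T> List<List<T>> chunks(List<T> items, int chunkSize) {
        if (chunkSize < 1) {
            throw new IllegalArgumentException("chunkSize must be >= 1");
        }
        List<List<T>> out = new ArrayList<>();
        for (int from = 0; from < items.size(); from += chunkSize) {
            int to = Math.min(from + chunkSize, items.size());
            out.add(new ArrayList<>(items.subList(from, to)));
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> triples = List.of("t1", "t2", "t3", "t4", "t5");
        // chunkSize = 1: every triple becomes its own numbered part.
        List<List<String>> parts = chunks(triples, 1);
        for (int i = 0; i < parts.size(); i++) {
            System.out.println((i + 1) + "/" + parts.size() + " " + parts.get(i));
        }
    }
}
```

The trade-off is speed: one request per triple is slow for 655505 triples, so a size of 1 is best reserved for re-running only the parts that already failed.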

tomesj commented 11 years ago

I found the problem: if the object literal ends with ", it causes problems (there are then four " characters in a row).

Here are some concrete INSERT parts that don't pass:

Data part 149132/655505: INSERT {<http://linked.opendata.cz/resource/business-entity/CZ00109380> <http://purl.org/dc/terms/title> """Zemědělské družstvo "Skalka"""" . }

Data part 149133/655505: INSERT {<http://linked.opendata.cz/resource/domain/buyer-profiles/profile/cz/https/www.vhodne-uverejneni.cz/profil/zemedelske-druzstvo-skalka> <http://purl.org/dc/terms/title> """Zemědělské družstvo "Skalka"""" . }

Data part 149876/655505: INSERT {<http://linked.opendata.cz/resource/business-entity/CZ13378473> <http://purl.org/dc/terms/title> """Josef Spáčil "ELEKTRO"""" . }

Data part 150179/655505: INSERT {<http://linked.opendata.cz/resource/business-entity/CZ00144151> <http://purl.org/dc/terms/title> """Zemědělské družstvo "Křižanovsko"""" . }

Data part 150180/655505: INSERT {<http://linked.opendata.cz/resource/domain/buyer-profiles/profile/cz/http/zd-krizanovsko.profilzadavatele.cz> <http://purl.org/dc/terms/title> """Zemědělské družstvo "Křižanovsko"""" . }

Data part 150595/655505: INSERT {<http://linked.opendata.cz/resource/business-entity/CZ26537516> <http://purl.org/dc/terms/title> """"SE.S.TA"""" . }

Data part 150730/655505: INSERT {<http://linked.opendata.cz/resource/business-entity/CZ27031161> <http://purl.org/dc/terms/title> """"o.s.Sportem proti barierám - Český Ráj"""" . }

Data part 151016/655505: INSERT {<http://linked.opendata.cz/resource/business-entity/CZ69766207> <http://purl.org/dc/terms/title> """Dobrovolný svazek obcí Církvice a Nové Dvory "Klejnarka"""" . }

Data part 151017/655505: INSERT {<http://linked.opendata.cz/resource/domain/buyer-profiles/profile/cz/https/vzpro.cz/zakazky/profil/69766207> <http://purl.org/dc/terms/title> """Dobrovolný svazek obcí Církvice a Nové Dvory "Klejnarka"""" . }

Data part 151932/655505: INSERT {<http://linked.opendata.cz/resource/business-entity/CZ22719245> <http://purl.org/dc/terms/title> """"NIKÉ"""" . }
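All of these parts share one pattern. A short sketch, assuming the loader wraps the raw literal text in `"""..."""` without escaping: it reproduces that wrapping and flags values that cannot survive it. Per the SPARQL grammar, a long string may contain single or paired `"` characters, but not a run of three, not an unescaped backslash, and not a `"` right before the closing delimiter. `LongQuoteCheck` and its methods are hypothetical names.

```java
final class LongQuoteCheck {

    /** Naive serialization: wraps raw text in """...""" without escaping anything. */
    static String naiveLongLiteral(String value) {
        return "\"\"\"" + value + "\"\"\"";
    }

    /**
     * True if raw text can go inside """...""" unescaped: no backslash (it would
     * start an escape sequence), no run of three quotes (it would close the literal
     * early) and no trailing quote (it would merge with the closing delimiter).
     */
    static boolean safeInLongQuotes(String value) {
        return value.indexOf('\\') < 0
                && !value.contains("\"\"\"")
                && !value.endsWith("\"");
    }

    public static void main(String[] args) {
        for (String v : new String[] {"GENERI BIOTECH s.r.o.", "\"SE.S.TA\""}) {
            System.out.println(naiveLongLiteral(v) + " safe=" + safeInLongQuotes(v));
        }
    }
}
```

A leading `"` (as in `""""SE.S.TA`) is harmless on its own; it is the trailing one that breaks parsing.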

tomesj commented 11 years ago

Possible workaround: add a space after the final " of the value (before the closing """). Then it passes.

For example: INSERT {<http://linked.opendata.cz/resource/domain/buyer-profiles/profile/cz/https/vzpro.cz/zakazky/profil/69766207> <http://purl.org/dc/terms/title> """Dobrovolný svazek obcí Církvice a Nové Dvory "Klejnarka" """ . }

jakubklimek commented 11 years ago

Would it pass if you checked whether the last character is " and changed it to \"? I must admit I am not sure about this one. Please test it. It should be testable directly through the SPARQL endpoint in a browser.
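A minimal sketch of the targeted fix suggested here: replace only a final `"` with `\"`. Unlike the space workaround, this leaves the literal's value unchanged, since `\"` is read back as `"`. It does not handle backslashes or runs of `"""` inside the text. `TrailingQuoteFix` is a hypothetical name.

```java
final class TrailingQuoteFix {

    /** Escapes a final '"' so it cannot merge with the closing """ delimiter. */
    static String escapeTrailingQuote(String value) {
        if (value.endsWith("\"")) {
            return value.substring(0, value.length() - 1) + "\\\"";
        }
        return value;
    }

    public static void main(String[] args) {
        String v = "\"SE.S.TA\"";
        // Prints the long literal as it would appear in the INSERT query.
        System.out.println("\"\"\"" + escapeTrailingQuote(v) + "\"\"\"");
    }
}
```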


tomas-knap commented 11 years ago

Jirka, but when looking at the first screenshot of this bug, there are more than four records with this problem, hm?

jakubklimek commented 11 years ago

Well... according to the Turtle specification, these characters should always be escaped, even inside """ """, as that form only adds support for newlines compared to " ":

The character escapes are:
\t (U+0009, tab)
\n (U+000A, linefeed)
\r (U+000D, carriage return)
\" (U+0022, double quote - only allowed inside strings)
\> (U+003E, greater than - only allowed inside URIs)
\\ (U+005C, backslash)
\uHHHH or \UHHHHHHHH for writing Unicode characters by hexadecimal codepoint where H is a single hexadecimal digit.

jakubklimek commented 11 years ago

But the relevant syntax here is actually SPARQL, not Turtle, and the SPARQL specification gives the following example:

'''The librarian said, "Perhaps you would enjoy 'War and Peace'."'''

So in SPARQL you would actually use triple "single" quotes. The question is why these characters are not escaped when you pass them in a SPARQL INSERT query. I think you generate the INSERT query from the text of the literals, which is unescaped when you get it from openRDF (used by ODCS repositories), but you don't escape it again when you build the SPARQL query, as you should.
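A sketch of the general fix, assuming the loader builds the query text itself from the unescaped literal values: escape every character that the SPARQL grammar does not allow raw inside a short `"..."` literal (double quote, backslash, LF, CR; tab is escaped for readability). It uses escaping rather than the `'''...'''` form, because a value ending in `'` would hit the same problem there. `SparqlLiteral`, `escape` and `triple` are hypothetical names, not the ODCS loader's code, and the IRIs are assumed to be valid already.

```java
final class SparqlLiteral {

    /** Escapes text for a short double-quoted SPARQL literal using ECHAR sequences. */
    static String escape(String value) {
        StringBuilder sb = new StringBuilder(value.length() + 8);
        for (int i = 0; i < value.length(); i++) {
            char c = value.charAt(i);
            switch (c) {
                case '\\': sb.append("\\\\"); break;
                case '"':  sb.append("\\\""); break;
                case '\n': sb.append("\\n");  break;
                case '\r': sb.append("\\r");  break;
                case '\t': sb.append("\\t");  break;
                default:   sb.append(c);
            }
        }
        return sb.toString();
    }

    /** Formats one triple; subject and predicate IRIs are assumed to be valid already. */
    static String triple(String subject, String predicate, String literal) {
        return "<" + subject + "> <" + predicate + "> \"" + escape(literal) + "\" .";
    }

    public static void main(String[] args) {
        String title = "http://purl.org/dc/terms/title";
        String update = "INSERT DATA {\n  "
                + triple("http://linked.opendata.cz/resource/business-entity/CZ26537516", title, "\"SE.S.TA\"")
                + "\n}";
        System.out.println(update);
    }
}
```

Because every problematic character is escaped, the same code also covers values with backslashes, not only trailing quotes.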

tomesj commented 11 years ago

OK, I will escape all characters that need escaping, using a lookup table.

It is necessary: I found another problem in the triples (an unescaped backslash):

INSERT {
<http://linked.opendata.cz/resource/domain/buyer-profiles/profile/cz/https/////EZAK.CNPK.CZ//PROFILE_DISPLAY_132.HTML> <http://purl.org/procurement/public-contracts#originalBuyerProfileUrl> """HTTPS:\EZAK.CNPK.CZ\PROFILE_DISPLAY_132.HTML""" .
<http://linked.opendata.cz/resource/domain/buyer-profiles/profile/cz/https/EZAK.CNPK.CZ/PROFILE_DISPLAY_132.HTML> <http://purl.org/procurement/public-contracts#originalBuyerProfileUrl> """HTTPS:\EZAK.CNPK.CZ\PROFILE_DISPLAY_132.HTML""" .
}

After that it should be fine :-)
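The backslash case can be detected before sending. A sketch that scans a literal body for a backslash that does not start a valid SPARQL escape: `\t`, `\b`, `\n`, `\r`, `\f`, `\"`, `\'`, `\\`, or a `\uXXXX` / `\UXXXXXXXX` code-point escape with the right number of hex digits. `EscapeCheck` is a hypothetical name; for the value above, the first bad backslash is the one before `EZAK`.

```java
final class EscapeCheck {

    // Characters allowed directly after a backslash in a SPARQL ECHAR.
    private static final String SIMPLE_ESCAPES = "tbnrf\"'\\";

    /** Returns the index of the first backslash that does not start a valid escape, or -1. */
    static int firstInvalidEscape(String body) {
        int i = 0;
        while (i < body.length()) {
            if (body.charAt(i) != '\\') {
                i++;
                continue;
            }
            if (i + 1 >= body.length()) {
                return i; // dangling backslash at the end
            }
            char next = body.charAt(i + 1);
            if (SIMPLE_ESCAPES.indexOf(next) >= 0) {
                i += 2;
                continue;
            }
            if (next == 'u' || next == 'U') {
                // Code-point escapes need 4 (lower-case u) or 8 (upper-case U) hex digits.
                int digits = next == 'u' ? 4 : 8;
                if (!hexRun(body, i + 2, digits)) {
                    return i;
                }
                i += 2 + digits;
                continue;
            }
            return i;
        }
        return -1;
    }

    private static boolean hexRun(String s, int from, int len) {
        if (from + len > s.length()) {
            return false;
        }
        for (int k = from; k < from + len; k++) {
            if (Character.digit(s.charAt(k), 16) < 0) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        String url = "HTTPS:\\EZAK.CNPK.CZ\\PROFILE_DISPLAY_132.HTML";
        System.out.println("first invalid escape at index " + firstInvalidEscape(url));
    }
}
```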

tomesj commented 11 years ago

I added the necessary escaping and tested loading to the SPARQL endpoint; it works now. I added the "TEST THAT" flag.

tomas-knap commented 11 years ago

Jirka please test the loading also against Fuseki, related to #640

I mean: test that the chosen encoding also works with Fuseki and produces correct results. Try data that may cause problems, such as special characters, quotes, quotes at the end, etc. Put all such data into one TTL file, upload it to Confluence and post the link here.

tomesj commented 11 years ago

I created a file with potentially problematic triples on the Confluence page:

Direct link to file: https://grips.semantic-web.at/download/attachments/50922493/problemTriples.ttl?version=1&modificationDate=1384205335753&api=v2

tomas-knap commented 10 years ago

When you tried to load the file, was everything ok?