KonradHoeffner closed this issue 5 years ago.
| property | count |
|---|---|
| http://www.snik.eu/ontology/he/page | 2131 |
| http://www.snik.eu/ontology/bb/page | 1146 |
| http://www.snik.eu/ontology/it4it/page | 18 |
| http://www.snik.eu/ontology/ob/page | 774 |
```sparql
select (?p as ?relation) (?c as ?class) (count(distinct *) as ?count)
{
  ?x ?p ?y.
  ?x a ?c.
  filter(REGEX(STR(?p),"page"))
} group by ?p ?c
```
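To make the grouping concrete, here is a minimal Python sketch that evaluates the same pattern over a small in-memory list of (subject, property, object) triples instead of the endpoint. The helper name `count_page_statements` and any sample IRIs are hypothetical. Note that an object that is an empty string still counts as a solution, so the totals can exceed the number of pages that actually carry a value.

```python
import re
from collections import Counter

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

def count_page_statements(triples):
    """Hypothetical in-memory version of
    select (?p as ?relation) (?c as ?class) (count(distinct *) as ?count)
    { ?x ?p ?y. ?x a ?c. filter(REGEX(STR(?p),"page")) } group by ?p ?c
    """
    # Bindings for "?x a ?c": all classes of each subject.
    classes = {}
    for s, p, o in triples:
        if p == RDF_TYPE:
            classes.setdefault(s, set()).add(o)
    # Distinct solutions (?x, ?p, ?y, ?c); a set mirrors count(distinct *).
    solutions = set()
    for s, p, o in triples:
        if re.search("page", p):  # case-sensitive, like REGEX without flags
            for c in classes.get(s, ()):
                solutions.add((s, p, o, c))
    # Group by (?p, ?c) and count the solutions in each group.
    return Counter((p, c) for _, p, _, c in solutions)
```

Subjects without any `rdf:type` drop out, just as they would in the query, because the `?x a ?c` pattern has no match for them.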
In the old bb.rdf from the repository (last update: 2017-08-01), there are only 494 nonempty `page` statements, counted with `grep "page>[^<]" bb.rdf | wc -l`. The difference arises because the SPARQL query also counts empty page statements, which should be removed. In the meantime, here is the count of nonempty pages:
```sparql
select (?p as ?relation) (?c as ?class) (count(distinct *) as ?count)
{
  ?x ?p ?y.
  ?x a ?c.
  filter(?y!="").
  filter(REGEX(STR(?p),"page"))
} group by ?p ?c
```
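For comparison with the grep call above: the pattern `page>[^<]` only matches when a `page>` tag is directly followed by content, so empty and self-closing elements are skipped. A minimal Python sketch of the same line count, assuming (hypothetically) that each page element sits on a single line as in the made-up sample:

```python
import re

# Same pattern as grep "page>[^<]": "page>" followed by any character other than "<".
NONEMPTY_PAGE = re.compile(r"page>[^<]")

def count_nonempty_page_lines(rdf_xml: str) -> int:
    """Count the lines grep would print; "| wc -l" then counts those lines."""
    return sum(1 for line in rdf_xml.splitlines() if NONEMPTY_PAGE.search(line))

# Hypothetical RDF/XML fragment: two filled, one empty and one self-closing element.
sample = """<bb:page>12</bb:page>
<bb:page></bb:page>
<bb:page/>
<bb:page>7</bb:page>"""
print(count_nonempty_page_lines(sample))  # 2
```

Since `wc -l` counts lines rather than matches, two filled page elements on the same line would only be counted once.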
And for comparison, the empty ones:
There don't seem to be any pages for triples, only for classes and relations. Maybe they were mistakenly attached to the relations? How many pages does a single subject have at most?
```sparql
PREFIX bb: <http://www.snik.eu/ontology/bb/>
select ?x (count(?y) as ?count)
{
  ?x bb:page ?y.
  filter(?y!="").
} group by ?x
order by desc(?count)
```
Result: At most two per subject, which looks fine.
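The per-subject maximum can be sketched in Python as well. This assumes the page statements are given as hypothetical (subject, page) pairs, with empty values skipped like `filter(?y!="")`:

```python
from collections import Counter

def pages_per_subject(page_statements):
    """In-memory counterpart of
    select ?x (count(?y) as ?count) { ?x bb:page ?y. filter(?y!="") }
    group by ?x order by desc(?count)
    """
    counts = Counter(subject for subject, page in page_statements if page != "")
    # most_common() returns (subject, count) pairs sorted by descending count.
    return counts.most_common()

rows = pages_per_subject([("A", "12"), ("A", "13"), ("A", ""), ("B", "7")])
print(rows[0])  # ('A', 2)
```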
Next step: look at the old extraction table of bb to find out whether any page statements are missing now.
The triple pages were there all along: in the spreadsheet, in all dumps in the repository and on the SPARQL endpoint. They just weren't found because the property is named `TripelPage`, and the case-sensitive regex `"page"` does not match its capital P.
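```sparql
select (?p as ?relation) (?c as ?class) (count(distinct *) as ?count)
{
  ?x ?p ?y.
  ?x a ?c.
  filter(?y!="").
  # the "i" flag is the third argument of REGEX and makes the match case-insensitive
  filter(REGEX(STR(?p),"page","i"))
} group by ?p ?c
```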
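The effect of the `"i"` flag can be checked quickly with Python's `re` module, where `re.IGNORECASE` plays the same role (the two property IRIs are taken from this thread):

```python
import re

properties = [
    "http://www.snik.eu/ontology/bb/page",
    "http://www.snik.eu/ontology/bb/TripelPage",
]

# Without a flag the match is case-sensitive, so "TripelPage" is missed.
case_sensitive = [p for p in properties if re.search("page", p)]
# With IGNORECASE, the counterpart of REGEX(..., "page", "i"), both are found.
case_insensitive = [p for p in properties if re.search("page", p, re.IGNORECASE)]

print(len(case_sensitive), len(case_insensitive))  # 1 2
```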
Are the axioms still bound to valid subjects and objects?
Get an overview via:
```sparql
select *
from sniko:bb
from sniko:ob
{
  ?x bb:TripelPage|ob:TripelPage ?page.
  ?x ?p ?o.
}
```
...
This is hard to read. Rewrite the blank nodes into the namespace of their subontology so that they can be viewed via LodView.
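```sparql
# number of triples with a blank-node subject, per graph
select ?g (count(*) as ?count)
{
  graph ?g
  {
    ?x ?p ?o.
    FILTER(REGEX(STR(?x),"nodeID://"))
  }
} group by ?g
```
```sparql
# number of triples with a blank-node object, per graph
select ?g (count(*) as ?count)
{
  graph ?g
  {
    ?x ?p ?o.
    FILTER(REGEX(STR(?o),"nodeID://"))
  }
} group by ?g
```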
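Both counts can be mirrored in Python on hypothetical (graph, subject, property, object) quads, where blank nodes appear as `nodeID://b...` strings, just as the `STR()` filters above expect. The sample resources are made up for illustration:

```python
from collections import Counter

def blank_node_counts(quads, position):
    """Count quads per graph whose subject ("s") or object ("o") is a blank node,
    like the two group-by-?g queries with FILTER(REGEX(STR(...),"nodeID://"))."""
    index = {"s": 1, "o": 3}[position]
    # The regex has no special characters, so a substring test is equivalent.
    return Counter(quad[0] for quad in quads if "nodeID://" in quad[index])

BB = "http://www.snik.eu/ontology/bb"
quads = [
    (BB, "nodeID://b1", "owl:annotatedSource", BB + "/A"),
    (BB, BB + "/A", "rdfs:subClassOf", "nodeID://b2"),
    (BB, "nodeID://b2", "owl:onProperty", BB + "/uses"),
]
print(blank_node_counts(quads, "s"), blank_node_counts(quads, "o"))
```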
```sparql
SPARQL
with <http://www.snik.eu/ontology/bb>
delete
{
  ?x ?p ?o.
}
insert
{
  ?y ?p ?o.
}
where
{
  ?x ?p ?o.
  FILTER(REGEX(STR(?x),"nodeID://"))
  BIND(IRI(REPLACE(STR(?x),"nodeID://b","http://www.snik.eu/ontology/bb/blank")) as ?y).
}
```
The same update was run analogously on ob and ciox:
```
Modify <http://www.snik.eu/ontology/ob>, delete 24368 (or less) and insert 24368 (or less) triples -- done
Modify <http://www.snik.eu/ontology/ciox>, delete 200 (or less) and insert 200 (or less) triples -- done
```
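The IRI rewrite inside the update is a plain string replacement. A minimal Python sketch of what it does to a single subject, parameterized by the subontology as in the bb, ob and ciox runs (the helper name and the example node ID are hypothetical):

```python
def rewrite_blank(term: str, subontology: str) -> str:
    """Counterpart of
    FILTER(REGEX(STR(?x),"nodeID://"))
    BIND(IRI(REPLACE(STR(?x),"nodeID://b","http://www.snik.eu/ontology/<sub>/blank")) as ?y)
    """
    if "nodeID://" not in term:
        return term  # not a blank node: the FILTER keeps it out of the update
    # REPLACE substitutes every match; "nodeID://b" contains no regex metacharacters.
    return term.replace("nodeID://b", f"http://www.snik.eu/ontology/{subontology}/blank")

print(rewrite_blank("nodeID://b10007", "ob"))
# http://www.snik.eu/ontology/ob/blank10007
```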
```sparql
SPARQL
with <http://www.snik.eu/ontology/bb>
delete
{
  ?x ?p ?o.
}
insert
{
  ?x ?p ?y.
}
where
{
  ?x ?p ?o.
  FILTER(REGEX(STR(?o),"nodeID://"))
  BIND(IRI(REPLACE(STR(?o),"nodeID://b","http://www.snik.eu/ontology/bb/blank")) as ?y).
}
```
```
Modify <http://www.snik.eu/ontology/bb>, delete 1139 (or less) and insert 1139 (or less) triples -- done
Modify <http://www.snik.eu/ontology/ob>, delete 2338 (or less) and insert 2338 (or less) triples -- done
```
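Because the subject and object updates use the same REPLACE expression, a blank node that occurs in both positions ends up as the same IRI, so links between triples survive the two separate updates. A small self-contained Python check of that property on hypothetical triples (`rewrite_blank` and `rewrite_graph` are illustrative helpers, not part of SNIK):

```python
def rewrite_blank(term, subontology):
    # Same mapping as REPLACE(STR(?x),"nodeID://b","http://www.snik.eu/ontology/<sub>/blank")
    if "nodeID://" not in term:
        return term
    return term.replace("nodeID://b", f"http://www.snik.eu/ontology/{subontology}/blank")

def rewrite_graph(triples, subontology):
    # First update: blank-node subjects.
    step1 = [(rewrite_blank(s, subontology), p, o) for s, p, o in triples]
    # Second update: blank-node objects.
    return [(s, p, rewrite_blank(o, subontology)) for s, p, o in step1]

BB = "http://www.snik.eu/ontology/bb/"
triples = [
    (BB + "A", "rdfs:subClassOf", "nodeID://b2"),
    ("nodeID://b2", "owl:onProperty", BB + "uses"),
]
result = rewrite_graph(triples, "bb")
# The object of the first triple and the subject of the second still match.
assert result[0][2] == result[1][0] == BB + "blank2"
```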
```sparql
# annotation sources and targets that have no type
select *
{
  ?x owl:annotatedSource|owl:annotatedTarget ?o.
  MINUS {?o a ?something.}
}
```
Result: 16 sources or targets have no type, so the rewrite seems to work in general. Separate issue: #297
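The MINUS check keeps every annotation source or target that has no `rdf:type` at all. A rough Python equivalent over hypothetical full-IRI triples, using a set of typed resources:

```python
OWL = "http://www.w3.org/2002/07/owl#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
ENDS = {OWL + "annotatedSource", OWL + "annotatedTarget"}

def untyped_annotation_ends(triples):
    """Counterpart of
    select * { ?x owl:annotatedSource|owl:annotatedTarget ?o. MINUS {?o a ?something.} }
    """
    typed = {s for s, p, _ in triples if p == RDF_TYPE}
    # MINUS removes the solutions whose ?o has at least one type.
    return [(s, p, o) for s, p, o in triples if p in ENDS and o not in typed]

BB = "http://www.snik.eu/ontology/bb/"
triples = [
    (BB + "blank1", OWL + "annotatedSource", BB + "A"),
    (BB + "blank1", OWL + "annotatedTarget", BB + "Missing"),
    (BB + "A", RDF_TYPE, OWL + "Class"),
]
print(untyped_annotation_ends(triples))
```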
We only seem to have sparse page and chapter data. Investigate how many we have, in which form, and whether we lost any through the remodels.