Open mmaltsev opened 6 years ago
To meet this requirement, there has to be something to compare against. The ontology could be that reference, and instances could perhaps be checked for whether they are correct instantiations of a given class in the ontology. In that case, these classes should be removed from the full chain.
Still, in the end, we need a ground truth to compare with.
The only solution that came to my mind was to narrow down the classes for each standard. That is - to exclude all super classes and leave only those which are at the bottom level of the "DBpedia class tree". Such an approach was implemented here.
Applying it to the OPC_UA leads to the following. before:
sto:IEC_62541 a dbpcy:Abstraction100002137,
dbpcy:Communication100033020,
dbpcy:Direction106786629,
dbpcy:Measure100033615,
dbpcy:Message106598915,
dbpcy:Protocol106665108,
dbpcy:Rule106652242,
dbpcy:Standard107260623,
dbpcy:SystemOfMeasurement113577171,
dbpcy:WikicatComputerStandards,
dbpcy:WikicatNetworkProtocols,
after:
sto:IEC_62541 a dbpcy:WikicatComputerStandards,
dbpcy:WikicatNetworkProtocols,
Applying it to the enriched ontology yields this. The process removes 429 triples overall. In addition, some class chains, such as WikicatBusinessModels -> ... -> PhysicalEntity100001930, were excluded entirely.
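The narrowing to bottom-level classes described above can be sketched roughly as follows. This is only an illustration, not the linked implementation: the `SUBCLASS_OF` edges are assumptions (the Protocol chain follows the one quoted in this thread, while the Standard branch and the parents of the two Wikicat classes are guessed), and `ancestors` and `bottom_level` are hypothetical helper names.

```python
from typing import Dict, Iterable, Set

# Assumed rdfs:subClassOf edges (child -> parents). Only the Protocol chain
# is quoted in this thread; the other edges are guesses for illustration.
SUBCLASS_OF: Dict[str, Set[str]] = {
    "WikicatNetworkProtocols": {"Protocol106665108"},
    "Protocol106665108": {"Rule106652242"},
    "Rule106652242": {"Direction106786629"},
    "Direction106786629": {"Message106598915"},
    "Message106598915": {"Communication100033020"},
    "Communication100033020": {"Abstraction100002137"},
    "WikicatComputerStandards": {"Standard107260623"},
    "Standard107260623": {"SystemOfMeasurement113577171"},
    "SystemOfMeasurement113577171": {"Measure100033615"},
    "Measure100033615": {"Abstraction100002137"},
}

# rdf:type values of sto:IEC_62541 before the cleaning (from the listing above).
BEFORE = [
    "Abstraction100002137", "Communication100033020", "Direction106786629",
    "Measure100033615", "Message106598915", "Protocol106665108",
    "Rule106652242", "Standard107260623", "SystemOfMeasurement113577171",
    "WikicatComputerStandards", "WikicatNetworkProtocols",
]


def ancestors(cls: str, subclass_of: Dict[str, Set[str]]) -> Set[str]:
    """Return all transitive superclasses of cls."""
    seen: Set[str] = set()
    stack = list(subclass_of.get(cls, ()))
    while stack:
        parent = stack.pop()
        if parent not in seen:
            seen.add(parent)
            stack.extend(subclass_of.get(parent, ()))
    return seen


def bottom_level(types: Iterable[str], subclass_of: Dict[str, Set[str]]) -> Set[str]:
    """Keep only the types that are not a superclass of another assigned type."""
    types = set(types)
    covered: Set[str] = set()
    for cls in types:
        covered |= ancestors(cls, subclass_of)
    return types - covered


AFTER = bottom_level(BEFORE, SUBCLASS_OF)
print(sorted(AFTER))
```

Under these assumed edges, only the two Wikicat classes survive, which matches the "after" listing.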
The problem, in this case, is that we may be removing facts that are true. E.g., OPC UA can be considered a dbpcy:Communication100033020 and a dbpcy:Standard107260623. To make this right, we need a Gold Standard, or at least to be able to access the ontology.
Which criteria did you use to remove the triples? How can this be validated?
The reason why I excluded superclasses such as dbpcy:Communication100033020 and dbpcy:Standard107260623 was:
1) They don't provide any additional information, because dbpcy:WikicatStandards, dbpcy:WikicatANSIStandards, or any other "bottom-level" class is automatically a dbpcy:Standard107260623.
2) dbpcy:Standard107260623 itself is just an internal identifier inside DBpedia, which doesn't even always mean "Standard" as we understand it. Moreover, this kind of information doesn't give us any useful knowledge; we can't really use it.
Some of the classes were removed because their "top-level" superclass was PhysicalEntity100001930, which generally describes people, events, etc.
This solution might not be the best, because it excludes some classes that are true, but at least it narrows the set down to classes that are easy to check and whose origin is easy to understand.
Can you evaluate what the precision would be for this example alone, with and without the removal? - Check this
For sto:IEC_62541, in terms of precision: considering that, from a human perspective, a standard is not a Communication (dbpcy:Communication100033020), Direction (dbpcy:Direction106786629), or Message (dbpcy:Message106598915), the precision before the cleaning would be p = 8/11 and after it p = 2/2 = 1. Again, it looks like that only after human interpretation.
From the perspective of DBpedia as a system, all of these classes, e.g. Communication100033020 or Message106598915, just represent different layers of the abstraction hierarchy for the DBpedia resource. Thus, in this case, p = 1 either way (before and after the cleaning).
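The two precision figures can be reproduced with a few lines. This is a minimal sketch; `precision` is a hypothetical helper, and the set of "wrong" classes simply encodes the human judgment stated above (a standard is not a Communication, Direction, or Message).

```python
from fractions import Fraction
from typing import Iterable


def precision(assigned: Iterable[str], correct: Iterable[str]) -> Fraction:
    """Share of assigned classes that are judged correct."""
    assigned = set(assigned)
    if not assigned:
        raise ValueError("no classes assigned")
    return Fraction(len(assigned & set(correct)), len(assigned))


before = {
    "Abstraction100002137", "Communication100033020", "Direction106786629",
    "Measure100033615", "Message106598915", "Protocol106665108",
    "Rule106652242", "Standard107260623", "SystemOfMeasurement113577171",
    "WikicatComputerStandards", "WikicatNetworkProtocols",
}
after = {"WikicatComputerStandards", "WikicatNetworkProtocols"}

# Human perspective: these three are judged wrong for a standard.
judged_wrong = {"Communication100033020", "Direction106786629", "Message106598915"}
human_correct = before - judged_wrong

p_before_human = precision(before, human_correct)  # 8 of 11 classes accepted
p_after_human = precision(after, human_correct)    # 2 of 2 classes accepted

# DBpedia perspective: every class in the chain counts as correct.
p_before_dbpedia = precision(before, before)
p_after_dbpedia = precision(after, before)

print(p_before_human, p_after_human, p_before_dbpedia, p_after_dbpedia)
```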
In DBpedia, in our area of interest, the objects of the property rdf:type sometimes include irrelevant data. An example is sto:BBF_TR-069 -- rdf:type -- dbpcy:Rule106652242. In this case, the unrelated object dbpcy:Rule106652242 is just a result of applying the predicate rdfs:subClassOf to dbpcy:Protocol106665108. Thus, we have the full chain
dbpcy:Protocol106665108 < dbpcy:Rule106652242 < dbpcy:Direction106786629 < dbpcy:Message106598915 < dbpcy:Communication100033020 < dbpcy:Abstraction100002137
in the list of rdf:type objects of sto:BBF_TR-069. The question is: should anything from such chains be removed from the enriched ontology?
Another example is sto:SCOR -- rdf:type -- dbpcy:Person100007846. This case is easier, because such a concept is simply wrong and we can exclude the whole chain with dbpcy:Person100007846 in it from the enriched ontology.
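Excluding a whole chain around a rejected class could look roughly like the sketch below. Everything except the two DBpedia class names is made up for illustration: the `Hypothetical...` classes, the direct Person -> PhysicalEntity edge (intermediate classes are collapsed), and the helpers `ancestors` and `drop_chain`. Superclasses of the rejected class are dropped only if no remaining class still implies them.

```python
from typing import Dict, Iterable, Set


def ancestors(cls: str, subclass_of: Dict[str, Set[str]]) -> Set[str]:
    """Return all transitive superclasses of cls."""
    seen: Set[str] = set()
    stack = list(subclass_of.get(cls, ()))
    while stack:
        parent = stack.pop()
        if parent not in seen:
            seen.add(parent)
            stack.extend(subclass_of.get(parent, ()))
    return seen


def drop_chain(types: Iterable[str], subclass_of: Dict[str, Set[str]],
               rejected: str) -> Set[str]:
    """Remove the rejected class, its subclasses, and unsupported superclasses."""
    types = set(types)
    upper = ancestors(rejected, subclass_of)
    lower = {c for c in types if rejected in ancestors(c, subclass_of)}
    independent = types - lower - upper - {rejected}
    # Superclasses of the rejected class stay only if an unrelated,
    # still-assigned class implies them as well.
    still_implied: Set[str] = set()
    for cls in independent:
        still_implied |= ancestors(cls, subclass_of)
    return independent | (types & upper & still_implied)


# Hypothetical hierarchy and type list; not taken from the real sto:SCOR data.
subclass_of = {
    "HypotheticalPersonSubclass": {"Person100007846"},
    "Person100007846": {"PhysicalEntity100001930"},  # assumed, collapsed edge
    "HypotheticalValidClass": {"HypotheticalValidSuperclass"},
}
scor_types = {
    "HypotheticalPersonSubclass", "Person100007846", "PhysicalEntity100001930",
    "HypotheticalValidClass", "HypotheticalValidSuperclass",
}

cleaned = drop_chain(scor_types, subclass_of, "Person100007846")
print(sorted(cleaned))
```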