ontologyportal / sumo

Suggested Upper Merged Ontology (SUMO)

duplications #159

Open arademaker opened 5 years ago

arademaker commented 5 years ago
MilitaryProcesses.kif
...
2692 (termFormat EnglishLanguage ReconnaissanceInForce "reconnaissance in force")

domainEnglishFormat.kif
8587 (termFormat EnglishLanguage ReconnaissanceInForce "reconnaissance in force")

There are many other cases (about 200) of duplicated statements in SUMO.
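The duplicates mentioned above can be located without Sigma using a small script. The sketch below is a minimal, hypothetical helper (the names `top_level_forms` and `find_duplicates` are illustrative, not part of Sigma). It splits each .kif file into top-level S-expressions, skipping `;` comments and respecting quoted strings, collapses whitespace, and reports every statement that occurs more than once. It only catches statements that are textually identical after whitespace normalization, so it complements rather than replaces the Sigma Diagnostics report.

```python
from collections import defaultdict


def top_level_forms(text):
    """Yield (line_number, form_text) for each top-level S-expression.

    ';' comments are skipped and double-quoted strings are respected, so a
    parenthesis or semicolon inside a documentation string does not break
    the depth count. (Escape sequences inside strings are not handled.)
    """
    depth = 0
    start = start_line = 0
    line = 1
    in_string = in_comment = False
    for i, c in enumerate(text):
        if c == "\n":
            line += 1
            in_comment = False  # a comment ends at the end of its line
        elif in_comment:
            continue
        elif in_string:
            if c == '"':
                in_string = False
        elif c == ";":
            in_comment = True
        elif c == '"':
            in_string = True
        elif c == "(":
            if depth == 0:
                start, start_line = i, line  # a new top-level form begins
            depth += 1
        elif c == ")" and depth > 0:
            depth -= 1
            if depth == 0:
                yield start_line, text[start:i + 1]


def find_duplicates(files):
    """Map each statement that occurs more than once to its (file, line) locations.

    `files` maps a file name to its KIF text. Whitespace is collapsed before
    comparing (including inside strings), so formatting differences are ignored.
    """
    seen = defaultdict(list)
    for name, text in files.items():
        for line, form in top_level_forms(text):
            key = " ".join(form.split())
            seen[key].append((name, line))
    return {form: locs for form, locs in seen.items() if len(locs) > 1}
```

To scan a checkout, the input could be built with `{p.name: p.read_text(encoding="utf-8") for p in Path(".").glob("*.kif")}` (using `pathlib.Path`). For the Rwanda case below, the result would be expected to list both mondial.kif and CountriesAndRegions.kif with their line numbers.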

arademaker commented 5 years ago

One more case:

mondial.kif
43298 (instance Rwanda Nation)

CountriesAndRegions.kif
94   (instance Rwanda Nation)

apease commented 5 years ago

Hi Alexandre, thanks for these! They're harmless from an inference standpoint, but important to remove to prevent confusion and the same thing being edited in different locations. The Sigma Diagnostics report these, but fixing them has been too far down my priority list. I'd welcome a pull request with the fixed files!

all the best, Adam

arademaker commented 5 years ago

@hmuniz can you take care of that? In some cases it may be hard to decide which copy to keep; you can report those cases in this issue to get @apease's help.

apease commented 5 years ago

termFormat and format expressions belong in domainEnglishFormat.kif if there's a duplication with that file. If you can run the Sigma diagnostics, you can also see the dependencies between files. Ideally, statements should be kept in files such that there aren't "downward" dependencies (e.g. merge.kif depending on any other file) or mutual dependencies between two files. I'm very grateful for any help!

All the best, Adam
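The placement rules in the comment above could be encoded as a first-pass triage for each duplicate. This is only a sketch under stated assumptions: `choose_keeper` is a hypothetical helper, Merge.kif is assumed to be the only upper-level file, and any case the rules cannot settle (such as two domain ontologies) returns `None`, so it can be reported in this issue for a manual decision.

```python
import re
from typing import List, Optional

# Assumptions drawn from this thread, not from Sigma itself:
UPPER_LEVEL_FILES = {"merge.kif"}          # treated as the only upper-level file
FORMAT_FILE = "domainenglishformat.kif"    # home for termFormat/format statements
FORMAT_RELATIONS = {"termFormat", "format"}


def choose_keeper(form: str, files: List[str]) -> Optional[str]:
    """Suggest which file should keep a duplicated statement.

    Returns None when the rules do not decide, e.g. when both copies live in
    domain ontologies, so the case can be raised for a manual decision.
    """
    match = re.match(r"\(\s*([^\s()]+)", form)  # the relation name of the statement
    head = match.group(1) if match else ""
    by_lower = {f.lower(): f for f in files}
    # termFormat and format expressions belong in domainEnglishFormat.kif.
    if head in FORMAT_RELATIONS and FORMAT_FILE in by_lower:
        return by_lower[FORMAT_FILE]
    # Otherwise prefer the upper-level file over a domain ontology.
    for f in files:
        if f.lower() in UPPER_LEVEL_FILES:
            return f
    return None
```

For the ReconnaissanceInForce case this suggests domainEnglishFormat.kif, while the Canoe case (Transportation.kif vs. Sports.kif) is left undecided, matching the discussion below.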

arademaker commented 5 years ago

@apease, what about the two axioms below? Which one should we remove?

Transportation.kif
2422 (subclass Canoe WaterVehicle)

Sports.kif
169  (subclass Canoe WaterVehicle)

apease commented 5 years ago

Hi Alexandre, since these are both domain ontologies, we can't choose immediately just based on which one is more "upper level". We should pick one of the two to contain the documentation and all its subclass statements.

Diagnostics says:

File /home/apease/.sigmakee/KBs/Sports.kif dependency size on file /home/apease/.sigmakee/KBs/Transportation.kif is 1 with terms:
Cycling
...
File /home/apease/.sigmakee/KBs/Transportation.kif dependency size on file /home/apease/.sigmakee/KBs/Sports.kif is 1 with terms:
Cycling

So either one is fine. I guess canoes can be used for more than just sport, so I'd put all the definitions in Transportation.kif. But this uncovers an additional problem with Cycling: it's being used as both a Transportation and a Sport, when those are not the same activity. We need CyclingSport, which is a kind of Cycling that is a Racing.

I'll make these changes.

many thanks!
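The Diagnostics output quoted above reports, for each pair of files, the terms one file uses from the other. A rough approximation can be sketched outside Sigma, assuming (this is an assumption, not Sigma's documented rule) that a file "defines" a term when the term is the first argument of an `instance`, `subclass`, `subrelation`, or `subAttribute` statement in it. All function names here are hypothetical, and because Sigma may compute dependencies differently, the counts need not match its report.

```python
import re

# Strings, ';' comments, parentheses, or atoms, in that order of preference.
TOKEN = re.compile(r'"[^"]*"|;[^\n]*|[()]|[^\s()";]+')
# Assumption: a file "defines" a term when the term is the first argument
# of one of these relations in that file.
DEFINING = {"instance", "subclass", "subrelation", "subAttribute"}


def tokens(text):
    """Tokenize KIF text, dropping comments and string literals."""
    return [t for t in TOKEN.findall(text) if not t.startswith((";", '"'))]


def is_constant(tok):
    """True for term names; excludes parentheses and ?/@ variables."""
    return tok not in ("(", ")") and tok[0] not in "?@"


def defined_terms(toks):
    """Terms appearing as the first argument of a DEFINING relation."""
    return {toks[i + 2] for i in range(len(toks) - 2)
            if toks[i] == "(" and toks[i + 1] in DEFINING and is_constant(toks[i + 2])}


def dependencies(files):
    """Return {(a, b): terms} where file a uses terms defined in b but not in a."""
    toks = {name: tokens(text) for name, text in files.items()}
    defs = {name: defined_terms(t) for name, t in toks.items()}
    used = {name: {t for t in ts if is_constant(t)} for name, ts in toks.items()}
    deps = {}
    for a in files:
        for b in files:
            if a != b:
                terms = sorted((used[a] & defs[b]) - defs[a])
                if terms:
                    deps[(a, b)] = terms
    return deps


def report(deps):
    """Format lines loosely modelled on the Diagnostics output quoted above."""
    return [f"File {a} dependency size on file {b} is {len(t)} with terms: {', '.join(t)}"
            for (a, b), t in deps.items()]
```

Under this definition, a domain file depending on Merge.kif is expected, while a dependency in the other direction, or two files depending on each other, is a candidate for moving statements.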

apease commented 5 years ago

Fixed the issues with Rwanda and ReconnaissanceInForce.

apease commented 5 years ago

I'll close this for now but would welcome a more comprehensive review from @hmuniz

arademaker commented 5 years ago

Commits 2af3accd and 371e622a are related to this issue.

arademaker commented 5 years ago

Let us reopen it since we still have duplicates.

arademaker commented 5 years ago

@apease can you copy here the command that you use to execute the diagnostics?

apease commented 5 years ago

At the moment you can only do this through the "Diagnostics" link in Sigma, which appears on the KBs page for each knowledge base you have loaded. At some point I could add a command-line interface to com.articulate.sigma.Diagnostics that just runs each of the diagnostic routines in that class (or someone else could add that, hint hint... :-)

arademaker commented 5 years ago

I can execute:

$ java -Xmx7g -classpath $SIGMA_CP com.articulate.sigma.Diagnostics

apease commented 5 years ago

Yes, but all you'll get is what main() contains:

System.out.println(termsNotBelowEntity(kb));

and there are many other diagnostics.