ncbo / bioportal-project

Serves to consolidate (in Zenhub) all public issues in BioPortal
BSD 2-Clause "Simplified" License

FMA 5.0.0 CSV creates a parent without a valid FMA id #119

Open graybeal opened 5 years ago

graybeal commented 5 years ago

User on the support list reports:

in your FMA 5.0.0 CSV file the following entry is wrong:

http://purl.org/sig/ont/fma/fma85802,FMA attribute entity,,,false,,,http://www.w3.org/2002/07/owl#Thing,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,85802,,,,,,,,,,,,,,,,,,,,,,,,fma:fma85802,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,

since the parent "http://www.w3.org/2002/07/owl#Thing" does not contain a valid FMA id. I guess some other errors are in other anatomy datasets too, since my automatic parsing threw some errors. I will just exclude this anatomy from my parsing and hope future releases will have the errors fixed.

and follows up in response to a question:

just that owl#Thing has no ID of the form (FMAxxxxxx). I parse the FMA CSV with a bash script and remove redundant information, meaning everything except the pure integer number, since the rest is not needed for unique identification by an ID in a database. "http://www.w3.org/2002/07/owl#Thing" has no FMA ID, that is why I reported the issue.
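
A parser of the kind the user describes can tolerate the owl#Thing parent rather than fail on it. The following is a minimal Python sketch, not the user's bash script. It assumes that the parents sit in the eighth column (as in the sample row quoted above) and that multiple parents are separated by `|`; the function names `fma_id` and `parent_ids` are hypothetical.

```python
import csv
import io
import re

# Matches FMA class IRIs such as http://purl.org/sig/ont/fma/fma85802
FMA_IRI = re.compile(r"^http://purl\.org/sig/ont/fma/fma(\d+)$")


def fma_id(iri):
    """Return the integer FMA ID for an FMA IRI, or None for anything else."""
    match = FMA_IRI.match(iri.strip())
    return int(match.group(1)) if match else None


def parent_ids(row, parents_col=7, sep="|"):
    """Collect the FMA IDs of a row's parents, skipping non-FMA parents.

    parents_col and sep are assumptions about the CSV layout.
    """
    cell = row[parents_col] if len(row) > parents_col else ""
    ids = []
    for iri in cell.split(sep):
        number = fma_id(iri)
        if number is not None:  # owl#Thing and empty cells are skipped here
            ids.append(number)
    return ids


# First eight fields of the row quoted in the report
sample = (
    "http://purl.org/sig/ont/fma/fma85802,FMA attribute entity,,,false,,,"
    "http://www.w3.org/2002/07/owl#Thing\n"
)
rows = list(csv.reader(io.StringIO(sample)))
row_id = fma_id(rows[0][0])
row_parents = parent_ids(rows[0])
```

With this approach a root class simply ends up with an empty parent list, which a downstream database import can treat as "top level".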

graybeal commented 5 years ago

Samson responded:

By definition, every top-level entity in an OWL ontology has owl#Thing as a parent. It’s not a design decision on the part of FMA developers.

It’s a design decision on the part of BioPortal CSV-output developers whether to generate an entry for this definitional parent-child relationship. Given that decision, users of the CSV output have to live with the consequences either way.

I think it makes sense for our generated CSV outputs to match our generated OWL files. That approach is more likely to meet user expectations than adding special processing for OWL files that were generated from some other semantic source.

I will close this ticket unless further concerns are raised.

samsontu commented 5 years ago

OWL files (at least the canonical RDF/XML serialization I checked) do not have a (topclass subClassOf owl:Thing) axiom; owl:Thing doesn't appear in the file at all. In that sense the CSV output departs from the RDF/XML serialization of the OWL files. I think it's cleaner not to output such language-dependent artifacts, because an ontology can be represented in different formalisms, and a CSV output should reflect the content of the ontology rather than depend on the formalism. Of course, if you are not the developer of the software that generates the CSV output, you are stuck with the design decision.
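
A check like the one described in this comment can be automated. The sketch below is an illustration under stated assumptions, not the procedure actually used: it only looks for `rdfs:subClassOf` elements that point at owl:Thing through an `rdf:resource` attribute, and the inline sample ontology and the function name `asserts_owl_thing_parent` are made up for the example.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
OWL_THING = "http://www.w3.org/2002/07/owl#Thing"


def asserts_owl_thing_parent(rdfxml_text):
    """True if any rdfs:subClassOf element references owl:Thing via rdf:resource."""
    root = ET.fromstring(rdfxml_text)
    for element in root.iter(f"{{{RDFS}}}subClassOf"):
        if element.get(f"{{{RDF}}}resource") == OWL_THING:
            return True
    return False


# Hypothetical two-class ontology: the root class has no explicit parent
sample = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
         xmlns:owl="http://www.w3.org/2002/07/owl#">
  <owl:Class rdf:about="http://example.org/onto#Root"/>
  <owl:Class rdf:about="http://example.org/onto#Child">
    <rdfs:subClassOf rdf:resource="http://example.org/onto#Root"/>
  </owl:Class>
</rdf:RDF>"""

explicit_thing_axiom = asserts_owl_thing_parent(sample)
```

For the sample, the result is `False`: the root class is a subclass of owl:Thing only by definition, not by an axiom written in the file.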

graybeal commented 5 years ago

Yes, I think one question is what we consider the 'source ontology' we want to represent, and the other is whether the CSV in general should represent the (topclass subClassOf owl:Thing) formalism. There is no "original ontology" in the case of UMLS, as I understand it; there are the original local models of terminologies that we're converting to OWL to have a common basis for interoperation.

Hard to say how central the topClass construct should be given these larger contexts. For now, I think we are too resource-constrained to make changes, but we can leave this ticket open while we accumulate knowledge and/or wisdom.

jvendetti commented 5 years ago

It appears that BioPortal's behavior is a bit inconsistent. In the user interface, we do display the fact that root classes in OWL ontologies are subclasses of owl:Thing, e.g.:

[Screenshot, 2019-05-28 16:32:40]

However, in the REST API we explicitly filter out owl:Thing from the return value of calls to the /parents endpoint (see this line of code).

In the code for CSV generation, we don't have any special handling for owl:Thing. Rather, we simply ask each class for its parents and output their IDs to the file, i.e., see this line of code.
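
To make the difference between the two behaviors concrete, here is a minimal Python sketch. It is not the BioPortal code; the function name `parents_cell`, the `drop_owl_thing` flag, and the `|` separator are assumptions made for illustration.

```python
OWL_THING = "http://www.w3.org/2002/07/owl#Thing"


def parents_cell(parent_iris, drop_owl_thing=False, sep="|"):
    """Build the parents cell for one CSV row.

    drop_owl_thing=False mirrors the CSV behavior described above
    (every parent ID is written out); drop_owl_thing=True mirrors the
    REST /parents behavior, where owl:Thing is filtered out.
    """
    if drop_owl_thing:
        parent_iris = [iri for iri in parent_iris if iri != OWL_THING]
    return sep.join(parent_iris)


root_parents = [OWL_THING]
csv_style = parents_cell(root_parents)
rest_style = parents_cell(root_parents, drop_owl_thing=True)
```

Here `csv_style` holds the owl#Thing IRI while `rest_style` is an empty string, which is the inconsistency noted in this comment.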