Lack of Full OWL2 Support in Triplestores

dbooth-boston commented 5 years ago

"There are only two triplestores that I could find that have full OWL2 support: Marklogic (maybe, I am still trying to understand their documentation on the point) and Stardog (was once Pellet). . . . My collaborators and I do not have the time to hand code that someone is the ancestor or descendant of someone else. Thus, we need OWL to do the heavy work for us. There are other OWL2 implementations out there: HermiT and FaCT++ but these seem completely disconnected from say Apache Fuseki. Additionally, they seem only to be used in things like Protege and not anywhere else." https://lists.w3.org/Archives/Public/semantic-web/2018Dec/0088.html

VladimirAlexiev commented 5 years ago

"ancestor or descendant of someone else"

In my experience people use very few OWL features: transitive, inverse (which is redundant anyway), and that's about it.

Ontotext GraphDB supports OWL2 RL, OWL2 QL, a bunch of smaller profiles, and custom rules. (Disclosure: I work for them)

I am yet to see serious applications of OWL DL over large data. Looking forward to such examples.

irenetq commented 5 years ago

I agree with @VladimirAlexiev. I think the real issue is the assumptions underlying the e-mail snippet that "gave birth" to this issue: the expectation that DL reasoning is (a) important, even necessary, and (b) viable over large data.

In my experience with Sem Web so far (over 15 years), neither of these assumptions is true. It is possible that we will see proofs to the contrary, but I do not believe this will happen in the next few years. Even academic research seems to have moved on. There have been no new DL reasoners in a while. Someone told me that the last ISWC was the first one without any DL-related papers presented.

In practice, people are using rule-based approaches, and triple stores support these. Most of them support user-defined rules for inference and OWL-RL (because it is defined in terms of rules).
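To make the rule-based approach concrete, here is a minimal sketch of forward-chaining materialization for the two features mentioned earlier in the thread, transitive and inverse properties (OWL 2 RL covers these with its prp-trp and prp-inv rules). Plain tuples stand in for RDF triples, and the property names `ancestorOf` and `descendantOf` are assumptions made up for illustration. This is a naive fixpoint loop, not how any particular triple store implements inference.

```python
# Hypothetical property names; plain tuples stand in for RDF triples.
TRANSITIVE = {"ancestorOf"}
INVERSES = {"ancestorOf": "descendantOf", "descendantOf": "ancestorOf"}


def materialize(triples):
    """Apply the transitive and inverse rules until no new triple appears."""
    inferred = set(triples)
    while True:
        new = set()
        for s, p, o in inferred:
            # Inverse rule: (s p o) implies (o inverse(p) s).
            if p in INVERSES:
                new.add((o, INVERSES[p], s))
            # Transitive rule: (s p o) and (o p o2) imply (s p o2).
            if p in TRANSITIVE:
                for s2, p2, o2 in inferred:
                    if p2 == p and s2 == o:
                        new.add((s, p, o2))
        if new <= inferred:  # fixpoint reached, nothing new was derived
            return inferred
        inferred |= new


facts = {("alice", "ancestorOf", "bob"), ("bob", "ancestorOf", "carol")}
closure = materialize(facts)
```

With these two input facts, the closure also contains `("alice", "ancestorOf", "carol")` and the three corresponding `descendantOf` triples, none of which had to be hand coded.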

I believe that the real issue is that there are so many academic papers and writings from the past that are heavily OWL-centric. As RDF* technologies moved out of academia and into real use, we started to get a lot of good experience with what users actually need and use. However, this experience may not be presented/described well enough and, given the presence of all the older, often academic writing, it is easy for new users to assume that they need OWL.

Even more generally, the whole terminology and the stack of standards is confusing for new users. As a result, it is understandable that they may, for example, equate OWL2 with DL reasoning, while one of the key developments in OWL2 was the introduction of profiles. Or they may think that they must have OWL to figure out the full closure of an ancestor relationship.
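As an illustration of that last point, the full closure of an ancestor relationship can be computed at query time with no OWL reasoner at all. In SPARQL 1.1 a one-or-more property path such as `:hasParent+` does this (`:hasParent` is a hypothetical property name). The sketch below shows the same idea as a plain breadth-first traversal over an assumed child-to-parents mapping. The data and names are invented for the example.

```python
from collections import deque


def ancestors(parent_of, person):
    """Return everyone reachable from `person` via one or more parent links."""
    seen = set()
    queue = deque(parent_of.get(person, ()))
    while queue:
        current = queue.popleft()
        if current in seen:
            continue  # already visited; also guards against cycles in bad data
        seen.add(current)
        queue.extend(parent_of.get(current, ()))
    return seen


# Hypothetical data: each person maps to the set of their direct parents.
parent_of = {"carol": {"bob"}, "bob": {"alice"}, "alice": set()}
carol_ancestors = ancestors(parent_of, "carol")
```

Here `carol_ancestors` contains both `bob` and `alice`, although only the direct parent links were stated.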

dbooth-boston commented 5 years ago

@irenetq, I very much agree, with one small caveat: there are some users who rely heavily on OWL. But this is one of the reasons why, in educational materials, I think we need to distinguish between some of the major use cases, because the OWL use case is quite different from the typical data integration use case.

"we started to get a lot of good experience with what users actually need and use."

Please share that experience in these discussions! And feel free to create a new issue if none of the existing issues fits a particular observation. Also, if you

VladimirAlexiev commented 5 years ago

Hi @irenetq! Please add a bit to your github profile for people who don't know you.

There are some industry ontologies that rely heavily on OWL:

Financial: FIBO
Engineering: ISO 15926
Agro/bio/lifeScience ontologies, although based on OWL, in my uninformed opinion do not use it very heavily.

I invite some of the participants in those efforts to share examples of actual heavy use, and what kind of reasoning was required: @dallemang, @ElisaKendall, @TechInvestLab

(please if you're in the know, invite more people)

irenetq commented 5 years ago

This may be an interesting article to look at: https://dzone.com/articles/my-list-of-7-great-2018-advancements-in-enterprise. It points out that while (according to the search statistics) interest in Knowledge Graphs is up, interest in ontologies is down. It also makes this point in explaining slow adoption:

Sheer perceived complexity: While RDF might be super simple in concept, the RDF was often described and discussed by people in the academic “reasoning” community — producing not the most easily approached documents, and countless opinionated discussions.

In short, lack of pragmatic focus hurts widespread adoption.

We at TopQuadrant have done a lot of work with ISO 15926. In this work, there was no reasoning per se. It was used mainly as a canonical model/vocabulary for data integration, certainly not for anything requiring DL. There was ETL from non-RDF feeds to RDF data structured according to 15926.

We have also done a lot of work with Agro/bio/lifeScience ontologies. In my experience, they are largely used as reference data/taxonomies, for example as targets for mappings (e.g., a link to a SNOMED term). Synonyms and alternative names are used in search. In practical use, classes may get transformed into instances to make their use more convenient, e.g., as SKOS concepts.

I suspect that in many cases, these should be instances in the first place, because these ontologies commonly contain classes like bao:AlliedTechnologies. Creators of the ontologies can't answer what it means to have such a class, i.e., what its members should be. I have asked them and got a response that they use a principle "if in doubt, make it a class". And that when they use the ontology in applications and load it into a triple store, they either turn these classes into instances or create "fake" instances, treating something like bao:AlliedTechnologies as a singleton class. Of course, if you make everything a class, then you are forced to use restrictions instead of simple triples to describe relationships.
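The class-to-instance conversion described above could be sketched roughly as follows. This is only an assumed, simplified mapping (owl:Class to skos:Concept, rdfs:subClassOf to skos:broader, rdfs:label to skos:prefLabel) over tuples that stand in for triples. The `ex:` terms are hypothetical, and the sketch does not claim to reproduce how any particular ontology project performs its transformation.

```python
def classes_to_concepts(triples):
    """Rewrite class declarations and hierarchy as SKOS concept triples."""
    out = set()
    for s, p, o in triples:
        if p == "rdf:type" and o == "owl:Class":
            # The class becomes an instance of skos:Concept.
            out.add((s, "rdf:type", "skos:Concept"))
        elif p == "rdfs:subClassOf":
            # The subclass link becomes a plain broader/narrower link.
            out.add((s, "skos:broader", o))
        elif p == "rdfs:label":
            out.add((s, "skos:prefLabel", o))
        else:
            out.add((s, p, o))  # anything else passes through unchanged
    return out


# Hypothetical input ontology fragment.
ontology = {
    ("ex:Loan", "rdf:type", "owl:Class"),
    ("ex:Mortgage", "rdf:type", "owl:Class"),
    ("ex:Mortgage", "rdfs:subClassOf", "ex:Loan"),
    ("ex:Mortgage", "rdfs:label", "Mortgage"),
}
concepts = classes_to_concepts(ontology)
```

After the conversion, relationships between terms are ordinary triples between instances, so they can be queried and extended without OWL restrictions.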

FIBO is delivered in multiple formats. In addition to OWL, there is FIBO in SKOS. Then, there are some non-RDF delivery formats. I believe some of the potential use cases are about data harmonization/integration and some are about search/NLP. Possibly, others.

dallemang commented 5 years ago

Indeed, one of the motivations for FIBO-V was exactly this - the complexity of OWL. A lot of applications just use simple connectivity of concepts; reasoning is done by a bespoke program (which is where developers like it).

I am particularly interested in @irenetq's comment about how some ontologies use classes. I have a rather poorly named anti-pattern in Working Ontologist that is exactly this one: using a class when you can't answer the question, "what are the members of this class?" (and while FIBO has its faults, this is not one of them; it is pretty clear to see in just about every case what it would mean to be a member of a FIBO class). Thank you for your timely comment, Irene; I have a task on my white board for the third edition to re-visit this section. You have re-confirmed my original intuition that it is needed.

VladimirAlexiev commented 5 years ago

"if in doubt, make it a class": sounds like a joke, but in very bad taste ;-) I've seen the same, maybe not in core OBO ontologies but in, e.g., the Crop Ontologies.

@irenetq very interested to hear more about your 15926 experience.

@dallemang but FIBO-V is a small underdeveloped brother of FIBO, isn't it?

dallemang commented 5 years ago

No, FIBO-V is a derivative product; we translate FIBO from OWL into SKOS in an automated way. The motivation for this was basically what Irene mentioned: many people want to know the contents of FIBO and the context of each term, i.e., its relationships to other terms. FIBO-V provides this. @irenetq was part of the committee that reviewed the specs for this product.

w3c / EasierRDF, issue #38