phillord / tawny-owl

Build OWL Ontologies in a Programmatic Environment
GNU Lesser General Public License v3.0

Thoughts re: tawny-owl and immutability, monotonicity, and persistence #6

Closed danlentz closed 10 years ago

danlentz commented 10 years ago

I have been looking at tawny-owl while getting my feet wet with clojure in general and it looks quite nice. I am also interested in the possibility of integrating tawny's concepts with those that are compatible with a persistence solution -- datomic being the principal target of interest, as it is already quite a nice triple-store. Naturally, there are some aspects of tawny which currently conflict with the requirements of such an idea as I see them. I was wondering if there were existing thoughts from the tawny developers (community?) to this end and if so I'd be very interested to hear them.

On my last read of the documentation, there seems to be a "throw our hands up", "we don't do that" kind of feel to the questions related to immutability. However, coming from a common-lisp background I have worked with a very interesting package, http://github.com/lisp/de.setf.resource, which implements a compelling model for approaching these issues, and I wonder what the level of enthusiasm might be for the idea that some of the concepts it proposes could prove relevant to Clojure and tawny-owl as well.

There are a number of areas suitable for discussion, but IMO the central technical matters relate to the notion of the resource "life-cycle", a concept shared with other RDF persistence software, namely Spirograph, a somewhat better-known solution in the Ruby space. I'd also mention as prior art the LSW project, which focuses on the owl<->triples bijection by means of rulesets defined in SWRL... But I don't want to get ahead of myself here, so perhaps it would be most productive to focus on the first two paragraphs, above, and leave this one be, for now. :)

Regards, Dan

phillord commented 10 years ago

Immutability wasn't a key concern when I was building tawny -- effectively, I consider Tawny to be a (textual) UI for OWL, rather than an API. If I could have got immutability for free, then I would have been happy. In fact, I started with Clojure data structures for ontologies (records, actually), but quickly dropped that and just used the OWL API objects directly.

Having said all of this, I have now added tawny.read and also tawny.render. The latter turns OWL (as an XML file) into Clojure code that you can evaluate and run. Of course, code is data and data is code, so this effectively means that I now have a representation of OWL in Clojure. Not a particularly good representation, admittedly; tawny.query is a very initial attempt at turning it into something nicer.

So, it would, potentially, be possible to turn the main tawny functions into something that manipulated Clojure data structures alone. So

(ontology "blah") 

would probably return an atom with a set.

(owlclass ont "bob" :subclass charlie) 

would add the appropriate things to ont.
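
As a rough illustration of the shape I mean (a minimal sketch; the names and the axiom layout here are hypothetical, not tawny's actual API):

(defn ontology
  "Create an 'ontology': an atom holding a set of axiom vectors."
  [iri]
  (atom #{[:ontology-iri iri]}))

(defn owl-class
  "Add a class (and an optional subclass axiom) to the ontology atom."
  [ont name & {:keys [subclass]}]
  (swap! ont conj [:class name])
  (when subclass
    (swap! ont conj [:subclass-of name subclass]))
  ont)

;; (def o (ontology "blah"))
;; (owl-class o "bob" :subclass "charlie")
;; @o => #{[:ontology-iri "blah"] [:class "bob"] [:subclass-of "bob" "charlie"]}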

I could still use the OWL API to do a lot of the work; this can be made thread-safe and pure by the simple expedient of creating a new OWLOntologyManager and OWLDataFactory for every operation; clearly, I don't want to be writing parsers and renderers. Things are harder when we get to reasoners; I would have to throw away the reasoner every time the ontology changes, which would break any hope of incremental reasoning. I don't see how to avoid this -- the OWL API assumes mutability, with listeners and the like. To circumvent this would, I think, require an implementation of the OWL API in Clojure.
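
Concretely, the "fresh manager per operation" idea looks roughly like this in plain OWL API interop (a sketch only, assuming the OWL API is on the classpath; tawny itself keeps a shared manager):

(import '(org.semanticweb.owlapi.apibinding OWLManager)
        '(org.semanticweb.owlapi.model IRI))

(defn fresh-subclass-axiom
  "Build a SubClassOf axiom using a throwaway manager and data factory."
  [sub-iri super-iri]
  (let [manager (OWLManager/createOWLOntologyManager)
        factory (.getOWLDataFactory manager)
        sub     (.getOWLClass factory (IRI/create sub-iri))
        super   (.getOWLClass factory (IRI/create super-iri))]
    (.getOWLSubClassOfAxiom factory sub super)))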

But it would be a lot of work -- having written one framework for turning Clojure into OWL API objects, and another for doing the reverse, I know it's a lot of work. In the abstract, a persistable and thread-safe API for OWL is attractive, but committing the resources to achieving this would need more than an abstract attraction. Obviously, if you are interested in doing something like this within the scope of tawny, then I'd be interested in collaborating.

From what I can see of...

http://github.com/lisp/de.setf.resource

tawny effectively does this, I think? In terms of projecting OWL into lisp space.

danlentz commented 10 years ago

Thanks for your response. You've given me a bit to think about here. I guess my initial feeling is that XML, being principally a serialization format, does not offer the advantages one might obtain from a fully indexed triple-store, although I suppose there are mechanisms for approaching that...

My first thoughts had run along the lines of combining the low-level aspects of de.setf.resource with the higher-level functionality of LSW, i.e., encoding a ruleset that allows for projection from a triples repository to a "model" and back again. On top of this would be required an implementation of the "life-cycle" integrated with the transactional nature of communication via some "mediator".

I can provide the URL for this LSW project if you are not already familiar with it (it's hosted on some non-GitHub thing so I don't have it immediately at hand). It bears some similarity to tawny, in that it is JVM-based (ABCL), but IIRC it uses Jena, which may be more accommodating to this line of thinking than the OWL API. As I mentioned, much of this is new to me, so forgive me if I am mistaken in any of these details.

I'm wondering also if you are familiar with SWCLOS, an older project that implements an integration of OWL Full and Common Lisp's CLOS.

phillord commented 10 years ago

The OWL API uses an abstraction a bit higher than that of the triple -- it operates over the semantics of OWL rather than the RDF representation of this. A persistent solution would be good, but ultimately, the OWL API would probably be the right place to implement this. And indeed it has been, twice, although over a relational database:

http://protegewiki.stanford.edu/wiki/Loading_A_DatabaseProject

I don't know how well these work, but plugging them into tawny should be trivial. The current implementation of tawny makes it more or less impossible to mix and match this though -- using a database and in-memory backend in the same JVM would be difficult. That's easy enough to fix, should it be necessary.

I hadn't heard of LSW or SWCLOS; the latter looks interesting, and bears similarities to what I have done. Being hosted on Allegro isn't ideal, though, and as far as I can tell, it doesn't plug into a reasoner. I couldn't find LSW.

danlentz commented 10 years ago

Thanks again for your reply. The link you sent led me to some informative materials. In particular, http://webont.org/owled/2009/papers/owled2009_submission_3.pdf is pretty convincing that the graph representation might not be the easiest (best?) for persisting OWL, especially OWL/DL.

SWCLOS was the original project that I had stumbled onto that ignited my interest in this subject. Alas, it is flawed in several ways, the first being its Allegro dependency. While that might be overcome with a little effort, it is also flawed in the sense that there is no notion of inherent "context", or even the ability to retract assertions. This leaves the user with a fairly crude procedure of "restart the world" when need be to express a new model or change an existing one. It was, however, IIRC, pretty reasonable to implement persistence to a graph db (in my case I used AllegroGraph, as I was already sucked into Allegro). I think that was primarily due to the particular specification of OWL Full implemented by Koide being designed to be particularly friendly to RDF and object-oriented design. Some resources I'd looked at previously with respect to this include:

The other project, LSW, is hosted at https://code.google.com/p/lsw2/ This project had reignited my hope that I might achieve a useful RDF/OWL facility suitable for general-purpose modeling and application development. At the time, the reliance on the JVM/ABCL was not acceptable to me, but this has changed now that I've been exploring Clojure, and so I've been reconsidering the possibility. As I mentioned, though, now that I've looked at things, I believe that LSW's basis in Jena makes it more suited to this type of use-case than the OWL API. I'm not sure at this point if it might be reasonable to bridge these two libraries for the purpose of leveraging your existing work with tawny.

I think tawny implements a very nice "textual UI", as you described it, and am open to any ideas that may occur to you on ways to make use of tawny as the basis of a general-purpose persistable modeling facility. It seems to me, though, that a graph representation, preferably datomic-based, is very desirable, even if in the end it means just working with an RDF model...

Best regards, Dan

phillord commented 10 years ago

Dan Lentz notifications@github.com writes:

> This leaves the user with a fairly crude procedure of "restart the world" when need be to express a new model or change an existing one.

This is a problem. To some extent tawny has these difficulties, at least when the macro-ised form of it is used, because it creates new vars for entities; so while it is easy to remove an entity or an axiom from the ontology, it leaves the var around with an unattached OWLObject. This is not so much of a problem with the normal functions.

To some extent this is intrinsic to lisp -- for all that Clojure talks about immutability, interning a symbol changes state. At some point, I may add some unintern code to remove vars, but occasional restarts tend to be a fact of life.
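
(Clojure already has the primitive such unintern code would use; a one-line sketch, with a hypothetical namespace and var name:)

;; Remove the interned var #'my.ontology/Bob once its entity has been
;; dropped from the ontology -- namespace and name are hypothetical.
(ns-unmap 'my.ontology 'Bob)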

> The other project, LSW, is hosted at https://code.google.com/p/lsw2/

Ah, okay. Yes, I know this project reasonably well; I'd just forgotten what it was called.

LSW has a somewhat different intention, which was to be able to programmatically manipulate OWL (or OBO) ontologies built using other tools, so it has a different focus. A second problem is that, as one of the authors said (http://www.russet.org.uk/blog/2214#comment-119774), it doesn't have much documentation.

> As I mentioned, though, now that I've looked at things, I believe that LSW's basis in Jena makes it more suited to this type of use-case than the OWL API.

LSW also uses the OWLAPI in at least some points, so I am not really sure about the Jena interface. I know LSW is capable of running sparql queries, which is something I am interested in. At the moment, tawny presents no good interface for querying the ontology. Another possibility would be to add core.match support into the OWL API, but that would be a lot of (very dull) code.

> I think tawny implements a very nice "textual UI", as you described it, and am open to any ideas that may occur to you on ways to make use of tawny as the basis of a general-purpose persistable modeling facility.

Of course, in a trivial sense, tawny already supports persistence; that is, you can save the code to file. An ontology is data, but in lisp data is code and vice versa. This was an important use case for me, because my experience with shoving OWL files into versioning systems has never been that happy.

> It seems to me, though, that a graph representation, preferably datomic-based, is very desirable, even if in the end it means just working with an RDF model...

I have no problems at all with the graph model of RDF; I'd love to add sparql as I said, and datomic support would be good as well (though, there are some licensing issues with this!). But, at the moment, the only way I can see to achieve this is to effectively render the OWL API to RDF.

Phil

danlentz commented 10 years ago

On Oct 2, 2013, at 5:06 AM, phillord notifications@github.com wrote:

> Dan Lentz notifications@github.com writes:

>> This leaves the user with a fairly crude procedure of "restart the world" when need be to express a new model or change an existing one.

> This is a problem. To some extent tawny has these difficulties, at least when the macro-ised form of it is used, because it creates new vars for entities; so while it is easy to remove an entity or an axiom from the ontology, it leaves the var around with an unattached OWLObject. This is not so much of a problem with the normal functions.

> To some extent this is intrinsic to lisp -- for all that Clojure talks about immutability, interning a symbol changes state. At some point, I may add some unintern code to remove vars, but occasional restarts tend to be a fact of life.

Yes, I've done (a little) thinking on how to handle this. My concept was that the OWL model would be essentially ephemeral, i.e., built dynamically during the process of projecting a "graph" (context) from the triple-store. When I say triple-store I actually mean quad-store... ;) This capability is necessarily predicated on having a reliable and deterministic triples->owl translation, and also to an extent on the "life cycle" concept from a message or two back. If we can always ensure our assertions are made in a manner that is tied to a specific context (even if that context is the default graph or some such), then wouldn't that afford some solution?

>> The other project, LSW, is hosted at https://code.google.com/p/lsw2/

> Ah, okay. Yes, I know this project reasonably well; I'd just forgotten what it was called.

> LSW has a somewhat different intention, which was to be able to programmatically manipulate OWL (or OBO) ontologies built using other tools, so it has a different focus. A second problem is that, as one of the authors said (http://www.russet.org.uk/blog/2214#comment-119774), it doesn't have much documentation.

Not unusual for CL libraries, I'm afraid. It's more surprising when there is... In fact, at this point I've grown mistrustful of documentation as opposed to directly reading (and re-reading...) the code, anyway. It remains an important advantage common-lisp retains over clojure (IMHO) -- common-lisp reads beautifully. Of course clojure is much more terse and convenient to type, so arguments can be made for either approach.

>> As I mentioned, though, now that I've looked at things, I believe that LSW's basis in Jena makes it more suited to this type of use-case than the OWL API.

> LSW also uses the OWLAPI in at least some points, so I am not really sure about the Jena interface. I know LSW is capable of running sparql queries, which is something I am interested in. At the moment, tawny presents no good interface for querying the ontology. Another possibility would be to add core.match support into the OWL API, but that would be a lot of (very dull) code.

I think given a working owl-triples projection mechanism it would not be much further to implement SPARQL. Datomic supports pattern-based "deductive" queries, which is in effect a poor man's SPARQL, minus some of the amenities. Bastian Muller has a CL library which implements a very basic SPARQL parser and query engine in just a few pages of code that I think would serve as an excellent starting point for building such a facility in Clojure. The project is named SICL and lives on GitHub. If you can't find it I'll get you the exact URL -- basically, just beware that there is another (unrelated) CL project named SICL by Robert Strandh.
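
To make the "poor man's SPARQL" point concrete, a triple-pattern query in Datomic's datalog might look roughly like this, assuming a hypothetical schema where each statement entity carries :rdf/subject, :rdf/predicate and :rdf/object attributes:

;; Roughly the datalog equivalent of: SELECT ?s WHERE { ?s rdf:type foaf:Person }
;; The :rdf/* attributes and `conn` are assumptions, not an existing schema.
(require '[datomic.api :as d])

(d/q '[:find ?s
       :where
       [?stmt :rdf/subject   ?s]
       [?stmt :rdf/predicate :rdf/type]
       [?stmt :rdf/object    :foaf/Person]]
     (d/db conn))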

>> I think tawny implements a very nice "textual UI", as you described it, and am open to any ideas that may occur to you on ways to make use of tawny as the basis of a general-purpose persistable modeling facility.

> Of course, in a trivial sense, tawny already supports persistence; that is, you can save the code to file. An ontology is data, but in lisp data is code and vice versa. This was an important use case for me, because my experience with shoving OWL files into versioning systems has never been that happy.

Yes, the XML is persistence, you're right. But, at least from my perspective, it is not an especially useful form of persistence. Query functionality is quite important. For example, returning to the notion of contexts and projection, it's vital to be able to easily express some precise subset of the persisted model to work with. In my thinking, this is really only readily expressible as some set of triples selected from a graph. I can't see how XML could be used in this way. Also, regarding monotonicity, triples provide a useful abstraction by which the model can be augmented and extended by simply asserting new statements at will. I don't know how you'd handle that in XML without just overwriting the old XML model with a new one. That, effectively, leads back to this matter of "restart the world", which is not really a useful approach, at least for the general-purpose modeling use-case. From what I can see, XML gives us only a serialization format and nothing more. Am I mistaken in this impression?

>> It seems to me, though, that a graph representation, preferably datomic-based, is very desirable, even if in the end it means just working with an RDF model...

> I have no problems at all with the graph model of RDF; I'd love to add sparql as I said, and datomic support would be good as well (though, there are some licensing issues with this!). But, at the moment, the only way I can see to achieve this is to effectively render the OWL API to RDF.

Well, it doesn't seem too unbearable to achieve this mapping. There is a finite list of "rules" that would need to be implemented, and I think datomic would lend itself to this task, as it has a native facility to augment the store with deduced triples based on a set of predefined rules (a sketch of one such rule follows the two options below). There may be some art in the selection of exactly the proper rules to implement -- I'm sure you have a much better perspective and experience in this regard, and I'd most likely ask for your guidance to do so. The two basic options I'm aware of:

• OWL-FULL / SWCLOS-like: I'm very attracted to the SWCLOS ruleset (as per the PDF links in my prior message) because it's well defined and amenable to object-oriented-style meta-modeling -- effectively a next generation of the CLOS meta-object protocol, if you will. It is also straightforward in the sense that owl:Thing = rdf:Resource and owl:Class = rdfs:Class, although some of the discussion by Koide in those papers worries me that there would necessarily need to be some adjustment to the OWL model to make this possible; again, this would be something on which I'd need to seek some guidance. The price would be that it's OWL Full and not decidable, with the related limitations on use with a reasoner?

• OWL2-DL/RDF: The OWL 2 documentation from the W3C specifies a ruleset for the OWL 2 DL <-> triples mapping that is interoperable with RDF. This provides a clearly specified approach and fewer unknowns, but also (I think?) severely limits the meta-modeling expressivity; it is, however, decidable.
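
As for the rule idea mentioned above, a Datomic rule for the classic rdfs:subClassOf transitivity entailment might look something like this (same hypothetical :rdf/* schema as the query sketch earlier):

;; Derive transitive subclass relationships at query time from stored
;; rdfs:subClassOf statements. Schema and data here are assumed, not real.
(def rules
  '[[(subclass-of ?c ?d)
     [?stmt :rdf/subject   ?c]
     [?stmt :rdf/predicate :rdfs/subClassOf]
     [?stmt :rdf/object    ?d]]
    [(subclass-of ?c ?e)
     (subclass-of ?c ?d)
     (subclass-of ?d ?e)]])

;; Usage: everything that is directly or indirectly a subclass of :ex/Agent.
;; (d/q '[:find ?c :in $ % :where (subclass-of ?c :ex/Agent)] (d/db conn) rules)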

Hope I'm not becoming a nuisance! I appreciate your continued patience.

Dan

danlentz commented 10 years ago

It seems my markdown in the prior message is not clearly showing my comments as distinct from your earlier ones. I'm not sure why that is, but I hope you can forgive the inconvenience and pick out my responses without too much difficulty.

All thumbs, Dan

phillord commented 10 years ago

Dan Lentz notifications@github.com writes:

> Yes, I've done (a little) thinking on how to handle this. My concept was that the OWL model would be essentially ephemeral, i.e., built dynamically during the process of projecting a "graph" (context) from the triple-store. When I say triple-store I actually mean quad-store... ;) This capability is necessarily predicated on having a reliable and deterministic triples->owl translation, and also to an extent on the "life cycle" concept from a message or two back. If we can always ensure our assertions are made in a manner that is tied to a specific context (even if that context is the default graph or some such), then wouldn't that afford some solution?

This sounds plausible. I've been thinking a little about this for other reasons; I quite fancy adding a plugin to protege, so that I can run clojure code directly there to manipulate the ontology. All the messing around I do with interning vars would not be useful there.

>> LSW has a somewhat different intention, which was to be able to programmatically manipulate OWL (or OBO) ontologies built using other tools, so it has a different focus. A second problem is that, as one of the authors said (http://www.russet.org.uk/blog/2214#comment-119774), it doesn't have much documentation.

> Not unusual for CL libraries, I'm afraid. It's more surprising when there is... In fact, at this point I've grown mistrustful of documentation as opposed to directly reading (and re-reading...) the code, anyway. It remains an important advantage common-lisp retains over clojure (IMHO) -- common-lisp reads beautifully. Of course clojure is much more terse and convenient to type, so arguments can be made for either approach.

I haven't done any real coding in CL, so I'll take your word for it. I'll probably take a trawl through LSW at some point (or get one of the authors to show me next time we meet).

>> LSW also uses the OWLAPI in at least some points, so I am not really sure about the Jena interface. I know LSW is capable of running sparql queries, which is something I am interested in. At the moment, tawny presents no good interface for querying the ontology. Another possibility would be to add core.match support into the OWL API, but that would be a lot of (very dull) code.

> I think given a working owl-triples projection mechanism it would not be much further to implement SPARQL. Datomic supports pattern-based "deductive" queries, which is in effect a poor man's SPARQL, minus some of the amenities. Bastian Muller has a CL library which implements a very basic SPARQL parser and query engine in just a few pages of code that I think would serve as an excellent starting point for building such a facility in Clojure. The project is named SICL and lives on GitHub. If you can't find it I'll get you the exact URL -- basically, just beware that there is another (unrelated) CL project named SICL by Robert Strandh.

Yep. Once we have triples, sparql is not hard. Getting the triples is the hard bit!

>> Of course, in a trivial sense, tawny already supports persistence; that is, you can save the code to file. An ontology is data, but in lisp data is code and vice versa. This was an important use case for me, because my experience with shoving OWL files into versioning systems has never been that happy.

> Yes, the XML is persistence, you're right.

Oh, I didn't mean the XML, I meant the Clojure; assuming an ontology has been developed in tawny, then I already have a "persistent" form.

>> I have no problems at all with the graph model of RDF; I'd love to add sparql as I said, and datomic support would be good as well (though, there are some licensing issues with this!). But, at the moment, the only way I can see to achieve this is to effectively render the OWL API to RDF.

> Well, it doesn't seem too unbearable to achieve this mapping. There is a finite list of "rules" that would need to be implemented, and I think datomic would lend itself to this task, as it has a native facility to augment the store with deduced triples based on a set of predefined rules. There may be some art in the selection of exactly the proper rules to implement -- I'm sure you have a much better perspective and experience in this regard, and I'd most likely ask for your guidance to do so.

I wouldn't be convinced of that! I'm more of an ontology builder than anything; I generally leave all the hard-core algorithmic stuff to others.

> • OWL-FULL / SWCLOS-like

> The price would be that it's OWL Full and not decidable, with the related limitations on use with a reasoner?

I guess that it wouldn't necessarily be OWL Full. But, yes, I am most interested in the more tractable subsets of OWL.

> • OWL2-DL/RDF

> The OWL 2 documentation from the W3C specifies a ruleset for the OWL 2 DL <-> triples mapping that is interoperable with RDF. This provides a clearly specified approach and fewer unknowns, but also (I think?) severely limits the meta-modeling expressivity; it is, however, decidable.

Yep. Less expressivity, but decidable. And with less expressivity still, it becomes more tractable too; EL is a lot faster.

> Hope I'm not becoming a nuisance! I appreciate your continued patience.

No worries!

danlentz commented 10 years ago

Well, so maybe a reasonable starting point might be for me to take a stab at implementing an owl->triples mapping using the OWL2-DL/RDF entailments and interoperability ruleset, since that is the most formally specified (and well known) approach.

An open question is whether it is worthwhile to make use of Jena's facilities for intermediate representation of the graph abstraction. Jena has classes for Statement, Resource, etc. I've not really studied Jena enough at this point to have a good feeling for what nice and useful things it might provide. It might also provide a means of leveraging some of LSW for our benefit. This would entail doing the work to wrap Jena in a reasonable Clojure API -- a bit of work up front, perhaps to pay off in the long run. Maybe this properly belongs as a separate project in its own right: clojure-jena or whatever. My initial feeling is to put this off until there is some specific reason to undertake it, with the understanding that, should such a thing come to pass, it will probably incur a significant refactoring effort of whatever existing code is in place at that point. Thoughts?

On the other hand, I've got a small experimental implementation of the RDF model in my repository danlentz/datomic-rdf that is loosely based on Stuart Sierra's original clojure-rdf.model but implemented natively for datomic. It takes care to provide a convenient interface accepting various representations interchangeably, and fastidiously interns everything that one would want interned. E.g., for resources, (resource! (uri "http://example.com/")) == (resource! "http://example.com/") == (resource! #someuniqueentityid), and similarly with bnodes, named bnodes, and literals. Statements are added to the db as a reified set of three triples [for S, P, O] that are associated with a unique bnode representing the statement. This means a statement may be added to the db without any particular semantic effect on any model. (This also provides for very precise indexing, as each constituent of a statement is therefore indexed for its occurrence as S, P, and O.) Graphs are represented as a set of these bnodes, i.e., a set of these reified statements that define some model. Finally, each graph, as a whole, is attached to a unique bnode which serves as the graph's "name" -- in other words, a unique context identifier. Everything is immutable; you create a graph and it is what it is. To effect updates or changes, a new graph, with its own unique context identifier, is created from an existing one with said updates incorporated, as you'd expect. Think this might be reasonable and sufficient as a graph representation for our purpose?
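
To illustrate the shape of that model, sketched as plain Clojure data rather than the actual datomic-rdf schema (attribute names here are illustrative only):

;; Illustrative only. A statement is a reified entity carrying its S, P and O;
;; a graph is a named set of statement ids; "updating" a graph mints a new one.
(def stmt-1
  {:db/id         "stmt-1"                ; bnode standing for the statement
   :rdf/subject   :ex/bob
   :rdf/predicate :rdf/type
   :rdf/object    :foaf/Person})

(def graph-a
  {:db/id            "graph-a"            ; bnode naming the graph (context id)
   :graph/statements #{"stmt-1"}})

;; An "update" yields a new immutable graph sharing the old statements;
;; "stmt-2" stands for the id of some additional statement.
(def graph-b
  (-> graph-a
      (assoc :db/id "graph-b")
      (update :graph/statements conj "stmt-2")))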

Pretty cool that you hang out with the LSW guys. If you're in contact with someone who might be familiar with the owl-full issues discussed previously, I'd be interested to hear their take on it. I'll also see if I can track down Seiji Koide in the meantime; perhaps he might be willing to share some insight.

Dan

phillord commented 10 years ago

Dan Lentz notifications@github.com writes:

> Well, so maybe a reasonable starting point might be for me to take a stab at implementing an owl->triples mapping using the OWL2-DL/RDF entailments and interoperability ruleset, since that is the most formally specified (and well known) approach.

Well, the question here is what representation the OWL would be in. Tawny answers that with "OWL API objects", and for this, the mapping already exists (at least in the form of rendering to RDF output).

> An open question is whether it is worthwhile to make use of Jena's facilities for intermediate representation of the graph abstraction. Jena has classes for Statement, Resource, etc. I've not really studied Jena enough at this point to have a good feeling for what nice and useful things it might provide.

Having something like tawny that wraps Jena would be interesting, although it would probably have the same difficulties as tawny -- it wouldn't feel very Clojure: mutability and so forth.

> Maybe this properly belongs as a separate project in its own right: clojure-jena or whatever. My initial feeling is to put this off until there is some specific reason to undertake it, with the understanding that, should such a thing come to pass, it will probably incur a significant refactoring effort of whatever existing code is in place at that point. Thoughts?

I would agree that this is a different project; ultimately, having Jena in tawny would be a lot of effort, and probably wouldn't bring substantial additional benefit. For those things where it would be useful (sparql over an ontology), I would start off with a clunky implementation (take an OWL API ontology, render it to RDF, stuff that into Jena, then query).
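
Spelling that clunky pipeline out, it might look roughly like this in Clojure interop; the class names are from recent OWL API and Apache Jena releases and move around between versions, so treat it as a sketch rather than working tawny code:

(import '(java.io ByteArrayInputStream ByteArrayOutputStream)
        '(org.semanticweb.owlapi.formats RDFXMLDocumentFormat)
        '(org.apache.jena.rdf.model ModelFactory)
        '(org.apache.jena.query QueryExecutionFactory))

(defn sparql-over-ontology
  "Serialise `ontology` to RDF/XML via `manager`, load it into a Jena model,
  and run the SPARQL SELECT `query` over it."
  [manager ontology query]
  (let [out (ByteArrayOutputStream.)]
    ;; OWL API side: render the ontology to RDF/XML in memory.
    (.saveOntology manager ontology (RDFXMLDocumentFormat.) out)
    ;; Jena side: read the RDF into a model and query it.
    (let [model (.read (ModelFactory/createDefaultModel)
                       (ByteArrayInputStream. (.toByteArray out))
                       nil)]
      (iterator-seq (.execSelect (QueryExecutionFactory/create query model))))))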

> On the other hand, I've got a small experimental implementation of the RDF model in my repository danlentz/datomic-rdf that is loosely based on Stuart Sierra's original clojure-rdf.model but implemented natively for datomic.

Just re-reading Stuart Sierra's post; I quite like his idea of interning ontology terms as functions (which return new individuals). I did think of building an implementation of the OWL API which implements IMeta and maybe IFn also. Perhaps I should revisit this.
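
As I understand that idea, it is roughly this (a sketch with hypothetical names and a map-based representation of individuals):

;; Intern a class term as a var whose value is a function that mints new
;; individuals of that class. Names and the representation are hypothetical.
(defmacro defterm
  [name type-iri]
  `(def ~name
     (fn [& {:as props#}]
       (merge {:rdf/type ~type-iri} props#))))

;; (defterm Person "http://xmlns.com/foaf/0.1/Person")
;; (Person :foaf/name "Bob")
;; => {:rdf/type "http://xmlns.com/foaf/0.1/Person", :foaf/name "Bob"}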

> It takes care to provide a convenient interface accepting various representations interchangeably, and fastidiously interns everything that one would want interned. E.g., for resources, (resource! (uri "http://example.com/")) == (resource! "http://example.com/") == (resource! #someuniqueentityid), and similarly with bnodes, named bnodes, and literals. Statements are added to the db as a reified set of three triples [for S, P, O] that are associated with a unique bnode representing the statement. This means a statement may be added to the db without any particular semantic effect on any model. (This also provides for very precise indexing, as each constituent of a statement is therefore indexed for its occurrence as S, P, and O.) Graphs are represented as a set of these bnodes, i.e., a set of these reified statements that define some model. Finally, each graph, as a whole, is attached to a unique bnode which serves as the graph's "name" -- in other words, a unique context identifier. Everything is immutable; you create a graph and it is what it is. To effect updates or changes, a new graph, with its own unique context identifier, is created from an existing one with said updates incorporated, as you'd expect. Think this might be reasonable and sufficient as a graph representation for our purpose?

It sounds reasonable (although I am not a great expert on RDF!). One question would be whether you want more explicit support for rdfs. I'll try and take a look at your library. I would like to be able to build RDF in a similar way to tawny.

> Pretty cool that you hang out with the LSW guys. If you're in contact with someone who might be familiar with the owl-full issues discussed previously,

Whether OWL Full is tractable?

Technically, tawny can produce OWL Full (although not necessarily arbitrary OWL Full). In fact, I've just realised that it is always producing OWL Full, so that's a bug!

Phil

phillord commented 10 years ago

Closing this now, just to clean up!

ChipNowacek commented 7 years ago

I carefully read the datomic free license. It seems designed for this kind of work.

Did you get a chance to look at Dan's library?

phillord commented 7 years ago

I haven't, I'm afraid, nor have I used datomic; but yes, it has a similar purpose.