phillord opened 1 month ago
To get myself acquainted with the rdf::reader module, I am currently refactoring a few bits that could be simplified, e.g. by deriving traits or by using methods already implemented in vocab (see #97 and #99).
I agree with your observation.
I am also wondering whether the code could be refactored to exploit the fact that the triple parser is streaming: keep a HashMap of triple sets indexed by subject, and deserialize a set as soon as it contains, e.g., all the triples necessary to instantiate a declaration.
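A minimal sketch of that idea, with invented stand-in names rather than horned-owl's actual types: group incoming triples by subject and emit a result as soon as a subject's set looks complete.

```rust
use std::collections::HashMap;

// (subject, predicate, object) -- a simplified stand-in for a real triple type.
type Triple = (String, String, String);

// Hypothetical streaming accumulator, not the real reader.
struct StreamingReader {
    pending: HashMap<String, Vec<Triple>>,
}

impl StreamingReader {
    fn new() -> Self {
        StreamingReader { pending: HashMap::new() }
    }

    // Feed one triple; return a declaration as soon as the subject's
    // set contains everything needed to build one.
    fn push(&mut self, t: Triple) -> Option<String> {
        let subject = t.0.clone();
        let set = self.pending.entry(subject.clone()).or_default();
        set.push(t);
        if Self::complete(set) {
            self.pending.remove(&subject);
            Some(format!("Declaration({})", subject))
        } else {
            None
        }
    }

    // Stand-in completeness check: here, just "has an rdf:type triple".
    fn complete(set: &[Triple]) -> bool {
        set.iter().any(|t| t.1 == "rdf:type")
    }
}

fn main() {
    let mut reader = StreamingReader::new();
    // A label alone is not enough to deserialize anything...
    assert_eq!(reader.push(("a".into(), "rdfs:label".into(), "A".into())), None);
    // ...but once the rdf:type triple arrives, the set is complete.
    assert_eq!(
        reader.push(("a".into(), "rdf:type".into(), "owl:Class".into())),
        Some("Declaration(a)".into())
    );
}
```

The completeness check is the hard part in practice, since (as noted below) knowing when a set is complete can depend on triples that arrive much later in the stream.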
https://github.com/phillord/horned-owl/tree/feature/rdf-with-partition-cleanup
I've started looking at this. There are definitely some cleanups that I can do. The idea of partition_map may or may not work; so far it makes some things simpler, but a lot of the loops modify other data structures by side effect. I will rewrite a few more functions so that I am clear whether it makes sense or not.
The problem with streaming is that in many places you have to parse the whole stream first in order to interpret later triples. This is, for example, why I parse all declarations (in the entire import closure) first: many of the axiom patterns depend on knowing, for example, the kind of a property. So I suspect that streaming RDF is not going to bring a lot of benefit.
The RDF parser makes multiple passes through both bnode and simple triples, and each time it does so the code is repeated -- something like this.
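The repeated pattern is, roughly, a loop that consumes the triples it can interpret and keeps the rest for a later pass, filling other collections by side effect. A hedged sketch, with invented names rather than the actual reader code:

```rust
// Simplified stand-ins for the parser's types; names are invented.
type Triple = (String, String, String);

#[derive(Debug, PartialEq)]
struct Axiom(String);

// Try to interpret one triple; None means "not handled on this pass".
fn try_axiom(t: &Triple) -> Option<Axiom> {
    if t.1 == "rdf:type" {
        Some(Axiom(format!("Declaration({})", t.0)))
    } else {
        None
    }
}

// One pass: push what we can interpret into `axioms` by side effect,
// and return the triples we could not consume for the next pass.
fn one_pass(triples: Vec<Triple>, axioms: &mut Vec<Axiom>) -> Vec<Triple> {
    let mut remaining = Vec::new();
    for t in triples {
        match try_axiom(&t) {
            Some(ax) => axioms.push(ax),
            None => remaining.push(t),
        }
    }
    remaining
}

fn main() {
    let triples = vec![
        ("a".to_string(), "rdf:type".to_string(), "owl:Class".to_string()),
        ("a".to_string(), "rdfs:label".to_string(), "A".to_string()),
    ];
    let mut axioms = Vec::new();
    let remaining = one_pass(triples, &mut axioms);
    assert_eq!(axioms, vec![Axiom("Declaration(a)".to_string())]);
    assert_eq!(remaining.len(), 1);
}
```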
I think we can replace this whole thing with a `partition_map` (or a `map` followed by a `partition`, which would avoid bringing in itertools as a dependency). `partition_map` looks like this:
`L` would be the thing (`Atom`, `Axiom`, whatever); `R` would be `Result<A>`. `L` would then be added to whatever; `R` would be added back to wherever in self it came from in the first place. Would then have:
This is rather neater than what we have at the moment.
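For illustration, here is a hedged sketch of the itertools-free variant mentioned above (a `map` followed by a `partition`), again with invented stand-in types; itertools' `partition_map` would fuse the two steps into a single pass and unwrap the `Either` sides for us.

```rust
// Invented stand-ins for the parser's types.
type Triple = (String, String, String);

#[derive(Debug, PartialEq)]
struct Axiom(String);

// Interpret what we can: Ok is the "L" side (an Axiom), Err is the
// "R" side (the triple goes back to where it came from).
fn to_axiom(t: Triple) -> Result<Axiom, Triple> {
    if t.1 == "rdf:type" {
        Ok(Axiom(format!("Declaration({})", t.0)))
    } else {
        Err(t)
    }
}

fn main() {
    let triples = vec![
        ("a".to_string(), "rdf:type".to_string(), "owl:Class".to_string()),
        ("a".to_string(), "rdfs:label".to_string(), "A".to_string()),
    ];

    // Map, then partition on success. No outer state is mutated inside
    // the loop body, unlike the side-effecting passes above.
    let (oks, errs): (Vec<_>, Vec<_>) = triples
        .into_iter()
        .map(to_axiom)
        .partition(Result::is_ok);

    let axioms: Vec<Axiom> = oks.into_iter().map(Result::unwrap).collect();
    let leftover: Vec<Triple> = errs.into_iter().map(Result::unwrap_err).collect();

    assert_eq!(axioms, vec![Axiom("Declaration(a)".to_string())]);
    assert_eq!(leftover.len(), 1);
}
```

The cost of the std-only version is the intermediate `Vec<Result<...>>` on each side and the unwrap step; `partition_map` avoids both at the price of the itertools dependency.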