markmacgillivray closed 12 years ago
Erm, surely this is in pdcalc issues :-) (or are we moving everything here ...)
I also don't think pdcalc and pdw.net are that close to being ready to go. I think we really need to plan this more than just saying "get this running" :-)
My suggestion: spend 1/2 a day understanding the architecture of code, then plan and estimate all the RDF stuff in favour of JSON (or JSON-LD), boot up.
Mark, Etienne, Primavera had a Skype call yesterday to discuss this. One of the issues raised was: is it feasible to transition from RDF/SPARQL to something else? The consensus was that this would require significant work - even though this is the more desirable option IMHO.
In the short term we can have a version of the current pdcalc running on a web endpoint that can be called with BibJSON input and outputs the results of the reasoning. The estimate for this is 1 day of work. The pdw2 github repo already contains a first stab at this, and pdcalc contains a BibJSON2RDF module. See: https://github.com/okfn/pdcalc/blob/master/pd/json2rdf.py
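To make the endpoint idea concrete, here is a rough sketch of what "BibJSON in, JSON out" could look like as a plain WSGI app. This is not the pdw2 code: `calculate` is a hypothetical stand-in for whatever pdcalc actually exposes, and the response fields are made up for illustration.

```python
import json


def calculate(record):
    # Hypothetical placeholder: the real version would hand the record
    # to the pdcalc reasoner and return its result as a dict.
    return {"title": record.get("title"), "status": "unknown"}


def app(environ, start_response):
    """Accept a BibJSON record in the request body and reply with JSON."""
    try:
        length = int(environ.get("CONTENT_LENGTH") or 0)
        raw = environ["wsgi.input"].read(length)
        record = json.loads(raw.decode("utf-8"))
        if not isinstance(record, dict):
            raise ValueError("record must be a JSON object")
        status, body = "200 OK", calculate(record)
    except (ValueError, KeyError):
        # Covers malformed JSON, bad encoding and a missing request body.
        status, body = "400 Bad Request", {"error": "expected a BibJSON object"}
    payload = json.dumps(body).encode("utf-8")
    start_response(status, [
        ("Content-Type", "application/json"),
        ("Content-Length", str(len(payload))),
    ])
    return [payload]


def serve(port=8000):
    # Development server from the standard library; the port is arbitrary.
    from wsgiref.simple_server import make_server
    make_server("", port, app).serve_forever()
```

Calling `serve()` starts a local server that a client could POST a record to. For real deployment this would presumably sit behind a proper WSGI server.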
I don't think it will take 1/2 a day to transition away from RDF. (maybe for you, but I do not have the RDF/SPARQL chops for that...)
NB: a) there was a typo - I meant 1/2d understanding the codebase and planning what would be needed to transition.
Re RDF we really don't want to be supporting random RDF stuff at various versions as it is a sysadmin nightmare ...
Plus I think it is a real problem (going forward) that we have a codebase we don't know a lot about (also comment to primavera: there are no tests in pdcalc AFAICT!)
For now we want to get it up and running again so we can call to it. This is possible, and will run on a micro AMI. The people who had the skill and the time to write the mappings could do it best in RDF, so that is the way it is. We can revisit that later if necessary.
There is a running (albeit not ideal) version up on: http://ec2-79-125-58-175.eu-west-1.compute.amazonaws.com/ Will leave it running for the time being, it is a micro instance so it does not cost much.
But I fully agree with the comments from Rufus above that we would ideally want to transition away from RDF/SPARQL dependencies. I think it is OK to model the flow and diagrams in RDF but there is no hard requirement for the reasoning to be done in SPARQL IMO. We could just as easily have Python snippets embedded in the graph decision tree instead. And it would probably be easier to read/understand for legal practitioners, as Python is close to executable pseudocode.
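A minimal sketch of what I mean by Python snippets in the decision tree, assuming each node carries a question, a small Python test, and yes/no branches. The tree, the field names (`author_death_year`, `reference_year`) and the 70-year threshold are purely illustrative and not taken from the existing flow diagrams or from any real jurisdiction mapping.

```python
def walk(node, work):
    """Follow the tree for one work; return the outcome and the path taken."""
    trail = []
    while isinstance(node, dict):
        answer = bool(node["test"](work))
        trail.append((node["question"], answer))
        node = node["yes"] if answer else node["no"]
    # Leaves are plain strings naming the outcome.
    return node, trail


# Illustrative tree only: questions and threshold are assumptions.
TREE = {
    "question": "Is the author's year of death known?",
    "test": lambda w: w.get("author_death_year") is not None,
    "yes": {
        "question": "Did the author die more than 70 years before the reference year?",
        "test": lambda w: w["reference_year"] - w["author_death_year"] > 70,
        "yes": "public domain",
        "no": "not public domain",
    },
    "no": "unknown",
}

outcome, trail = walk(TREE, {"author_death_year": 1900, "reference_year": 2013})
```

The point is that each test is a one-line expression a non-programmer can read next to the question it answers, and the trail records which branch was taken at every step.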
+1 on your 1/2 day estimate just to understand the code. I misunderstood, thought you meant that is what's needed to actually do the work.
...and as first low-hanging fruit, the reasoner should not 'print' the output, but do some structured output. I will look into starting with this as a path to understanding what is going on in the code.
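Roughly what I have in mind for structured output, as a sketch only: the reasoner returns a result object that can be serialised, instead of printing as it goes. The class and field names below are hypothetical and do not correspond to anything currently in pdcalc.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List, Optional


@dataclass
class ReasoningResult:
    # All field names are assumptions made for this sketch.
    work: str
    jurisdiction: str
    public_domain: Optional[bool]  # None when the reasoner cannot decide
    steps: List[str] = field(default_factory=list)

    def to_json(self):
        """Serialise the result so a web client can consume it."""
        return json.dumps(asdict(self), sort_keys=True)


result = ReasoningResult("Ulysses", "uk", None)
result.steps.append("author death year missing")
```

Here `result.to_json()` gives a JSON string that the endpoint can return as-is, and the `steps` list keeps the explanation that is currently only visible in the printed output.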
pdcalc and pdw.net are already very close to being ready to go. Just get the code up and running and outputting JSON on a micro instance. Then we can test against it from BNB by AJAXing a record over to it.