yesworkflow-org / yw-prototypes

Research prototype with tutorial. Start here to learn about and try YesWorkflow.
http://yesworkflow.org/wiki

How to make "provenance queries"? #30

Status: Open. olyerickson opened this issue 8 years ago.

olyerickson commented 8 years ago

The title says it all: it is not clear how to make the provenance "queries" or script run reports discussed in Section 4 of "Retrospective Provenance Without a Runtime Provenance Recorder" (McPhillips et al.).

tmcphillips commented 8 years ago

What one currently does is export Datalog- or Prolog-compatible facts that describe three things: the YW annotations extracted from the source files, the workflow model that YW constructs from these annotations, and any retrospective provenance reconstructed for the products of a script run. You then write queries over these facts in Datalog or Prolog.
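To illustrate the general idea, here is a minimal Python sketch of what such a query does: a recursive Datalog-style rule evaluated bottom-up until no new facts appear. The relation names (`reads`, `writes`, `depends_on`) and the program and data names are hypothetical stand-ins, not the schema of the facts YW actually exports; in practice you would write the rule in Datalog or Prolog and let the engine evaluate it.

```python
# Hypothetical sketch: bottom-up (naive fixpoint) evaluation of the Datalog rules
#   depends_on(D, U) :- writes(P, D), reads(P, U).
#   depends_on(D, U) :- depends_on(D, M), depends_on(M, U).
# The relation and program/data names are illustrative, not YesWorkflow's fact schema.
from typing import List, Set, Tuple

Fact = Tuple[str, str]

# writes(Program, Data) and reads(Program, Data) for a made-up two-step script.
WRITES: Set[Fact] = {("clean", "clean_data"), ("plot", "figure")}
READS: Set[Fact] = {("clean", "raw_data"), ("clean", "params"), ("plot", "clean_data")}


def derive_depends_on(writes: Set[Fact], reads: Set[Fact]) -> Set[Fact]:
    """Compute every depends_on(D, U) fact implied by the two rules above."""
    # Base rule: D depends on U if the same program writes D and reads U.
    derived = {(d, u) for (p_w, d) in writes for (p_r, u) in reads if p_w == p_r}
    # Recursive rule: apply transitivity until nothing new is derived (a fixpoint).
    while True:
        new = {(d, u) for (d, m) in derived for (m2, u) in derived if m == m2}
        if new <= derived:
            return derived
        derived |= new


def upstream_of(data: str, depends_on: Set[Fact]) -> List[str]:
    """Query ?- depends_on(data, U). and return every U, sorted."""
    return sorted(u for (d, u) in depends_on if d == data)


DEPENDS_ON = derive_depends_on(WRITES, READS)
print(upstream_of("figure", DEPENDS_ON))  # ['clean_data', 'params', 'raw_data']
```

A Datalog engine does this evaluation (far more efficiently) for you, which is why the queries themselves can stay short.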

The src/main/resources/examples/simulate_data_collection/yw/xsb directory contains the three facts files for a run of the simulate_data_collection.py script from the paper you mention, a file with general rules for use in queries, and three files containing the provenance queries themselves (e.g., recon_queries.P contains the queries from the paper). The run_queries.sh script runs all of these queries using XSB, and the run_queries.txt file contains the expected output. You can reproduce these results by installing XSB and running the script. Alternatively, you can do all of this using DLV with the analogous files in the dlv directory.
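If you want to automate the comparison against the expected output, a small Python wrapper along these lines could work. It assumes that run_queries.sh is run from inside the xsb directory, that it writes its query results to standard output, and that XSB is installed and on your PATH; none of this is confirmed here, so check the script first. The helper names `check_queries` and `diff_outputs` are made up for illustration.

```python
# Hypothetical helper: run the example queries and diff them against run_queries.txt.
# Assumes run_queries.sh prints its results to stdout and XSB is on the PATH.
import difflib
import subprocess
from pathlib import Path
from typing import List

XSB_DIR = Path("src/main/resources/examples/simulate_data_collection/yw/xsb")


def diff_outputs(actual: str, expected: str) -> List[str]:
    """Return unified-diff lines between expected and actual output ([] if identical)."""
    return list(
        difflib.unified_diff(
            expected.splitlines(),
            actual.splitlines(),
            fromfile="run_queries.txt",
            tofile="actual output",
            lineterm="",
        )
    )


def check_queries(query_dir: Path = XSB_DIR) -> List[str]:
    """Run run_queries.sh inside query_dir and compare its stdout with run_queries.txt."""
    result = subprocess.run(
        ["bash", "run_queries.sh"],
        cwd=query_dir,
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if the script exits non-zero
    )
    expected = (query_dir / "run_queries.txt").read_text()
    return diff_outputs(result.stdout, expected)
```

Called from the repository root, an empty list from `check_queries()` would mean the output matches; any returned lines show where it differs.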

Eventually we'll probably want to build a query engine into YW itself so that these additional tool installations aren't required. We'll also want a streamlined query language of some kind so that new queries are much easier (and shorter) to write. You can imagine such streamlined queries being replaced by their results when a report template is expanded, producing a run report. Plenty of work to do!
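The report-template idea could look roughly like the following Python sketch, in which each `{{name}}` placeholder in a template is replaced by the result of the query registered under that name. The placeholder syntax, the `expand_report` function, and the lambdas standing in for query results are all hypothetical; this only illustrates the idea described above and is not an existing YW feature.

```python
# Hypothetical sketch of report-template expansion: each {{name}} placeholder is
# replaced by the result of the query registered under that name.
import re
from typing import Callable, Dict

PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def expand_report(template: str, queries: Dict[str, Callable[[], str]]) -> str:
    """Replace every {{name}} in template with the output of queries[name]()."""

    def substitute(match: "re.Match[str]") -> str:
        name = match.group(1)
        if name not in queries:
            raise KeyError(f"template refers to unknown query {name!r}")
        return queries[name]()

    return PLACEHOLDER.sub(substitute, template)


# Stand-ins for real query results (hypothetical values).
template = "Inputs read: {{ inputs }}\nProducts: {{products}}"
queries = {"inputs": lambda: "raw_data, params", "products": lambda: "figure"}
print(expand_report(template, queries))
```

Keeping each query behind a name means the template stays readable while the query language underneath can change.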