Auditing ROBOT command chain

psiotwo commented 2 years ago

Currently, many ROBOT commands can either be run separately (producing intermediate results) or chained in a single command (saving the space and time needed to (de)serialize the inputs/outputs). As an example, let's consider two options:

Separate calls

robot query --input input.owl --update update.rq --output temp1.owl
robot reason --input temp1.owl --output output.owl

This option is suitable for debugging - it allows me to measure execution time, debug the individual pipeline steps (query/reason) and investigate intermediate results (temp1.owl).

Single call

robot query --input input.owl --update update.rq reason --output output.owl

This option is suitable for production - it does not (de)serialize intermediate outputs, thus saving execution time and disk space.

Switching between these options is not very flexible. It would be beneficial to support the single-call option with some global configuration options that make the commands auditable, similar to the global logging configuration '-v, -vv, -vvv'. The most important switch for my case would be something like

--store-intermediate that would create intermediate --outputs of all the commands in the chain.

However, other goodies that would also help me from time to time while debugging would be

--store-diffs - would compute robot diff between each input/output pair in the chain, or even robot merge/unmerge to obtain a "machine-processable" diff.
--stats - would compute some basic auditing metadata (e.g. execution times) for each of the commands in the chain.

What would be your thoughts on these?

(This scratches the surface of an even more ambitious topic - orchestration of robot commands in some ETL-like tool. But that belongs in another ticket.)

balhoff commented 2 years ago

@psiotwo for the store-intermediate idea, I think it's the case that you can already combine --output and chaining, so that each step both writes a file and pipes to the next command. Not sure if it works for all commands.

psiotwo commented 2 years ago

@balhoff thanks for the hint - yes, this would work, although it seems a bit complicated to switch this debug output on/off in a Makefile for a ROBOT command chain (I can do something like $(if $(DEBUG),-o $debug-1.owl,) for each robot subcommand). But maybe it is just my limited experience with Makefiles ...
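Outside of a Makefile, the same on/off switch can be sketched as a small shell function. This is only an illustration under assumptions: `run_chain` and the `DRY_RUN` variable are hypothetical names, not part of ROBOT, and the file names are the placeholders from the first comment. It relies on the behaviour described above, namely that a chained step accepts its own `--output`.

```shell
# Sketch: run the query -> reason chain, writing the intermediate file
# only when DEBUG is non-empty.
# DRY_RUN is a hypothetical switch: when set, the assembled command is
# printed instead of executed, which helps when checking the flags.
run_chain() {
  # Build the argument list in the positional parameters (POSIX sh has no arrays).
  set -- robot query --input input.owl --update update.rq
  if [ -n "${DEBUG:-}" ]; then
    # Assumes the chained step accepts its own --output, as noted above.
    set -- "$@" --output temp1.owl
  fi
  set -- "$@" reason --output output.owl

  if [ -n "${DRY_RUN:-}" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}
```

Calling `DEBUG=1 run_chain` would keep `temp1.owl`, while a plain `run_chain` would run the production variant without the intermediate file.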

beckyjackson commented 1 year ago

Running with -v will produce subcommand timing with the logger:

WARN  Subcommand Timing: convert took 0.175 seconds

To store logs in a file:

robot convert --input foo.ttl --output bar.owl -v 1> log.txt

The only item here that can't currently be done while chaining commands is producing the diff. I suppose you could put it all in one command:

robot query \
  --input input.owl \
  --update update.rq \
  ${DEBUG:+--output update-intermediate.owl} \
reason --output output.owl && \
if [[ -n $DEBUG ]]; then
  robot diff \
    --left input.owl \
    --right update-intermediate.owl \
    --output update-diff.txt
fi

This will only produce the intermediate output file and the diff if DEBUG is set.
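For a longer chain, the same diff step could be repeated for each pair of consecutive files. The following is a rough sketch, not an existing ROBOT feature: `diff_chain`, the `DRY_RUN` variable, and the `diff-N.txt` naming are assumptions made for illustration, and it only uses the `robot diff` options shown above.

```shell
# Sketch: given the input file followed by each intermediate/final output,
# run `robot diff` on every consecutive pair.
# DRY_RUN is a hypothetical switch that prints the commands instead of
# running them. Output names diff-1.txt, diff-2.txt, ... are placeholders.
diff_chain() {
  prev=$1
  shift
  step=1
  for cur in "$@"; do
    if [ -n "${DRY_RUN:-}" ]; then
      echo robot diff --left "$prev" --right "$cur" --output "diff-$step.txt"
    else
      robot diff --left "$prev" --right "$cur" --output "diff-$step.txt"
    fi
    prev=$cur
    step=$((step + 1))
  done
}
```

For the two-step example, `diff_chain input.owl update-intermediate.owl output.owl` would compare the input with the query result, and the query result with the reasoned output.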

psiotwo commented 1 year ago

Thanks @beckyjackson for the hint. Actually, w.r.t. stats I was more interested in some machine-processable output (e.g. JSON) for subsequent analytics.

To be able to track different outcomes, I am currently using the following pattern. The diff handling you suggest looks good to me!
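Until something like that exists, one workaround might be to post-process the timing lines from the log. The sketch below assumes the log format is exactly the `Subcommand Timing: <command> took <N> seconds` line shown earlier in this thread; `timings_to_json` is a hypothetical helper, not a ROBOT command, and the JSON field names are made up for illustration.

```shell
# Sketch: read a ROBOT log on stdin and print a JSON array with one object
# per "Subcommand Timing" line. Lines that do not match are ignored.
timings_to_json() {
  awk '
    /Subcommand Timing:/ {
      # Locate the "Timing:" token; the command name follows it and the
      # number of seconds comes after the word "took".
      for (i = 1; i <= NF; i++) {
        if ($i == "Timing:") { cmd = $(i + 1); secs = $(i + 3) }
      }
      printf "%s{\"command\": \"%s\", \"seconds\": %s}", (n++ ? ",\n " : "["), cmd, secs
    }
    END { if (n) print "]"; else print "[]" }
  '
}
```

With the log captured as in the earlier comment, `timings_to_json < log.txt > timings.json` would produce a file that downstream analytics can load.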

DEBUG=true
robot <COMMAND-1> -i input.ttl ... $(if $(DEBUG),-o $output-1.owl,) \
          <COMMAND-2> ... $(if $(DEBUG),-o $output-2.owl,) \
          <COMMAND-3> ... $(if $(DEBUG),-o $output-3.owl,) \
          ...
          <COMMAND-N> ... $(if $(DEBUG),-o $output-N.owl,) \
          <COMMAND-N+1> ... -o output.owl

ontodev / robot

Auditing ROBOT command chain #1024