ontodev / robot

ROBOT is an OBO Tool
http://robot.obolibrary.org
BSD 3-Clause "New" or "Revised" License

Auditing ROBOT command chain #1024

Open psiotwo opened 2 years ago

psiotwo commented 2 years ago

Currently, many ROBOT commands can either be run separately (producing intermediate results) or be chained in a single command (saving the time and disk space otherwise spent (de)serializing intermediate inputs/outputs). As an example, consider these two options:

  1. Separate calls
    robot query --input input.owl --update update.rq --output temp1.owl
    robot reason --input temp1.owl --output output.owl

This option is suitable for debugging - it allows me to measure execution time, debug the individual pipeline steps (query/reason) and investigate intermediate results (temp1.owl).

  2. Single call
    robot query --input input.owl --update update.rq reason --output output.owl

This option is suitable for production - it does not (de)serialize intermediate outputs, thus saving execution time and disk space.


Switching between these options is not very flexible. It would be beneficial to support option 2 with some global configuration options that make the commands auditable, similar to the global logging options '-v', '-vv', '-vvv'. The most important switch for my case would be something like:

Beyond that, other features that would occasionally help me while debugging would be:

What would be your thoughts on these?

(This scratches the surface of an even more ambitious topic: orchestrating ROBOT commands in some ETL-like tool. But that belongs in another ticket.)

balhoff commented 2 years ago

@psiotwo for the store-intermediate idea, I think you can already combine --output with chaining, so that each step both writes a file and passes its result to the next command. I'm not sure whether this works for all commands.

psiotwo commented 2 years ago

@balhoff thanks for the hint - yes, this would work, although switching this debugging on/off in a Makefile for a ROBOT command chain seems a bit complicated (I can do something like $(if $(DEBUG),-o debug-1.owl,) for each robot subcommand). But maybe that is just my limited experience with Makefiles ...

beckyjackson commented 1 year ago

Running with -v makes the logger report the time taken by each subcommand:

WARN  Subcommand Timing: convert took 0.175 seconds

To store logs in a file:

robot convert --input foo.ttl --output bar.owl -v 1> log.txt

The only item here that can't currently be done while chaining commands is producing the diff. I suppose you could put it all in one command:

robot query \
  --input input.owl \
  --update update.rq  \
  ${DEBUG:+--output update-intermediate.owl} \
reason --output output.owl && \
[[ -n $DEBUG ]] && \
robot diff \
  --left input.owl \
  --right update-intermediate.owl \
  --output update-diff.txt

This will only produce the intermediate output file and the diff if DEBUG is set.
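One caveat if this runs as a Makefile recipe (or anywhere else that checks exit codes): when DEBUG is unset, [[ -n $DEBUG ]] returns 1, so the whole line exits nonzero even though the build itself succeeded, and make reports the recipe as failed. A minimal bash sketch of the same steps with the diff guarded by an if-statement instead. The function name build_with_optional_diff and the ROBOT override variable are hypothetical additions for illustration; ROBOT defaults to the real robot CLI.

```shell
#!/usr/bin/env bash
# Sketch: same steps as above, but the optional diff is guarded by an
# if-statement, so a non-debug run still exits with status 0.
# ROBOT is a hypothetical override hook; it defaults to the real CLI.
ROBOT="${ROBOT:-robot}"

build_with_optional_diff() {
  # Chain query + reason; keep the intermediate file only when DEBUG is set.
  # ${DEBUG:+...} is deliberately unquoted so it expands to two words.
  "$ROBOT" query \
    --input input.owl \
    --update update.rq \
    ${DEBUG:+--output update-intermediate.owl} \
    reason --output output.owl || return

  # When the condition is false, the if-statement's exit status is 0.
  if [[ -n ${DEBUG:-} ]]; then
    "$ROBOT" diff \
      --left input.owl \
      --right update-intermediate.owl \
      --output update-diff.txt
  fi
}
```

Usage: DEBUG=1 build_with_optional_diff for a debug run, plain build_with_optional_diff for production. ROBOT=echo build_with_optional_diff prints the commands instead of running them, which is handy for checking the assembled arguments.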

psiotwo commented 1 year ago

Thanks @beckyjackson for the hint. Actually, with respect to stats, I was more interested in machine-processable output (e.g. JSON) for subsequent analytics.
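The thread does not point to a built-in JSON timing report, so one possible workaround is to post-process the -v log. A rough bash/awk sketch, assuming every timing entry has the same shape as the "Subcommand Timing: convert took 0.175 seconds" line quoted earlier; if the real log format differs, the pattern needs adjusting. The function name timings_to_json is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: turn ROBOT's verbose "Subcommand Timing" log lines into a JSON array.
# Assumes lines of the form (as quoted in this thread):
#   WARN  Subcommand Timing: convert took 0.175 seconds
# Reads the files given as arguments, or stdin if there are none.
timings_to_json() {
  awk '
    /Subcommand Timing:/ {
      # Locate the "took" token: the command name precedes it,
      # the duration in seconds follows it.
      for (i = 1; i < NF; i++) {
        if ($i == "took") {
          entries[n++] = sprintf("{\"command\": \"%s\", \"seconds\": %s}", $(i - 1), $(i + 1))
          break
        }
      }
    }
    END {
      # Emit all collected entries as one JSON array ("[]" if none).
      printf "["
      for (j = 0; j < n; j++) printf "%s%s", (j ? ", " : ""), entries[j]
      print "]"
    }
  ' "$@"
}
```

For example, timings_to_json log.txt > timings.json, or piping a verbose run through it with robot ... -v 2>&1 | timings_to_json (merging stderr in case the log is written there rather than to stdout).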

To be able to track the different intermediate outcomes, I am currently using the following pattern. The diff-handling approach you suggest looks good to me!

DEBUG=true
robot <COMMAND-1> -i input.ttl ... $(if $(DEBUG),-o output-1.owl,) \
          <COMMAND-2> ... $(if $(DEBUG),-o output-2.owl,) \
          <COMMAND-3> ... $(if $(DEBUG),-o output-3.owl,) \
          ... \
          <COMMAND-N> ... $(if $(DEBUG),-o output-N.owl,) \
          <COMMAND-N+1> ... -o output.owl
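The manual $(if $(DEBUG),...) pattern above can also be generated rather than written out per step. A minimal bash sketch, assuming each step is passed as one whitespace-separated string of subcommand arguments and that debug-<N>.owl is an acceptable naming scheme for intermediates; robot_chain, DRY_RUN and the file names are hypothetical, not part of ROBOT.

```shell
#!/usr/bin/env bash
# Sketch: assemble one chained ROBOT call from a list of steps.
# When DEBUG is non-empty, every step except the last also gets
# "--output debug-<N>.owl", so intermediate results are written while
# the chain still runs as a single robot invocation.
# robot_chain, DRY_RUN and the debug-<N>.owl names are hypothetical.
robot_chain() {
  local final_output=$1
  shift
  local -a cmd=(robot)
  local -a words
  local n=$# i=0 step
  for step in "$@"; do
    i=$((i + 1))
    # Split the step string on whitespace (quoted arguments are not supported).
    read -r -a words <<< "$step"
    cmd+=("${words[@]}")
    if (( i == n )); then
      cmd+=(--output "$final_output")
    elif [[ -n ${DEBUG:-} ]]; then
      cmd+=(--output "debug-$i.owl")
    fi
  done
  if [[ -n ${DRY_RUN:-} ]]; then
    # Print the assembled command instead of running it.
    printf '%s\n' "${cmd[*]}"
  else
    "${cmd[@]}"
  fi
}
```

For example, DEBUG=1 robot_chain output.owl "query --input input.owl --update update.rq" "reason" runs the same chain as the query/reason example earlier in the thread, additionally writing debug-1.owl after the query step; dropping DEBUG gives the plain production chain. Passing steps as strings keeps the sketch short but breaks on arguments containing spaces; a sturdier version would need a separator token between steps.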