ontodev / robot

ROBOT is an OBO Tool
http://robot.obolibrary.org
BSD 3-Clause "New" or "Revised" License

ROBOT REST API #513

Open jamesaoverton opened 5 years ago

jamesaoverton commented 5 years ago

Goal: To be able to use ROBOT via an HTTP REST API from any programming language.

Use Case: My group has a cell name and marker validator written in Python. We'd like ROBOT to load CL, run a reasoner, keep it running, and have a little webapp written in Python interactively ask questions about whether new class expressions are satisfiable. (It should be possible to formulate these questions as DL queries, but I'm not yet certain.)

We could build something just for this use case, but I have an idea for a more general solution.

Approach: Create a robot-rest system that wraps robot-command and presents an HTTP REST interface. Each "job" runs a chain of ROBOT commands with a workspace of files. Each "task" is a command in the chain with an execution log. So each job will create and maintain a CommandState object, each task will run CommandManager.executeCommand(), and wait for another task until a stop command, which will unload the CommandState.
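As a rough illustration of the job/task lifecycle described above, here is a minimal in-memory sketch in Python. It is not ROBOT code: the `Job` and `Task` classes, their fields, and the plain dict standing in for CommandState are all hypothetical, and no ROBOT command is actually executed.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Task:
    """One command in the chain, with its own execution log."""
    task_id: int
    command: str
    options: Dict[str, str]
    log: List[str] = field(default_factory=list)


@dataclass
class Job:
    """A chain of tasks sharing one in-memory state (hypothetical model)."""
    job_id: str
    status: str = "running"
    # Stand-in for ROBOT's CommandState; a plain dict, not the real Java object.
    state: Dict[str, str] = field(default_factory=dict)
    tasks: List[Task] = field(default_factory=list)

    def run_task(self, command: str, **options: str) -> Task:
        # A stopped or deleted job rejects further tasks.
        if self.status != "running":
            raise RuntimeError(f"job {self.job_id} is {self.status}")
        task = Task(len(self.tasks) + 1, command, dict(options))
        self.tasks.append(task)
        if command == "stop":
            # Unload the shared state and refuse anything that follows.
            self.state.clear()
            self.status = "stopped"
            task.log.append("state unloaded")
        else:
            # A real server would call CommandManager.executeCommand() here.
            task.log.append(f"would run: robot {command}")
        return task
```

The point of the sketch is that the state outlives each task and is only dropped by the stop task, which is itself recorded as a task with the next ID.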

Here's an example of starting a job, working with files, running a task, then stopping and deleting files. These paths would be prefixed with something like http://localhost:2019.

GET /jobs -- show the list of jobs and their status (running, stopped, deleted)
POST /jobs -- create a new "job", return/redirect to a new job ID "123"
GET /jobs/123 -- get lists of tasks and files for this job
GET /jobs/123/files -- get a list of files in the workspace for this job (sizes, checksums, dates)
PUT /jobs/123/files/bar.owl -- upload bar.owl to the workspace for this job
PUT /jobs/123/files/bar.owl?fetch=true -- fetch a file from the POSTed URL and save it as bar.owl to the workspace
GET /jobs/123/tasks -- the list of tasks executed and status: currently 0 tasks and "running"
POST /jobs/123/tasks?command=convert&input=bar.owl&output=baz.owl -- run robot convert --input bar.owl --output baz.owl inside the workspace, immediately return/redirect to task ID "1"
GET /jobs/123/tasks/1 -- see task status, STDOUT+STDERR
GET /jobs/123/files/baz.owl -- download the baz.owl file
GET /jobs/123/views/baz.owl -- view the baz.owl file (not sure about 'views' name)
POST /jobs/123/tasks?command=stop -- stop this job, which will unload the CommandState from memory and reject further tasks, return/redirect to new task ID "2"
DELETE /jobs/123/files/baz.owl -- delete the baz.owl file
DELETE /jobs/123 -- delete a job and its files, keeping only some metadata
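From the Python side, a client for these proposed endpoints might build its requests roughly as follows. This is only a sketch: the endpoints are the ones proposed in this issue and do not exist yet, the helper names are made up, and the requests are constructed but never sent.

```python
from urllib.parse import quote, urlencode
from urllib.request import Request

BASE_URL = "http://localhost:2019"  # example prefix from the proposal


def new_job() -> Request:
    """POST /jobs: ask the (hypothetical) server to create a job."""
    return Request(f"{BASE_URL}/jobs", method="POST")


def upload_file(job_id: str, name: str, data: bytes) -> Request:
    """PUT /jobs/<id>/files/<name>: upload a file into the job workspace."""
    url = f"{BASE_URL}/jobs/{job_id}/files/{quote(name)}"
    return Request(url, data=data, method="PUT")


def run_task(job_id: str, command: str, **options: str) -> Request:
    """POST /jobs/<id>/tasks?command=...: run one ROBOT command as a task."""
    query = urlencode({"command": command, **options})
    return Request(f"{BASE_URL}/jobs/{job_id}/tasks?{query}", method="POST")


req = run_task("123", "convert", input="bar.owl", output="baz.owl")
print(req.get_method(), req.full_url)
```

Passing one of these objects to `urllib.request.urlopen` would send it, which of course only works once a server implementing the proposal is listening.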

I'm hoping that this can be a thin layer that works with very few modifications to robot-command. The trick is to translate the HTTP query string into the command-line arguments that each ROBOT command expects. HTTP query strings will not map perfectly onto ROBOT command options, but maybe well enough. While the sequence of commands/tasks is significant, the sequence of options for a single command is not. Some options can be specified multiple times: the query string should allow repeated keys, but if that doesn't work then I think we could support a single value that is an array.
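To make the translation concrete, here is one way it could look, sketched in Python rather than in ROBOT's Java. The convention is an assumption for illustration: the `command` key names the ROBOT command, every other key becomes a `--key value` long option, and repeated keys become repeated options. The function name `query_to_argv` is hypothetical.

```python
from typing import List
from urllib.parse import parse_qsl


def query_to_argv(query: str) -> List[str]:
    """Turn an HTTP query string into a ROBOT-style argument list.

    Assumed convention (not part of ROBOT): `command` selects the command,
    all other keys map to `--key value`, and repeated keys are preserved.
    """
    pairs = parse_qsl(query, keep_blank_values=True)
    commands = [value for key, value in pairs if key == "command"]
    if len(commands) != 1:
        raise ValueError("expected exactly one 'command' parameter")
    argv = [commands[0]]
    for key, value in pairs:
        if key == "command":
            continue
        # Repeated keys simply produce repeated options, in request order.
        argv.extend([f"--{key}", value])
    return argv


print(query_to_argv("command=convert&input=bar.owl&output=baz.owl"))
print(query_to_argv("command=merge&input=a.owl&input=b.owl&output=c.owl"))
```

Because `parse_qsl` returns a list of pairs rather than a dict, repeated keys such as two `input` values survive without needing the array fallback. Flags that take no value would need an extra rule that this sketch does not attempt.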

There are a few cases where I would want to modify existing robot-command code. The query command would be much more useful if we presented a SPARQL web form and allowed a bunch of queries without reloading Jena.

GET /jobs/123/tasks/4/sparql?select=some-sparql-query -- if task 4 was query, and it's the current task running, then provide access to Jena, run some SPARQL query, and return results

For a long time I've been thinking of adding a --server option to the command-line version of query that would wait and accept interactive queries until the user hits Ctrl-D or something (#25). This would build on that. Our use case requires something similar for DL queries, which is another feature we've wanted for a long time (#387).

Feedback and other use cases would be appreciated.

cmungall commented 5 years ago

Hmm, this seems to be moving ROBOT into a crowded space, e.g. CWL, CWLRunner, Galaxy, NextFlow. Many of these support APIs with similar functionality, e.g. WES. Will provide more details later...

jamesaoverton commented 5 years ago

Sure, I'd love to hear more about alternatives.

What I really want is easier access to the functionality we already have, with as thin a wrapper as we can manage. I'm not interested in competing with these projects. We have an issue mentioning CWL #37.

Is there something out there that can talk to a reasoner over a "wire" (HTTP being one example)?

cmungall commented 5 years ago

> Use Case: My group has a cell name and marker validator written in Python. We'd like ROBOT to load CL, run a reasoner, keep it running, and have a little webapp written in Python interactively ask questions about whether new class expressions are satisfiable. (It should be possible to formulate these questions as DL queries, but I'm not yet certain.)

also

> Is there something out there that can talk to a reasoner over a "wire" (HTTP being one example)?

Have you seen @balhoff's https://github.com/phenoscape/owlery - it seems to fit this use case perfectly.

See https://owlery.phenoscape.org/api/ for the Swagger documentation.

> We have an issue mentioning CWL #37.

Yep. Still not sure if CWL is a good fit for ontology workflow tasks, but its usage is increasing rapidly in other projects I am on. In the context of this ticket I was thinking specifically of the TES API, which seems to fit the kind of REST operations you want to do here:

https://github.com/ga4gh/task-execution-schemas/blob/master/README.md

I think this is a potentially important and useful feature, deserving of more serious consideration than the bitty responses I am providing here. Shall we schedule some time at ICBO to talk about some of this, or do you need something before then?

jamesaoverton commented 5 years ago

Thanks. owlery does sound like a good fit for the immediate use case. I'll check that out.

My proposal here is interactive in a way that these general task runners are not. In those systems you trigger a job, it runs to completion, and the only interactive thing you can do is cancel the job. In my proposal you keep the ROBOT CommandState in memory, where it can be queried, and you can decide what to do next.