Open simeoncarstens opened 9 months ago
Hello Simeon,
Thank you for beginning this discussion. I just wanted to make you aware that we released Looper v1.6.0 as well as Pipestat v0.6.0 this morning.
~ Donald
Thanks, @donaldcampbelljr! That's good to know - I edited the issue accordingly :slightly_smiling_face:
What were the exact issues you faced with caravel?
First, caravel was written using Flask, but we're now wanting to use FastAPI. More importantly, though, the front-end was not reactive. With caravel, there wasn't really an HTTP API. Instead, there was a web interface that you could use to execute commands. In other words, I think caravel had tightly coupled the front-end (web interface) with the back-end (HTTP API) -- and the front-end was not written using a reactive framework. In the new version, I'd like them to be separate. This would emphasize that the API must be asynchronous-friendly, and that the front-end must also be asynchronous-friendly.
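As a rough illustration of what "asynchronous-friendly" could mean on the back-end, here is a minimal sketch using only Python's standard library. It is not caravel or looper code: the `run_command` helper is hypothetical, and a trivial Python one-liner stands in for a real looper invocation. The point is only that a long-running command is awaited rather than blocking the process that serves requests.

```python
import asyncio
import sys


async def run_command(argv):
    """Run a command without blocking the event loop; return (exit code, output)."""
    proc = await asyncio.create_subprocess_exec(
        *argv,
        stdout=asyncio.subprocess.PIPE,
        stderr=asyncio.subprocess.STDOUT,  # fold stderr into stdout
    )
    out, _ = await proc.communicate()
    return proc.returncode, out.decode()


async def main():
    # Stand-in for a looper call (assumption: the real command line differs).
    cmd = [sys.executable, "-c", "print('done')"]
    # Both commands are awaited concurrently instead of one after the other.
    return await asyncio.gather(run_command(cmd), run_command(cmd))


if __name__ == "__main__":
    for code, output in asyncio.run(main()):
        print(code, output.strip())
```

An async web framework could await a helper like this inside a request handler, so that one slow command does not stall other requests.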
Which commands have the highest priority and should thus be implemented first as HTTP API calls?
The most important commands are looper run, looper runp, looper check, and looper report.
Which of options 1-3.1-3 should we pursue?
I'd advise a hybrid approach:
caravel, if that's convenient, or use a new repo. Might as well re-use the name caravel, though, as the old one will become defunct.
The original caravel included code to inspect the looper argument parser, and then create HTML forms that mimic it: https://github.com/pepkit/caravel/blob/master/caravel/looper_parser.py
The idea here is that looper's CLI argument parser is the source of truth for how to interact with looper. This code then allows us to make an HTML interface that would automatically update if the CLI interface changes. What you're proposing in option 2.1 is basically this exact same idea, but instead of creating an HTML form interface, you'd create an HTTP API interface. That's a good idea; however, it could get hairy... and I like what you said in 2.3, "A similarly easy and inflexible approach would be POSTing a string of command-line arguments that is then parsed by looper's existing argparse argument parser."
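To make the "parser as source of truth" idea concrete, here is a minimal sketch of inspecting an argparse parser and emitting one plain dict per argument, which a form generator or an API schema could then consume. It is not the code from caravel's looper_parser.py. The parser built by `build_demo_parser` is a hypothetical stand-in, its options are illustrative rather than looper's actual CLI, and the sketch relies on the private `_actions` attribute of `ArgumentParser`.

```python
import argparse


def build_demo_parser():
    """Hypothetical stand-in for looper's parser; the real options differ."""
    parser = argparse.ArgumentParser(prog="demo")
    parser.add_argument("config", help="path to a project config file")
    parser.add_argument("--dry-run", action="store_true", help="do not submit jobs")
    parser.add_argument("--limit", type=int, default=None, help="maximum number of samples")
    return parser


def describe_arguments(parser):
    """Return one dict per parser argument, skipping the built-in help flag."""
    described = []
    # _actions is private to argparse; using it is an assumption of this sketch.
    for action in parser._actions:
        if action.dest == "help":
            continue
        is_flag = action.nargs == 0  # store_true / store_false style options
        described.append({
            "name": action.dest,
            "flags": list(action.option_strings),
            "required": action.required,
            "takes_value": not is_flag,
            "type": "bool" if is_flag else getattr(action.type, "__name__", "str"),
            "default": action.default,
            "help": action.help,
        })
    return described


if __name__ == "__main__":
    for entry in describe_arguments(build_demo_parser()):
        print(entry)
```

Because the description is derived from the parser at runtime, adding an option to the CLI would show up in the generated description without further changes, which is the property described above.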
I think we could combine these ideas. The API could accept a POST of some CLI string. From here, the API would construct a CLI argument string, and use the existing argument parser. I think this is something close to what we were trying to do, in this code here:
Then, we can use the HTML form idea for the front-end: from the argparser, create HTML forms, and then have these HTML forms be interpreted into a CLI string, which is then POSTed to the HTTP API.
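The "POST a CLI string and reuse the existing parser" idea could look roughly like the sketch below. The parser here is a hypothetical stand-in with illustrative options, not looper's real one, and `parse_cli_string` and `ParseError` are names invented for this example. One detail matters for a server: by default argparse calls `sys.exit()` on bad input, so the sketch overrides `error()` to raise an exception instead.

```python
import argparse
import shlex


class ParseError(ValueError):
    """Raised instead of exiting when the posted CLI string is invalid."""


class RaisingParser(argparse.ArgumentParser):
    def error(self, message):
        # The default implementation prints usage and exits the process,
        # which a long-running API server must avoid.
        raise ParseError(message)


def build_demo_parser():
    """Hypothetical stand-in for looper's parser; the real options differ."""
    parser = RaisingParser(prog="demo", add_help=False)
    subparsers = parser.add_subparsers(dest="command", required=True)
    run = subparsers.add_parser("run", add_help=False)
    run.add_argument("config")
    run.add_argument("--dry-run", action="store_true")
    run.add_argument("--limit", type=int, default=None)
    return parser


def parse_cli_string(cli_string):
    """Split a posted CLI string shell-style and parse it into a dict."""
    argv = shlex.split(cli_string)
    return vars(build_demo_parser().parse_args(argv))


if __name__ == "__main__":
    print(parse_cli_string("run project.yaml --dry-run --limit 2"))
```

An API handler could call `parse_cli_string` on the request body and turn a `ParseError` into a 4xx response.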
Alternatively, instead of operating through the CLI string itself, we could introduce a simple config format for defining the arguments. I think there's already a rudimentary way to do this using looper alone, using the .looper.yaml config file.
What do you think?
A belated (written) thanks to @nsheff for the detailed answer!
We have now settled on an approach based on defining the CLI mostly via pydantic models using (for now) the pydantic-argparse library (https://github.com/pepkit/looper/issues/438). With a pydantic model that accurately reflects all arguments and flags a given looper command might consume, it is then straightforward to build an HTTP API based on, for example, FastAPI that expects a JSON conforming to this schema in a POST request.
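A minimal sketch of the "one model per command, JSON in the POST body" idea follows. To keep it runnable without third-party packages, it uses a standard-library dataclass as a stand-in for the pydantic model and checks types by hand. With pydantic and FastAPI, the model would do the validation and the framework would parse the request body. The `RunArgs` fields are illustrative assumptions, not looper's actual schema.

```python
import json
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class RunArgs:
    """Hypothetical arguments of a run-like command; not looper's real schema."""
    config: str
    dry_run: bool = False
    limit: Optional[int] = None


def parse_run_request(body):
    """Validate a JSON request body against RunArgs; raise ValueError if it does not conform."""
    payload = json.loads(body)  # json.JSONDecodeError is a ValueError subclass
    if not isinstance(payload, dict):
        raise ValueError("request body must be a JSON object")
    unknown = set(payload) - {f.name for f in fields(RunArgs)}
    if unknown:
        raise ValueError("unknown fields: %s" % sorted(unknown))
    try:
        args = RunArgs(**payload)
    except TypeError as exc:  # raised when a required field is missing
        raise ValueError(str(exc)) from exc
    # Dataclasses do not enforce annotations, so check the types explicitly.
    if not isinstance(args.config, str):
        raise ValueError("config must be a string")
    if not isinstance(args.dry_run, bool):
        raise ValueError("dry_run must be a boolean")
    if args.limit is not None and (isinstance(args.limit, bool) or not isinstance(args.limit, int)):
        raise ValueError("limit must be an integer or null")
    return args


if __name__ == "__main__":
    print(parse_run_request('{"config": "project.yaml", "limit": 3}'))
```

The same model could, in principle, drive both the CLI and the HTTP API, so the two would not drift apart.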
Let's still keep this issue open to discuss things pertaining to the actual HTTP API implementation, once we get to it.
After discussion, we've decided to move this work to be done after the major work for milestone 2.0.0: https://github.com/pepkit/looper/milestone/14
This issue is coauthored by @zz1874.
looper is a CLI tool that often runs on the front node of an HPC cluster, so jobs can be submitted to Slurm / SGE / other job schedulers. @nsheff expressed desire for an HTTP API for looper which wraps around looper. That would allow him and other users to run looper on the front node and use a reverse SSH tunnel from a different machine to send HTTP requests to the HTTP API. Advantages of this would be
looper functionalities from any machine without manually copying code to the frontend node,
An earlier attempt of this was
caravel (https://github.com/pepkit/caravel). @nsheff tells us that there were issues, possibly due to the synchronous nature of the Flask framework. caravel seems to be a Python 2.7 code base that uses 2to3 to convert to Python 3 code on-the-fly during installation via setuptools' use_2to3. By now, this makes it hard to run caravel for reasons such as: setuptools doesn't come with use_2to3 anymore, the Docker image cannot be built anymore, Debian index URLs are out of date, Python 3.6-specific typing imports are used, ...
After browsing the
looper and caravel code, we identified the following possibilities:
caravel, meaning bringing it up-to-date with recent Python versions and making it compatible with recent looper versions,
looper commands / options could likely be made available via the HTTP API in the little development time we have. But this also means an increased maintenance burden - if a new CLI command / option is added, the HTTP API and its documentation have to be adapted accordingly.
POSTed to the API. This would be the easiest and quickest solution, but limits the use cases of the API. A similarly easy and inflexible approach would be POSTing a string of command-line arguments that is then parsed by looper's existing argparse argument parser.
Important questions that would need to be answered:
Which version of looper should we develop against? looper is currently at v1.5.1, but there is a PR open for v1.6.0, and in fact we could only get the hello_looper example working with the future v1.6.0 of looper. A similar question holds for pipestat, if required for development of the HTTP API. The answer is: v1.6.0 for looper and v0.6.0 for pipestat, as both new versions have now been released.
What were the exact issues you faced with caravel? Knowing them would help us make a more informed decision whether to possibly revive caravel or to redevelop from scratch, avoiding mistakes made in caravel. Answer: https://github.com/pepkit/looper/issues/433#issuecomment-1877218543
looper commands: which commands have the highest priority and should thus be implemented first as HTTP API calls? Answer: looper run, looper runp, looper check, looper report (https://github.com/pepkit/looper/issues/433#issuecomment-1877218543)
And finally, of course: