groovytron commented 7 years ago

Here are a few use cases of applications that would be deployed:

1. An application that consists of a simple command call like python -m SimpleHTTPServer or service start docker. An example Boutiques descriptor for this case is available here.
2. An application that is launched with a command taking a few parameters, like python -m SimpleHTTPServer [PORT], where [PORT] is the port you want the HTTP server to use. In that case, the Boutiques descriptor should look like this.
3. An application that is containerized but doesn't need any parameters or environment variables, like the hbpmip/woken container. Here is an example Boutiques descriptor for this case.
4. An application embedded in a container that needs some environment variables to be set. A good example is hbpmip/portal-frontend, whose environment variables set the number of nginx workers, the backend address and other information. The Boutiques descriptor for this case is here. The container might also need to be run with a given entrypoint.

Pros of using Boutiques for all the applications:

The environment-variables field is practical for command line tools and containerized applications needing environment variables to be set to work correctly.

A few cons of using Boutiques:

The command-line field is mandatory even if the application is containerized and doesn't need to be launched from a command line. Docker already has its own ways to supply a command.
Boutiques doesn't handle services based on multiple containers.
As inputs and output-files have to be present, we must put empty arrays in those fields to conform to the Boutiques schema. This kind of workaround would have to be applied for cases 1, 2 and 3.
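
To make the empty-array workaround concrete, here is a minimal sketch of what a descriptor for case 1 would have to carry. Only the keys named in this thread (command-line, inputs, output-files) are used, plus name and description; the helper name is hypothetical and the real schema may require further fields, so this is not a complete valid descriptor.

```python
import json


def minimal_descriptor(name, description, command_line):
    """Build a descriptor for an app that takes no inputs and writes no files.

    Hypothetical helper: it only illustrates the workaround, it does not
    validate against the Boutiques schema.
    """
    return {
        "name": name,
        "description": description,
        "command-line": command_line,
        # Workaround: both keys must be present, so they are left empty.
        "inputs": [],
        "output-files": [],
    }


descriptor = minimal_descriptor(
    "simple-http-server",
    "Serves the current directory over HTTP",
    "python -m SimpleHTTPServer",
)
print(json.dumps(descriptor, indent=2))
```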

The main problem is that Boutiques is designed for workflows (Pegasus is one example of a platform that uses it). The MIP has two kinds of entities that need to be deployed: services and, in the future, workflows.

It was suggested that I use Boutiques's LocalExecutor. Here are some points that explain why this tool cannot be used in our case:

The Boutiques LocalExecutor seems to be useful when application descriptions are stored in files, which is not the case here as they are stored in a database.
The LocalExecutor can generate the command from the tool description. This functionality fits case 2, where command lines need to be customized with parameters. The only problem is that it prints the command instead of offering a function that returns it. This is not a blocker if the functionality can be added or a custom generator can be written; that work could be part of the project.
The LocalExecutor explicitly runs the Docker container with the docker run command (see here and here), so it cannot be used to execute containers inside Marathon, which has its own way of doing it.
The LocalExecutor supports only Python 2.7 and is not compatible with Python 3.x.
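
As a rough idea of what such a custom generator could look like, here is a sketch that returns the command instead of printing it. It is not LocalExecutor's code. It assumes each input carries an "id" and a "value-key" placeholder that appears in command-line (the exact key name has varied between schema versions, so treat it as an assumption), and it ignores flags, lists and output files.

```python
import shlex


def build_command(descriptor, values):
    """Return the command line with placeholders replaced by input values.

    Hypothetical helper: 'value-key' and 'id' are assumed input fields.
    Inputs without a value are removed from the command.
    """
    command = descriptor["command-line"]
    for spec in descriptor.get("inputs", []):
        value = values.get(spec["id"])
        # Quote each value so it stays a single shell token.
        replacement = "" if value is None else shlex.quote(str(value))
        command = command.replace(spec["value-key"], replacement)
    return command.strip()


descriptor = {
    "command-line": "python -m SimpleHTTPServer [PORT]",
    "inputs": [{"id": "port", "value-key": "[PORT]"}],
}
command = build_command(descriptor, {"port": 8000})
```

Since the function takes plain dictionaries, the descriptor and the values can come straight from the database rather than from files.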

A new tool would have to be developed to generate the proper command to launch, which would delay delivery of a working product.

I suggest developing the following solution to satisfy the needs in a more generic way.

Proposition

The user (a system administrator) could add two kinds of applications: cmd (such as simple scripts or services that may accept parameters) and containers (whose environment variables and ENTRYPOINT parameters could be customized).

entity_relationship_model_02

old_entity_relationship_model

To submit a cmd application the user would have to post a request to the API with the following content:

{
  "name": "my tool",
  "description": "my new service for the platform",
  "cmd": "my_tool",
  "marathon_config": {
    "cpus": 0.3,
    "memory": 64,
    "args": "--a=12 --b='x'"
  }
}

The marathon_config can be defined after the app has been added to the services registry. Thus name, cmd and (optionally) description would be the only fields needed to register this service. The marathon_config only defines the execution details, so a service could have many marathon_configs and be executed several times with different parameters. Parameters can also be Marathon variables like $PORT0, which is not possible with Boutiques as $PORT0 is not a valid input.
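
Here is a minimal sketch of how one registered cmd service could be turned into several Marathon app definitions, one per marathon_config. The function name and the app ids are hypothetical, and mapping the proposal's memory key to Marathon's mem field is my assumption about how the translation would work; id, cmd, cpus and mem are the fields Marathon expects in an app definition.

```python
def cmd_app_to_marathon(service, config, app_id):
    """Combine a registered cmd service and one marathon_config.

    Hypothetical helper: 'args' is appended to the service's 'cmd' and
    'memory' is mapped to Marathon's 'mem'.
    """
    cmd = service["cmd"]
    if config.get("args"):
        cmd = f"{cmd} {config['args']}"
    return {
        "id": app_id,
        "cmd": cmd,
        "cpus": config["cpus"],
        "mem": config["memory"],
    }


service = {"name": "my tool", "description": "my new service", "cmd": "my_tool"}
configs = [
    {"cpus": 0.3, "memory": 64, "args": "--a=12 --b='x'"},
    # $PORT0 is left untouched so that Marathon can substitute it at launch.
    {"cpus": 0.1, "memory": 32, "args": "--port=$PORT0"},
]
apps = [
    cmd_app_to_marathon(service, config, f"/my-tool-{index}")
    for index, config in enumerate(configs)
]
```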

To deploy a dockerized application, the user could submit the following JSON:

{
  "name": "portal-frontend",
  "description": "frontend for the MIP platform",
  "namespace": "hbpmip",
  "image": "portal-frontend",
  "marathon_config": {
    "cpus": 0.4,
    "memory": 256,
    "env_vars": {
      "PORTAL_DB_URL": "jdbc:postgresql://172.22.0.1:5432/portal",
      "PORTAL_DB_SCHEMA": "public",
      "WOKEN_AKKA_PORT": "8088"
    },
    "image_version": "v1.0-Florence",
    "args": ["value1", "value2"],
    "ports": [8080, 5643]
  }
}

namespace and image define the Docker image to pull. The marathon_config has the same role as for a cmd application: it contains the execution settings, adapted here for a Docker container. The user can choose which version of the image to use (latest if none is submitted). Environment variables can also be set, and the container's ENTRYPOINT parameters can be specified in the args key.
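
A sketch of how this payload could map onto a Marathon app definition follows. The function name and app id are hypothetical. The container layout (docker.image, docker.network, docker.portMappings with hostPort 0 for a port assigned by Marathon) follows the older Marathon app format, so it would need adjusting for versions that moved port mappings elsewhere.

```python
def docker_app_to_marathon(app, app_id):
    """Translate the proposed docker payload into a Marathon app definition.

    Hypothetical helper; assumes the older container.docker.portMappings
    layout and bridge networking.
    """
    config = app["marathon_config"]
    # Fall back to 'latest' when no image_version is submitted.
    version = config.get("image_version", "latest")
    definition = {
        "id": app_id,
        "cpus": config["cpus"],
        "mem": config["memory"],
        "env": dict(config.get("env_vars", {})),
        "container": {
            "type": "DOCKER",
            "docker": {
                "image": f"{app['namespace']}/{app['image']}:{version}",
                "network": "BRIDGE",
                # hostPort 0 asks Marathon to pick a free host port.
                "portMappings": [
                    {"containerPort": port, "hostPort": 0}
                    for port in config.get("ports", [])
                ],
            },
        },
    }
    if config.get("args"):
        # Passed to the image's ENTRYPOINT.
        definition["args"] = list(config["args"])
    return definition


payload = {
    "name": "portal-frontend",
    "namespace": "hbpmip",
    "image": "portal-frontend",
    "marathon_config": {
        "cpus": 0.4,
        "memory": 256,
        "env_vars": {"PORTAL_DB_SCHEMA": "public"},
        "image_version": "v1.0-Florence",
        "args": ["value1", "value2"],
        "ports": [8080, 5643],
    },
}
definition = docker_app_to_marathon(payload, "/portal-frontend")
```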

I recommend the above proposition because:

It handles services that accept parameters and need to be exposed through ports granted by Marathon.
It covers the needs of dockerized applications, namely environment variables, entrypoint parameters and port exposure (all of which were requested features).
It is service oriented, but launching cmd tools (workflows) remains possible.
It seems by far easier to implement, which increases the chances that a frontend gets developed.
Boutiques support can be added if time remains or after the project.

groovytron commented 7 years ago

LocalExecutor needs a JSON file containing the input parameter values to generate the command.

groovytron commented 7 years ago

cbrain uses boutiques as an input format but doesn't use it internally (https://github.com/aces/cbrain/blob/87c9ce3ac57a8da51dd868520699c71722997163/BrainPortal/lib/cbrain_task_generators/schema_task_generator.rb).

ludovicc commented 7 years ago

I did not ask you to use LocalExecutor from Boutiques.

groovytron commented 7 years ago

I've had second thoughts and simplified the entity-association schema. I'll still focus on the MarackerApp, but integrating Boutiques as an input format is still under consideration. I'm still deciding which information to keep from a Boutiques input (e.g. the environment variables' descriptions, which could be shown in the UI).

entity_relationship_model_04

old_entity_relationship_model_04

The MarackerApp is supposed to handle the following cases:

The app is only a command line tool. Only Marathon's cmd key will be used.
The app is a command line tool with arguments. MarackerApp's command and args attributes will be joined together in the cmd key of Marathon's JSON.
The app is a simple container (whether or not it needs ports to be exposed). MarackerApp's DockerContainer has to be set and only Marathon's container key will be used for deployment.
The app is a container launching a command without arguments at startup. The MarackerApp's command and DockerContainer must be set. When deployment is requested, Marathon's JSON will be generated using the cmd and container keys.
The container has an Entrypoint. Only MarackerApp's command and DockerContainer attributes have to be set. Marathon's args and container keys will be used to deploy the application.
The app is a container that needs to launch a specific command with arguments. MarackerApp's command, container and args will then have to be set. For deployment, command and args will be joined and placed into Marathon's cmd key (as cmd and args cannot both be supplied to Marathon). The container's details will be specified using Marathon's container key as usual.
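
The case list above boils down to a small selection rule, sketched here. The class is a simplified stand-in for MarackerApp (the docker_image attribute replaces the DockerContainer relation) and launch_keys is a hypothetical name. For the Entrypoint case this sketch assumes the app is stored with args but no command, which is my interpretation rather than something settled in this thread.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class MarackerApp:
    """Simplified stand-in for the model; attribute names are assumptions."""
    command: Optional[str] = None
    args: List[str] = field(default_factory=list)
    docker_image: Optional[str] = None  # stands in for DockerContainer


def launch_keys(app):
    """Pick which of Marathon's cmd, args and container keys to emit."""
    keys = {}
    if app.docker_image:
        keys["container"] = {"type": "DOCKER", "docker": {"image": app.docker_image}}
    if app.command:
        # cmd and args cannot both be sent to Marathon, so they are joined.
        keys["cmd"] = " ".join([app.command, *app.args])
    elif app.args:
        if not app.docker_image:
            raise ValueError("args without a command need a container")
        # Assumed Entrypoint case: args are handed to the image's ENTRYPOINT.
        keys["args"] = list(app.args)
    if not keys:
        raise ValueError("nothing to launch")
    return keys
```

For example, launch_keys(MarackerApp(command="my_tool", args=["--a=12"])) yields only a cmd key, while an app with just docker_image set yields only a container key.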


Boutiques as part of the specification #10
