rtnpro commented 8 years ago

Problem statement

As it stands, Nulecule apps are not very reusable. It's easier for us to add dependencies internally than to depend on an external Nulecule application. This can be seen in most of our examples in [1]. Depending on external Nulecule applications has a few shortcomings.

Namespace conflict

When we consume external Nulecule applications in our Nulecule application, we have no control over the component names chosen in the external applications. For example, if our app contains two external Nulecule applications, A and B, and both A and B contain a component named db, then the params for these two different components would get merged into the same [db] section in answers.conf. As the consumer of a Nulecule application, why should I have to care about the internal components of the application I consume?
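The collision can be illustrated with a minimal sketch. The data and both helper functions are hypothetical and are not atomicapp code. The sketch also assumes that, on a name clash, later values overwrite earlier ones, and the `A.db` / `B.db` section naming is just one possible way to qualify sections.

```python
# Hypothetical params declared by two external apps, keyed by component name.
app_a = {"db": {"db_user": "alice", "port": "3306"}}
app_b = {"db": {"db_user": "bob", "port": "5432"}}


def merge_flat(*apps):
    """Merge params into sections keyed only by component name (flat model)."""
    answers = {}
    for app in apps:
        for component, params in app.items():
            # Same component name means same section; later values win (assumption).
            answers.setdefault(component, {}).update(params)
    return answers


def merge_namespaced(**apps):
    """Qualify each section with the name of the consumed app (one possible fix)."""
    answers = {}
    for app_name, app in apps.items():
        for component, params in app.items():
            answers[f"{app_name}.{component}"] = dict(params)
    return answers


flat = merge_flat(app_a, app_b)
namespaced = merge_namespaced(A=app_a, B=app_b)
print(flat)
print(namespaced)
```

In the flat result, the values from A's db are no longer recoverable, while the qualified sections keep both sets apart.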

Redundant params in ANSWERS

Let's take the example of our Nulecule wordpress [2] application. The sample answers file generated while installing this app looks like:

[wordpress]
db_user = None
image = wordpress
db_pass = None
db_name = None
db_host = mariadb:3306
port = 8080
[mariadb-atomicapp]
db_pass = None
db_name = None
root_pass = MySQLPass
db_user = None
[general]
namespace = default
provider = kubernetes

We see that params like db_user, db_pass, db_name, etc. get duplicated in multiple sections. Currently, there's no model in place for params inheritance, which seems like a sensible thing to have.
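As a rough check, the duplication in the sample answers file above can be listed with Python's standard configparser. This is only an illustrative sketch; the variable names are made up and nothing here is part of atomicapp.

```python
import configparser
from collections import defaultdict

# The sample answers file from above, embedded as a string.
SAMPLE = """
[wordpress]
db_user = None
image = wordpress
db_pass = None
db_name = None
db_host = mariadb:3306
port = 8080
[mariadb-atomicapp]
db_pass = None
db_name = None
root_pass = MySQLPass
db_user = None
[general]
namespace = default
provider = kubernetes
"""

parser = configparser.ConfigParser()
parser.read_string(SAMPLE)

# Map every param name to the sections that define it.
sections_by_param = defaultdict(list)
for section in parser.sections():
    for key in parser[section]:
        sections_by_param[key].append(section)

# Keep only the params that appear in more than one section.
duplicated = {k: v for k, v in sections_by_param.items() if len(v) > 1}
for name, sections in sorted(duplicated.items()):
    print(name, "->", ", ".join(sections))
```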

Brainstorming

The way we currently load a Nulecule application is by doing a DFS (Depth First Search) traversal of the Nulecule application tree, similar to nested function calls. Each node, or component, in our Nulecule application tree, is like a node in a tree data structure. Each node (except for root) has a parent and may have some children.
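A minimal sketch of such a traversal follows. The `Node` class and the tree are hypothetical, and visiting the parent before its children (pre-order) is an assumption made for illustration, not a statement about how atomicapp actually orders its loading.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A hypothetical component in a Nulecule application tree."""
    name: str
    children: list = field(default_factory=list)


def load_order(node, depth=0):
    """Yield (depth, name) pairs in depth-first order, parent before children."""
    yield depth, node.name
    for child in node.children:
        # Recursing here mirrors a nested function call for each child.
        yield from load_order(child, depth + 1)


tree = Node("wordpress-app", [
    Node("wordpress"),
    Node("mariadb-atomicapp", [Node("mariadb")]),
])

order = list(load_order(tree))
for depth, name in order:
    print("  " * depth + name)
```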

However, we do not apply the same concept when dealing with params for our Nulecule app. The current parameter model is a flat one, rather than a hierarchical one, devoid of any notion of inheritance.

Proposed solution

So, why don't we translate nested Nulecule applications into something like nested function calls? The root/master Nulecule application defines the params, mandatory and optional (with defaults), that it needs in order to run or to be consumed by another Nulecule application. These params are defined at the root level of the Nulecule SPEC data, rather than at the component level. We then use these params, as required, at the component level to supply the components with the data they need.

An example will make it easier to understand the solution we are proposing. We refactor the wordpress and mariadb Nulecule applications as an example to demo our solution [3].
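Independently of the refactor linked in [3], the function-call analogy can be sketched in a few lines of Python. Everything here is an assumption made for illustration: the dictionary layout, the `args` key that maps a child's param name to a param of the parent, and the helpers `resolve_params` and `call`. None of it is the actual SPEC syntax.

```python
def resolve_params(declared, supplied):
    """Bind supplied values to declared params, like binding function arguments."""
    resolved = {}
    for param in declared:
        name = param["name"]
        if name in supplied:
            resolved[name] = supplied[name]
        elif "default" in param:
            resolved[name] = param["default"]  # optional param
        else:
            raise ValueError(f"missing mandatory param: {name}")
    return resolved


def call(app, supplied):
    """'Call' an app: resolve its own params, then call each child with args."""
    params = resolve_params(app["params"], supplied)
    result = {"params": params, "children": {}}
    for component in app.get("graph", []):
        # args maps the child's param name to one of the parent's params.
        child_args = {child: params[parent]
                      for child, parent in component["args"].items()}
        result["children"][component["name"]] = call(component["app"], child_args)
    return result


# Hypothetical specs, loosely modelled on the wordpress/mariadb example.
mariadb_app = {"params": [{"name": "db_user"}, {"name": "db_pass"},
                          {"name": "db_name"},
                          {"name": "root_pass", "default": "MySQLPass"}]}
wordpress_app = {
    "params": [{"name": "db_user"}, {"name": "db_pass"},
               {"name": "db_name", "default": "wordpress"}],
    "graph": [{"name": "mariadb", "app": mariadb_app,
               "args": {"db_user": "db_user", "db_pass": "db_pass",
                        "db_name": "db_name"}}],
}

resolved = call(wordpress_app, {"db_user": "wp", "db_pass": "s3cret"})
print(resolved)
```

With this shape, the consumer only supplies values at the root; db_user, db_pass and db_name are stated once and handed down rather than repeated per section.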

[1] https://github.com/projectatomic/nulecule-library
[2] https://github.com/projectatomic/nulecule-library/tree/master/wordpress-centos7-atomicapp
[3] https://github.com/projectatomic/nulecule-library/compare/master...rtnpro:refactor_answers

markllama commented 8 years ago

I have been struggling with something that I think is vague about the nulecule usage model.

On one hand I want to define a nulecule as metadata to help consume a microservice container image. There I define my unresolved variables and give them names that will map them into my artifacts.

On the other hand I want to create an application that uses a nulecule as input. Here I want several things which you described, and perhaps one that was missed.

Introspection of the nulecules I am consuming to determine what variables they need resolved
Merging and matching the unresolved variables from many nulecules
Providing values for the unified unresolved set of variables so that they can be applied down as required

Given those two not-really-identical cases (provider and consumer) I'm trying to come up with a syntax that would represent all these operations well. It almost feels like you need two sections:

Declaration: for each graph element, what unresolved variables exist
Definition/assignment: for each unresolved variable (some which may be declared on multiple graph elements) what value should I provide?

This would lead to two "parameters" sections, one declaration for each consumed graph element and one assignment section for the nulecule as a whole.
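The split between declaration and assignment might look roughly like this in code. The data and the two helpers are hypothetical; this is a sketch of the idea, not a proposed syntax.

```python
# Declaration: for each graph element, which variables are unresolved (hypothetical data).
declarations = {
    "wordpress": ["db_user", "db_pass", "db_name", "db_host"],
    "mariadb": ["db_user", "db_pass", "db_name", "root_pass"],
}

# Assignment: one value per variable for the nulecule as a whole.
assignments = {
    "db_user": "wp",
    "db_pass": "s3cret",
    "db_name": "wordpress",
    "db_host": "mariadb:3306",
}


def unresolved(declarations, assignments):
    """Introspect and merge: return the declared variables that have no value yet."""
    needed = set().union(*declarations.values())
    return sorted(needed - set(assignments))


def apply_down(declarations, assignments):
    """Hand each graph element the assigned values for the variables it declares."""
    return {element: {name: assignments[name] for name in names if name in assignments}
            for element, names in declarations.items()}


print(unresolved(declarations, assignments))
print(apply_down(declarations, assignments))
```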

markllama commented 8 years ago

OK, I could edit that and avoid looking silly, but I'll re-comment instead.

I guess the second "parameters" section I mention, the definition/assignment only needs to be in the global scope of the answers file.

The only reason to have entries in the graph element sections of the answers file is if I need to map two mismatched parameter names from different consumed nulecules to a single provided value.

kanarip commented 8 years ago

It seems like something as simple as:

{
  "params": [
    {
      "name": "some_param",
      "value": "some_value"
    }
  ],
  "graph": [
    {
      "name": "some-external-app",
      "source": "docker://some/external-app"
    }
  ]
}

could be made to have some-external-app inherit some_param with value some_value.

Would that resolve the problem?

UPDATE: It would not solve the problem for A and B both requiring a mariadb-centos7-atomicapp directly, though perhaps params could be specified for each of them separately:

{
  "params": [
    {
      "name": "some_param",
      "value": "some_value"
    }
  ],
  "graph": [
    {
      "name": "some-external-app",
      "source": "docker://some/external-app",
      "params": [
        {
          "name": "some_param",
          "value": "a_completely_different_value"
        }
      ]
    }
  ]
}

dustymabe commented 8 years ago

@kanarip it could.. but having it be more explicit by defining the params as well as the args (like in [3]) would be better for knowing which params apply to which nulecules rather than just having them all inherit all params somehow.

In this model the Nulecule is kind of like a function call. The nulecule defines its inputs (params) and then makes other function calls to other nulecules and provides them their defined inputs as args.

kanarip commented 8 years ago

Right, I had not seen that example but it makes sense.

I reckon params on the leaf do not necessarily need to be eliminated altogether though, and could be used to override or supply additional params.

This way, root_pass can be different for two mariadb-centos7-atomicapp-based leaves and does not need to be specified globally.
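A small sketch of that override rule, using the same name/value layout as the JSON above. The helper `effective_params`, the leaf names and the values are made up for illustration.

```python
def as_dict(params):
    """Turn a list of {"name": ..., "value": ...} entries into a plain dict."""
    return {p["name"]: p["value"] for p in params}


def effective_params(spec):
    """Return {leaf name: params}; params on the leaf override the global ones."""
    global_params = as_dict(spec.get("params", []))
    return {leaf["name"]: {**global_params, **as_dict(leaf.get("params", []))}
            for leaf in spec.get("graph", [])}


# Hypothetical spec with two leaves based on the same external app.
spec = {
    "params": [{"name": "db_user", "value": "app"}],
    "graph": [
        {"name": "mariadb-a",
         "source": "docker://projectatomic/mariadb-centos7-atomicapp",
         "params": [{"name": "root_pass", "value": "pass-for-a"}]},
        {"name": "mariadb-b",
         "source": "docker://projectatomic/mariadb-centos7-atomicapp",
         "params": [{"name": "root_pass", "value": "pass-for-b"},
                    {"name": "db_user", "value": "other"}]},
    ],
}

effective = effective_params(spec)
print(effective)
```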

markllama commented 8 years ago

I think you've just identified another dimension which I've been thinking about.

Ideally, the list of parameters (of all kinds) would be provided from the container image, perhaps in the form of an included Dockerfile, but ideally from some other component of the image [1]. The LABEL mechanism offers one means, and OpenShift is already using it to provide additional metadata about image characteristics [2].

Using labels, the container image producer can embed metadata about the run-time requirements of the container: environment variables, CLI arguments, external data volumes (location and permissions) [3]. This allows the definition of these variables to be inferred from the container image rather than declared directly in the Nulecule file [4].

Note that the EXPOSE and VOLUME directives provide only partial information needed. For example, the VOLUME directive does not indicate the ownership and permissions for the mounted volume so that processes inside can access the data correctly. Additional metadata is needed.

This presents a problem for image consumers when composing services. There is no way for the consumer to ensure that a parameter used by two images (database service and client) will have the same variable name in both. To resolve this, the Nulecule will need to provide a way to map variables with different names from different container images to a single value (a DB username and password, for example).

In the top level of the Nulecule file that might look like this.

Assume a DB service container named mongodb with these parameters

MONGODB_DATABASE_NAME
MONGODB_USERNAME
MONGODB_PASSWORD

And a client container named myapp with these:

MYAPP_DATABASE
MYAPP_DB_USER
MYAPP_DB_PASSWORD

The container images provide these environment variable names by defining a LABEL

LABEL io.projectatomic.nulecule.environment.required="MONGODB_DATABASE_NAME, MONGODB_USERNAME, MONGODB_PASSWORD"

and

LABEL io.projectatomic.nulecule.environment.required="MYAPP_DATABASE, MYAPP_DB_USER, MYAPP_DB_PASSWORD"
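Reading such a label back could be as simple as the sketch below. It assumes the labels have already been obtained as a plain dictionary (for example from the Config.Labels field of `docker inspect` output) and that the value is a comma-separated list as written above. The helper name `required_env` is made up.

```python
LABEL_KEY = "io.projectatomic.nulecule.environment.required"


def required_env(labels):
    """Return the environment variable names listed in the label, in order."""
    raw = labels.get(LABEL_KEY, "")
    # Split on commas and drop surrounding whitespace and empty entries.
    return [name.strip() for name in raw.split(",") if name.strip()]


# Labels as they would appear for the two images in this example.
mongodb_labels = {LABEL_KEY: "MONGODB_DATABASE_NAME, MONGODB_USERNAME, MONGODB_PASSWORD"}
myapp_labels = {LABEL_KEY: "MYAPP_DATABASE, MYAPP_DB_USER, MYAPP_DB_PASSWORD"}

print(required_env(mongodb_labels))
print(required_env(myapp_labels))
```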

The Nulecule then needs two things:

A way to indicate that the pairs of environment variables are related
A way to provide a value to each of the pairs

The Nulecule designer will likely want to provide a unified name for prompting or for lookup in an answers.conf file. She will also still want a means to provide a human prompt as she can now.

Since variables come from two different images and because they may have the same name, the mapping in part 1 must use the name of the container image to disambiguate in that case.

The parameters section would reside in the top level of the Nulecule file. It would list all of the inputs a user may need to provide to the containers and a mapping from the Nulecule variable names to the container ones.

parameters:
  - name: db_name
    description: The database name for My Application
    default: myapp_db
    map:
      - myapp.MYAPP_DATABASE
      - mongodb.MONGODB_DATABASE_NAME
  - name: db_user
    description: The read/write user for the database
    map:
      - myapp.MYAPP_DB_USER
      - mongodb.MONGODB_USERNAME
  - name: db_pass
    description: The read/write password for the database
    map:
      - myapp.MYAPP_DB_PASSWORD
      - mongodb.MONGODB_PASSWORD

This would also allow the artifact writers to use the Nulecule parameter names in the artifacts rather than the container image variable names.

NOTE: It might be desirable for the mapping to be provided in the answers.conf file though this pushes the information about how two containers relate off to the end consumer of the nulecule. Combining the automatic runtime generation of the nulecule graph section parameters with mapping and then value resolution requires some thought.

[1] I think that LABEL directives are proper layers and so independent of whether a Dockerfile is actually included in the image.
[2] https://docs.openshift.org/latest/creating_images/metadata.html
[3] https://github.com/goern/postgresql/blob/feature/enhanced-labels/9.4/Dockerfile.rhel7#L61
[4] https://github.com/goern/grasshopper/wiki

vpavlin commented 8 years ago

First of all, it's too bad, we already had this figured out after the meeting in Brno :(

@markllama I am missing context in your example - are those global parameters? If yes, do myapp and mongodb map to a graph component or to an image? (I'd vote for graph component, as the image name may change.)

Also relying on labels in this case is a bit dangerous as those can change independently from Nulecule metadata, although the same can happen for the code inside the image, so never mind:)

How does this mapping work with nested components? I.e. frontend requires some worker and worker requires mongodb - I don't want to create mapping for worker -> mongodb in the frontend or do I?:)

rtnpro commented 8 years ago

@markllama

On the other hand I want to create an application that uses a nulecule as input. Here I want several things which you described, and perhaps one that was missed.
Introspection of the nulecules I am consuming to determine what variables they need resolved
Merging and matching the unresolved variables from many nulecules
providing values for the unified unresolved set of variables so that they can be applied down as required.

I was personally excited about the notion of having unresolved variables in Nulecule, which get resolved during the various stages of deploying the Nulecule, for use in the subsequent deployment process. We could also introspect the Nulecule to find the dependencies across components and schedule deployment in a proper order, so that a component B depending on a value x resolved while deploying component A would be deployed after A has been deployed. This would help Nulecule perform some basic orchestration as well, which is useful for providers like Docker and Docker Swarm (in the future), though not that useful for kubernetes and openshift.
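The ordering idea amounts to a topological sort over "who resolves which value". A sketch using the standard library's graphlib (Python 3.9 or later) follows; the `produces`/`consumes` tables and the helper `deployment_order` are invented for illustration.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical introspection result: values each component resolves and needs.
produces = {"A": {"x"}, "B": set()}
consumes = {"A": set(), "B": {"x"}}


def deployment_order(produces, consumes):
    """Order components so that producers are deployed before their consumers."""
    producer_of = {value: component
                   for component, values in produces.items()
                   for value in values}
    # Map each component to the set of components it depends on.
    graph = {component: {producer_of[value] for value in needed}
             for component, needed in consumes.items()}
    return list(TopologicalSorter(graph).static_order())


order = deployment_order(produces, consumes)
print(order)
```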

However, it turned out that orchestration is not in the scope of Nulecule. Nulecule is just aimed towards packaging multi container applications and delivering them.

The only reason to have entries in the graph element sections of the answers file is if I need to map two mismatched parameter names from different consumed nulecules to a single provided value.

I am doing the needed mapping when assigning args to the Nulecule components, e.g., here.

Ideally, the list of parameters (of all kinds) would be provided from the container image, perhaps in the form of an included Dockerfile, but ideally from some other component of the image. [1]. The LABEL mechanism offers one means, and OpenShift is already using it to provide additional metadata about image characteristics [2].

I would like to keep the implementation of Nulecule separate from possible delivery mechanisms: docker or other container images, tarball, etc.

markllama commented 8 years ago

@rtnpro thanks,

It looks like the work you're doing to refactor the parameters/args structures answers some of what I would like. You're doing the mapping "in reverse" so to speak, but with the same result.

Regardless of the implementation of the containers, it would seem like we should insist that the container developer tell the user how they are meant to be used. This is the point of adding labels à la grasshopper and OpenShift container image metadata.

It would seem that the only information in the nulecule would be that which is not provided by the container images.

This is really two issues:

resolving parameters from different images which take the same value
extracting information from container metadata (avoiding hard coding container dev provided information)

I guess I'll wait on your changes to see how #1 works out. #2 remains unresolved, but it's out of scope right now.

rtnpro commented 8 years ago

Hey folks,

I have implemented an initial POC for the proposed refactor of Nulecule params. You can try it in the following way:

Pull changes in atomicapp

git remote add rtnpro https://github.com/rtnpro/atomicapp
git fetch rtnpro refactor-params:refactor-params
git checkout refactor-params

Pull refactored sentry nulecule application in nulecule-library and run it

cd nulecule-library
git remote add rtnpro https://github.com/rtnpro/nulecule-library
git fetch rtnpro sentry-refactored-params:sentry-refactored-params
git checkout sentry-refactored-params
cd sentry-atomicapp
sudo atomicapp run --provider docker .

Give it a shot and share your feedback.

projectatomic / nulecule

Shortcomings with current answers/params structure towards implementing nested Nulecule application #187
