xlab-si / xopera-opera

xOpera orchestrator compliant with TOSCA YAML v1.3 in the making
https://xlab-si.github.io/xopera-docs/
Apache License 2.0
35 stars · 14 forks

Introduction of compare and redeploy commands #130

Closed — dradX closed this issue 3 years ago

dradX commented 3 years ago

Description

In this issue we describe a proposal for enabling opera to compare the existing Deployed Instance model (DI1) with a changed/reconfigured Deployable Instance model (DI2), defined by a new version of the blueprint (DB2) and a supplied set of inputs I2, where DI2 = calcdeploy(DB2, I2).

Assumptions - A deployed instance is running and its instance model is stored in the .opera directory, enabling the user to undeploy the currently running deployed instance with opera undeploy.

User story - The user does not want to undeploy the whole deployed instance, but rather patch/reconfigure the existing Deployed Instance (DI1) with a changed set of blueprint and inputs. Before applying changes to the Deployed Instance, the user wants to calculate the differences between the existing Deployed Instance (DI1) and the target Deployable Instance (DI2).

Introduction of the compare command: opera compare templ-v2.csar -i input-v2.yaml - Opera calculates the internal set of topology changes needed to satisfy the desired reconfiguration, Diff = compare(DI1, DI2), and outputs a list of changes between the existing deployed instance (DI1) and the supplied changed deployable instance (DI2). If the user is satisfied with the calculated differences, he/she can confirm and execute them by issuing a new command for executing the reconfiguration, DI2 = redeploy(Diff).

Introduction of the redeploy command: opera redeploy - Opera executes the internal set of calculated topology changes and saves the results of the execution in the .opera folder.
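The proposed compare step can be sketched as a set comparison over the two instance models. This is only an illustrative Python sketch; the function name and data shapes are hypothetical, not actual opera internals:

```python
# Illustrative sketch of Diff = compare(DI1, DI2); all names are hypothetical
# and do not correspond to actual opera internals.

def compare(di1_nodes, di2_nodes):
    """Classify nodes as added/deleted/changed between two instance models.

    Each argument maps a node name to its (opaque) definition dict.
    """
    added = sorted(set(di2_nodes) - set(di1_nodes))
    deleted = sorted(set(di1_nodes) - set(di2_nodes))
    changed = sorted(
        name
        for name in set(di1_nodes) & set(di2_nodes)
        if di1_nodes[name] != di2_nodes[name]
    )
    return {"added": added, "deleted": deleted, "changed": changed}


di1 = {"vm": {"image": "ubuntu"}, "app": {"version": 1}}
di2 = {"vm": {"image": "ubuntu"}, "app": {"version": 2}, "db": {}}
print(compare(di1, di2))
# {'added': ['db'], 'deleted': [], 'changed': ['app']}
```

A real implementation would of course compare resolved templates (properties, attributes, relationships), not just raw dicts.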

Steps

Implementation of compare - The user provides a new desired definition of the deployed instance through a changed set of blueprint and inputs and issues the compare command: opera compare templ-v2.csar -i input-v2.yaml. Opera creates the desired Deployment Execution Graph (DEG2) from templ-v2.csar and input-v2.yaml. Opera then instantiates the existing Deployment Execution Graph (DEG1, as done in undeploy) and compares the two graphs' nodes and edges. The nodes/edges in DEG2 can be unchanged/changed/added/deleted with respect to DEG1. For every node in DEG2, compare tries to find the node with the same node name in DEG1.

For the sake of understanding we introduce the concept of a node in a changed group/tag, changed_deleted/changed_added, which represents the node version.

By grouping/tagging all nodes from DEG1/DEG2 we can prepare the two derived graphs: the Deploy Execution Graph Delete (DEGD) and the Deploy Execution Graph Add (DEGA).

Both DEGD and DEGA will be created and stored in the .opera/.comparison directory, as done during the deploy command. DEGD will hold the deployment graph based on DEG1 (Day 1 execution): unchanged nodes/edges will be marked with the initial state, eliminating the need to undeploy them again, while changed-deleted/deleted nodes/edges will be marked with the started state so that opera deletes them on undeploy. The inverse concept of desired state will be used for nodes/edges in the DEGA deployment graph (Day 2 execution): unchanged nodes/edges will be marked with the started state so that opera skips their deployment, while changed-added/added nodes/edges will be marked as initial so that opera adds them on deploy.
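The state-marking rules above can be sketched as follows. This is a minimal Python sketch: the function name is hypothetical and graphs are simplified to lists of node names, with only the 'initial'/'started' states mirroring opera's instance states:

```python
# Minimal sketch of tagging nodes for DEGD/DEGA as described above.
# Graph arguments are simplified to iterables of node names.

def build_delete_add_graphs(deg1_nodes, deg2_nodes, diff):
    degd = {}
    for name in deg1_nodes:  # DEGD is based on DEG1 (Day 1)
        if name in diff["deleted"] or name in diff["changed"]:
            degd[name] = "started"   # will be torn down on undeploy
        else:
            degd[name] = "initial"   # unchanged: skipped on undeploy
    dega = {}
    for name in deg2_nodes:  # DEGA is based on DEG2 (Day 2)
        if name in diff["added"] or name in diff["changed"]:
            dega[name] = "initial"   # will be (re)deployed
        else:
            dega[name] = "started"   # unchanged: skipped on deploy
    return degd, dega
```

A changed node therefore appears in both graphs: marked started in DEGD (so it is deleted) and initial in DEGA (so it is re-created).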

Compare output - the output can be written to a file using the -f or --file-output switch; a copy of the output will be stored in the .comparison directory as output.txt. It will produce a "diff"-like view of the topology with marked deletions and additions. opera compare templ-v2.csar -i input-v2.yaml -f ./outputs/myoutput.txt

We would need to implement the new command in the /src/opera/commands/ folder.

Implementation of redeploy - The user may or may not agree with the suggested comparison after inspecting the outputs. The user can approve the suggested reconfiguration of the existing deployment by inspecting output.txt and confirming the redeployment with a separate redeploy command. Since the comparison data and execution graphs are already stored in the .comparison directory, the user should be able to redeploy the wanted changes by issuing just:

opera redeploy

Opera would take the data stored in .comparison, the Deployment Execution Graph Delete (DEGD) and the Deployment Execution Graph Add (DEGA, practically built from DEG2 with the node states applied), and execute the corresponding order of commands.

After this, the state of the redeployed instance would be stored in .opera, as done after the execution of the deploy command.
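Put together, the redeploy step would then amount to an undeploy pass over DEGD followed by a deploy pass over DEGA. A hedged sketch, with callbacks standing in for the real node-level executors:

```python
# Sketch of the proposed redeploy order: undeploy what DEGD marks as
# 'started', then deploy what DEGA marks as 'initial'. The node-level
# executors are injected as callbacks here; in opera they would run the
# nodes' lifecycle operations.

def redeploy(degd, dega, undeploy_node, deploy_node):
    removed = [n for n, state in degd.items() if state == "started"]
    for name in removed:
        undeploy_node(name)      # delete changed/removed nodes first
    added = [n for n, state in dega.items() if state == "initial"]
    for name in added:
        deploy_node(name)        # then create changed/added nodes
    return removed, added
```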

Current behaviour

Currently, opera does not provide a way to patch/reconfigure an existing deployed instance without undeploying the topology as a whole. Using opera deploy with -f or --force, a partial redeployment can be done, but no nodes can be removed from the existing deployment instance, and updating nodes relies solely on the immutability of the executor script (Ansible playbook) implementation.

Expected results

The user is able to patch/reconfigure an existing deployed instance without having to undeploy the whole running instance by:

  1. providing a new set of changed blueprint and inputs (mycsar-v2.csar, input-v2.yaml),
  2. issuing opera compare mycsar-v2.csar -i input-v2.yaml for the calculation of Diffs stored in .opera/.compare and presented to the user as output,
  3. issuing opera redeploy if the Diffs presented in output satisfy the expected topology changes to execute the deployed instance reconfiguration.

The results of the executed reconfiguration are stored in .opera.

cankarm commented 3 years ago

Thanks @dradX for this enhancement description with all the details. First, I would try to break this issue down into a set of smaller issues that will be easier to grasp and will allow developers to contribute frequently and speed up the code reviews.

First, we need to pay attention to the complexity of the comparison. We probably all assume that both blueprints are very similar and have the same root node, in which case this can be quite easy. Otherwise, xOpera must be able to abandon the job and issue a warning that the diff is too complex or that it cannot guarantee that there exists only one unique transform.

Second, we need to support "targeted undeploy": an action that undeploys only specific nodes and relationships. Note that this functionality might also be useful for manual intervention on the deployed project (e.g. the user would undeploy one node without a change in the corresponding blueprint (CSAR)). This means that the actual deployment would no longer reflect what is stated in the CSAR, which could be a bit strange, as DI is not the same as DB anymore. Should we handle this, and how?

Afterwards, I would suggest proceeding in small steps and constantly checking that we support the user stories and don't over-complicate.

What do you think, @anzoman , @sstanovnik ?

dradX commented 3 years ago

Hi @cankarm - thanks for your inputs and suggestions. I agree to proceed and split this into two separate issues targeting the compare and redeploy commands separately. As you correctly noticed, most of the preparation for redeployment will actually be implemented in the compare command, so it is also good to have a separate issue for it. Nevertheless, we would like to point these two issues back to this overarching issue to give the user more context.

Regarding the second part of the comments - abandoning the comparison if the blueprints differ too much - calculating when to quit the comparison is as complex as calculating the diffs, since it can only be done after a complete traversal of the two DAGs and calculation of the diffs. Equally hard, or harder, would be adding any rule about how much diff is too much. Therefore, we plan to keep it really simple for now and just try to produce the diffs.

At this point we should also provide a checkpoint on redeploy for any changes in the state of the deployed instance that happened between issuing the compare and redeploy commands - for instance, changes created by adhering to TOSCA policy rules that might introduce state changes. In that case we would need to warn the user that the state has changed and the calculation is invalid.
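One hedged way to implement such a checkpoint (the function names and file layout are hypothetical, not actual opera internals) is to fingerprint the stored instance state at compare time and verify it again at redeploy time:

```python
import hashlib
import pathlib

# Hypothetical guard against state drift between `compare` and `redeploy`:
# hash the stored instance files at compare time, re-check at redeploy time.

def state_fingerprint(instances_dir):
    """Return a stable hash over all instance state files in a directory."""
    digest = hashlib.sha256()
    for path in sorted(pathlib.Path(instances_dir).glob("*.yaml")):
        digest.update(path.name.encode())   # include file name in the hash
        digest.update(path.read_bytes())    # and its full content
    return digest.hexdigest()

def check_no_drift(stored_fingerprint, instances_dir):
    """Refuse to redeploy if the state changed since compare was run."""
    if state_fingerprint(instances_dir) != stored_fingerprint:
        raise RuntimeError(
            "Deployed instance state changed since compare; rerun compare."
        )
```

The fingerprint would be stored next to the comparison results, so redeploy can fail fast instead of applying a stale diff.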

As for the second part of the suggestion regarding partial changes (targeted undeploy), we would like to keep the workflow and commands (state) as consistent as possible with the current opera deploy/undeploy workflows, and therefore ask the user to produce a full new definition of the blueprint. We feel this is also more coherent with the declarative, self-encapsulated TOSCA CSAR blueprint approach. We might address the partial undeploy option in a later, separate issue, keeping the full consistency of this approach in mind.

Anyhow, I completely agree that we should not over-complicate things; let's try to implement this variant and then introduce more complexity if needed. Any inputs from @matejart, @anzoman and @sstanovnik are more than welcome.

cankarm commented 3 years ago

Let's go through the points

  1. diffs -> sure, do it and we will see how it goes.
  2. about compare and redeploy: I'm not sure that I get it. It seems that redeploy should be able to detect if the previously computed diff is no longer valid?
  3. Regarding targeted undeploy... I understand that you will create a new blueprint and detect what to remove. But how exactly will you get rid of the instances that should be removed? Which command will you run, and what will it execute? I expect that the "downscale" procedure could be much more complicated than a plain delete: it is not simply deleting the instance, but it might include re-routing the traffic or pushing all your data to a global database before you delete yourself.
dradX commented 3 years ago

@cankarm Since we agree on the first point I will start with the second point.

  1. Yes, you understood the issue, but I will try to explain a possible case again. When a user issues the compare command, opera calculates the diffs between the reconfigured set of inputs/blueprint and the current state of the model stored in .opera. Since this state can change before the user calls redeploy - because of a policy-triggered event or any other reason - we should check whether the state of the model is the same as when the user executed compare. If the state of the model has changed, we should warn the user that a new diff calculation is needed.

  2. What should happen on node delete is entirely in the domain of the blueprint definition of the deployed instance, and the same goes for any nodes added in the reconfigured blueprint. The orchestrator is not the one that should amend or invent the process of reconfiguration. We can pose the same question about "what happens with this node on undeploy". If the specific node in the blueprint has any configure interface, this should be called as when executing delete. I will take the example you provided:

    It's not simply delete the instance but it might include re-route the traffic or push all your data in a global database before you delete yourself

If a node in the deployed blueprint needs to store its data somewhere before deletion, this should be part of the node configuration (an interface operation of the node) in the blueprint - no matter whether this blueprint will ever be reconfigured or not. The same goes for re-routing, for instance when adding nodes to a load balancer: it should be able to configure itself and account for the added/deleted nodes if they are properly added and configured in the blueprint. In this case we are also counting on the immutability of the executed playbooks, since there is nothing else opera can do. In any case, if the user is not satisfied with the results of compare, he/she can always abort the redeployment by never calling redeploy. We might also add a command to remove the .compare directory in this case, if needed. To be completely clear, we want to start with a simple redeployment and then check what can be done, and to what degree, if we want to implement a smarter compare.
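As a hedged illustration of this idea (the node type name and playbook path are hypothetical, only the Standard lifecycle interface comes from TOSCA 1.3), a blueprint author would encode such teardown behaviour directly in the node's own lifecycle operations:

```yaml
node_types:
  custom.nodes.StatefulService:          # hypothetical type name
    derived_from: tosca.nodes.SoftwareComponent
    interfaces:
      Standard:
        operations:
          delete:
            # The playbook itself drains traffic and pushes the node's data
            # to external storage before removing the instance; the
            # orchestrator just invokes the operation.
            implementation: playbooks/backup_and_delete.yaml
```

This way the same delete operation serves both a full undeploy and a future targeted removal, without opera having to know anything about the service's internals.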

matejart commented 3 years ago

To me, the original description makes sense. Personally, I'd have an opera diff command, which outputs the difference between the state model and my updated service template + inputs. And then I'd have opera update, which computes the diff (regardless of whether I issued opera diff first or not) and applies it. I'm not sure any additional files stored in the deployment state are needed (sure, the internal representation of what it needs to do will be a derived thing, not a direct representation of the incoming service template); I would just treat the update as an in-line partial undeploy and partial deploy. But admittedly, this is all an implementation detail.

And in general, if an instance of a node template needs to change due to changed properties, the easier thing to do is to delete it and create a new one. But perhaps there will be a capability for doing an in-place change for particular node types that are capable of it.

Just a nitpick, but the name "redeploy" suggests to me that I either deploy everything from scratch, throwing away the existing deployment, or create a new instance. "Update" suggests a more targeted change.

cankarm commented 3 years ago

@dradX probably we will need a call.

  1. This change triggered by a policy is something that I would leave for the future. I would not focus on it now, as it prematurely complicates things. First a diff, and then an update. The reason is that you might be unable to make a change if your application is in a live-lock with constant scaling up and down, or just moving some nodes around.

  2. I totally agree with you about the cover story. My issue here is how to tackle the lifecycle operations (see the attached lifecycle operations diagram).

Where will you put your redeploy (and I agree it should be update)? Node undeploy can be a delete, but an update could have a different workflow. We need to tackle this. I'm perfectly aware that it is on the user's side to create and provide the delete operation, but we need to find a standard way to do that. You will need to pass some parameter around, so delete will know whether it is a simple destroy-everything or just a scale-down.

dradX commented 3 years ago

@matejart thanks for these suggestions and expressed views. I agree with you on almost all accounts - especially regarding the use of update: it perfectly fits what we are trying to do, instead of redeploy. I would suggest we keep the compare command, as diff is not a commonly used verb - although it is often used for this purpose. As you correctly noticed, we would need to recalculate the changes once the user issues update. The reason why I would really stick to the diff -> update workflow is that this way we show the user what will be done if/when we apply update. The user thus explicitly (through the workflow) confirms the changes he/she wants applied to the deployed instance - and since there is no defined redeploy workflow in TOSCA that we could follow here, we eliminate any questions about what will be done when executing update.

@cankarm we may have a call

  1. I would not put the policy-triggered changes out of the picture, since we have started implementing them in opera and they are part of the TOSCA 1.3 policy type definitions. This is a feature we need to keep in mind, and the proposed workflow could cover it with a simple --force/-f switch on update.
  2. We tackle the life-cycle operations for a specific node missing in the new version of the blueprint, as said, with the interfaces defined by the deployed blueprint (so executing the node's delete). The usual workflow is defined in TOSCA as stop/delete, if defined, since there is no redeploy workflow in TOSCA that might cover this (see 5.8.4.4.2 Normal node shutdown sequence diagram).
cankarm commented 3 years ago

I prefer diff over compare as it seems more commonly used - but I might be wrong. At least from my side, we have always had diff and update mentioned in issues. In the end, it might not be so important.

@dradX implement the proposed things, and please don't do it in bulk. That's all I'm trying to achieve by not paying attention to policy triggers when making diffs and updates. And use a realistic example.

dradX commented 3 years ago

Thank you @cankarm and @matejart. We will proceed with the implementation as agreed, first by adding two separate issues covering the diff and update commands, and trying to take maximum usability from the user perspective into account.

anzoman commented 3 years ago

@dradX thanks for the detailed explanation regarding this feature. I agree with almost everything that was discussed before. So, just to sum up my thoughts:

So, to conclude, I am looking forward to your comments and of course the upcoming issues (and PRs).

alexmaslenn commented 3 years ago

Thanks @anzoman

alexmaslenn commented 3 years ago

@cankarm for the sake of having custom workflows for node updates, we may later introduce a new TOSCA interface derived from tosca.interfaces.node.lifecycle.Standard with a new update operation. If this operation is present in the node, it is executed for the update; otherwise xOpera goes with stop -> delete -> create -> configure -> start, as @dradX mentioned.
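A rough sketch of what such a derived interface could look like (the interface name is hypothetical, only tosca.interfaces.node.lifecycle.Standard is taken from the TOSCA 1.3 specification):

```yaml
interface_types:
  custom.interfaces.node.lifecycle.Updatable:   # hypothetical name
    derived_from: tosca.interfaces.node.lifecycle.Standard
    operations:
      update:
        description: >
          In-place update of an already deployed node. If a node type does
          not implement this operation, the orchestrator falls back to the
          stop -> delete -> create -> configure -> start sequence.
```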

cankarm commented 3 years ago

One thing that would be nice to clarify is what the input for opera update will be. It would probably be OK if it is a blueprint or, in some cases, only a diff. At least intrinsically, opera will need only a diff to be instantiated.

All my fuss about the targeted undeploy is focused on two steps:

Why am I mentioning this? We have already faced this update problem in the case of scaling. For scale-up, the update is executed as a set of create operations, and currently you can achieve this with clean state + deploy and you are done. Scale-down is more problematic, as it cannot be done by opera undeploy with some tricks. This is an opera update operation that will execute some delete operations and also some create operations, and occasionally also some reconfigures to fix what might have been ruined by the delete.

The reason why we used the term targeted undeploy is that we needed to name something that has undeploy capabilities similar to what clean state + deploy provides for deploy.

alexmaslenn commented 3 years ago

@cankarm I think the input for update would always be blueprint + inputs, as a diff alone would not be enough in most cases. Opera would need to instantiate not only the nodes that are changed but also the nodes that stay intact but serve as hosts for changed nodes. Basically, the idea is to instantiate the whole blueprint graph, traverse it and skip the nodes that do not require any action.
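The "instantiate everything, act only on changed nodes" idea can be sketched as a dependency-first traversal. This is an illustrative Python sketch with a hypothetical graph representation (node name to list of dependencies), not actual opera code:

```python
# Sketch: visit the whole DI2 graph so that hosts are resolved in order,
# but collect for execution only the nodes the diff marks as requiring
# action (added or changed).

def update_traversal(graph, diff):
    """graph maps node name -> list of dependencies (e.g. its host).

    Returns a dependency-respecting deployment order containing only the
    nodes that need to be acted upon.
    """
    to_change = set(diff["added"]) | set(diff["changed"])
    order, visited = [], set()

    def visit(name):
        if name in visited:
            return
        visited.add(name)
        for dep in graph.get(name, []):  # hosts/dependencies come first
            visit(dep)
        if name in to_change:
            order.append(name)           # unchanged nodes are skipped

    for name in graph:
        visit(name)
    return order


graph = {"vm1": [], "docker_engine1": ["vm1"],
         "container1": ["docker_engine1"], "container2": ["docker_engine1"]}
diff = {"added": ["container2"], "changed": [], "deleted": []}
# update_traversal(graph, diff) == ["container2"]
```

Even though only container2 ends up in the order, vm1 and docker_engine1 were still visited, mirroring the point that the intact hosts must be instantiated for the changed node to resolve its host.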

cankarm commented 3 years ago

@alexmaslenn, but if the node has to be changed, then it is also described in diff. Am I right?

alexmaslenn commented 3 years ago

@cankarm imagine the scenario where we have a VM with a Docker engine and one Docker container as the DI1 blueprint. The DI2 blueprint simply introduces another Docker container to the existing instance model. So the diff result would be something like:

added:
  - container2

But in order to properly deploy the new model, opera would need information about the VM and the Docker engine, and hence instantiate them in the update process.

cankarm commented 3 years ago

@alexmaslenn are we talking about a text diff or a diff of the two blueprints? I assume that the diff would actually include all the nodes that have been touched, not just the lines. I understand your point, but I started from a different angle: I thought that in the diff you would omit only the parts where there is no change at all in the whole "tree", as this would be an input for the orchestrator to do something.

alexmaslenn commented 3 years ago

@cankarm then it is not clear to me how the diff would look so that it can both serve as an input for the opera update command and look like a meaningful diff.

cankarm commented 3 years ago

@alexmaslenn - me neither. I understand that it is very convenient to see only the changes, and this must be done for the visual presentation, but I'm not sure it is enough for execution. For example, in the case of removing nodes, you will provide a new blueprint with less content. How will you determine which nodes to remove? You will need to create an "executable diff" and then execute it. Or is there another way?

With an update process that removes nodes, it might also be the case that you would need (or be able) to add a custom delete implementation (a remove operation) that might resolve some of our delete problems above, but this should then be done by adding some additional input, not only a new blueprint.

alexmaslenn commented 3 years ago

@cankarm well the idea is that diff command would produce 2 separate results:

  1. a human readable presentation of the differences in JSON/YAML formats
  2. an internal representation of the differences that can be used in update operation

When the opera diff command is executed, the first result is the output and the second one basically goes nowhere. When opera update is executed, the second result serves as the model for applying the desired changes, and the first one can be logged to the console in --verbose mode.

For the example discussed above, the first one would be something like:

diff:
  added:
    - container2

  changed:

  deleted:  

and the second one would consist of 2 graphs (obviously with more info: types, properties, attributes, etc.):

nodes_undeploy:
  vm1:
    state: initial #would not be undeployed

  docker_engine1:
    state: initial #would not be undeployed
    host: vm1

  container1:
    state: initial #would not be undeployed
    host: docker_engine1
nodes_deploy:
  vm1:
    state: started #would not be deployed

  docker_engine1:
    state: started #would not be deployed
    host: vm1

  container1:
    state: started #would not be deployed
    host: docker_engine1

  container2:
    state: initial #would be deployed
    host: docker_engine1

'started' and 'initial' are internal xOpera node instance states that indicate whether a node is already deployed or undeployed, respectively. Opera would use the second representation to instantiate the graphs and proceed as follows:

  1. run opera undeploy for the first graph (in the example above would do nothing)
  2. run opera deploy for the second graph (in the example above would deploy container2)

This is definitely not the final design, but rather a concept to start with.

UPD: Clarified the second representation because it was confusing, as @cankarm rightly noted.

cankarm commented 3 years ago

@alexmaslenn thanks for the explanation. So the result is that you will not touch anything from the first graph (as the initial nodes are ignored for undeploy), and in the second one you would ignore all started nodes and deploy only the initial ones, as this is the way deploy works.

It seems promising and I like it, except for the confusion that can result from the state names.

cankarm commented 3 years ago

@alexmaslenn I have put together a diagram that might help to understand all the steps. Please review it and comment. We can update it with your ideas.

(diagram: xOpera-diff-update)

alexmaslenn commented 3 years ago

@cankarm the implementation of this enhancement would be based on the functionality that xOpera currently possesses. As far as I'm aware, there is no part of xOpera that allows monitoring or getting feedback from the infrastructure once it is deployed. So the answer to Q2 is: diffs are made against the internal representation of the state, as it is the only information currently available to xOpera.

As for Q1, if a Policy Trigger makes changes to the internal representation, as shown in the diagram, then these changes would be reflected in a diff operation against the Day 1 blueprint.

anzoman commented 3 years ago

Now we have the first version of opera diff (introduced with #147). The dev opera release that includes this new CLI command is already available on the Test PyPI instance.

anzoman commented 3 years ago

Good news - both opera diff and opera update commands are now available within the latest opera pre-release on Test PyPI here: https://test.pypi.org/project/opera/0.6.4.dev8/.

cankarm commented 3 years ago

Great!

If anyone can provide the documentation and an example, that would be nice. A separate branch, and then we merge it into the main-docs branch.

anzoman commented 3 years ago

The documentation for both commands will be added with #172. I think we have realized all the plans from this long issue, so I'm closing it now.