How to handle Roboconf migrations?

vincent-zurczak commented 8 years ago

Assuming Roboconf is used to deploy a production environment, how should we deal with upgrades of Roboconf itself? Currently, the DM can be stopped and restarted because its state is persisted, and once #519 is solved, this should be quite transparent. Agents are a different story: they do not persist anything (or, more precisely, not everything).

We should even ask whether upgrading the agent (or any software) on a virtual machine is good practice at all. Maybe we should simply create a new VM with the right version. This may also be the answer for application upgrades. Immutable infrastructures may be the best solution here.

cdeneux commented 8 years ago

I agree with you. Immutable infrastructures are more secure.

IMO, the only constraint I see is that such an immutable infrastructure must be clustered to allow hot upgrades without service interruption. Is that always possible? I don't know. It probably depends on the application being deployed. Perhaps Roboconf could provide new load-balancing features to make clustered applications easier to create. For example, Roboconf could provide a "load-balancer" application template that would be automatically instantiated and linked (as another Roboconf application) when a given keyword appears in the application to deploy.

Another way could rely on system packages, with automatic upgrades configured at the operating-system level, but I'm not sure this is available on all operating systems.

vincent-zurczak commented 8 years ago

Migrating the DM is already feasible. It has been part of its contract for quite a long time now. See installation tips.

There are several options for agents.
We may want to migrate an agent without modifying the software running on the machine. This can be seen as a manual process: the agent persists its configuration, we stop it, install the new version and start it. Whatever we decide, I think we should support this scenario in some way.
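
A minimal sketch of the "persist, stop, reinstall, start" idea, assuming the agent's configuration can be represented as string key/value pairs. `AgentConfigStore` and the keys in the usage note are hypothetical, not Roboconf's actual classes or settings; the sketch only shows that a plain Java properties file is enough to carry the configuration across the stop/install/start sequence.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Map;
import java.util.Properties;
import java.util.TreeMap;

/**
 * Hypothetical helper (not a Roboconf API): saves an agent's configuration
 * before the old agent is stopped, so the newly installed agent can reload it.
 */
final class AgentConfigStore {

    private AgentConfigStore() {}

    /** Writes the configuration as a standard Java properties file. */
    static void save(Path file, Map<String, String> config) throws IOException {
        Properties props = new Properties();
        props.putAll(config);
        try (OutputStream out = Files.newOutputStream(file)) {
            props.store(out, "Agent configuration saved before migration");
        }
    }

    /** Reads back the configuration written by save(). */
    static Map<String, String> load(Path file) throws IOException {
        Properties props = new Properties();
        try (InputStream in = Files.newInputStream(file)) {
            props.load(in);
        }
        Map<String, String> config = new TreeMap<>();
        for (String key : props.stringPropertyNames()) {
            config.put(key, props.getProperty(key));
        }
        return config;
    }
}
```

The intended order would be: `save` from the old agent, stop it, install the new agent, start it, and have it call `load`. Illustrative keys such as `application-name` or `scoped-instance-path` are assumptions, not the real agent settings.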

We may also want to migrate a whole application.
IMO, this should rely on immutable infrastructures, which means launching a new VM to migrate an agent. Migrating a whole application would imply re-creating all the VMs. The open questions are service interruption and how many resources we need. In any case, when we migrate to a new Roboconf version, we should stop the usual processes and go through Karaf commands.
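
To make the resources-versus-interruption trade-off concrete, here is a hypothetical planner (not part of Roboconf) that replaces VMs instead of upgrading them. It assumes the application tolerates old and new VMs running side by side, which, as noted earlier in the thread, usually means a clustered setup. The `maxExtraVms` parameter is an illustrative assumption: it bounds how many replacement VMs run at once, so a larger value uses more resources but finishes sooner.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical planner: every VM is replaced by a new one running the
 * target version (immutable infrastructure), in batches of maxExtraVms.
 */
final class ReplacementPlanner {

    private ReplacementPlanner() {}

    static List<String> plan(List<String> vms, String targetVersion, int maxExtraVms) {
        if (maxExtraVms < 1) {
            throw new IllegalArgumentException("maxExtraVms must be at least 1");
        }
        List<String> steps = new ArrayList<>();
        for (int start = 0; start < vms.size(); start += maxExtraVms) {
            List<String> batch = vms.subList(start, Math.min(start + maxExtraVms, vms.size()));
            // Create the replacements first, so the old VMs keep serving meanwhile.
            for (String vm : batch) {
                steps.add("create " + vm + "-new (" + targetVersion + ")");
            }
            // Only then remove the old VMs of this batch.
            for (String vm : batch) {
                steps.add("delete " + vm);
            }
        }
        return steps;
    }
}
```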

Basically, it would look like this:

```
# Log into the DM
./client -u karaf

# Declare we want to migrate an agent or a whole application.
roboconf:migration initialize my-app

# Actions from the web console are not possible anymore.
# No more life cycle change until the migration is done.
# This applies on the DM's side and on the agents' side.

# Migrate an agent...
roboconf:migrate perform my-app my-instance

# ... or migrate the whole application.
roboconf:migrate perform my-app

# Migration is over.
roboconf:migration complete my-app
```

`roboconf:migration initialize my-app` tells the DM that no more actions can be undertaken on this application.

`roboconf:migration complete my-app` tells the DM that it can go back to its normal behavior.
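
A minimal sketch of how the DM could enforce this lock, assuming every life cycle change request (from the web console or elsewhere) goes through a single check. `MigrationLocks` and its methods are hypothetical names, not existing Roboconf classes.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

/**
 * Hypothetical DM-side registry: applications under migration are locked,
 * and life cycle changes requested for them are rejected until completion.
 */
final class MigrationLocks {

    private final Set<String> locked = ConcurrentHashMap.newKeySet();

    /** Sketch of "roboconf:migration initialize <app>". */
    void initialize(String app) {
        if (!locked.add(app)) {
            throw new IllegalStateException(app + " is already being migrated");
        }
    }

    /** Sketch of "roboconf:migration complete <app>". */
    void complete(String app) {
        if (!locked.remove(app)) {
            throw new IllegalStateException(app + " is not being migrated");
        }
    }

    /** To be called before applying any life cycle change to an application. */
    void checkLifeCycleChangeAllowed(String app) {
        if (locked.contains(app)) {
            throw new IllegalStateException(
                "Migration in progress for " + app + ", life cycle changes are disabled");
        }
    }
}
```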

`roboconf:migrate perform...` is the most complex command because there are many situations to handle.
I think we should keep it simple. I once listed migration strategies; in some situations, migrating Roboconf and migrating a Roboconf application can be solved the same way. IMO, `roboconf:migrate perform...` should ask which migration strategy to use. For now, there will be only one: #258, which undeploys the application and then installs the new one. Once the strategy is chosen, `roboconf:migrate perform...` should ask whether the application template must be upgraded. If application A was associated with template TA 1.0 and TA 1.1 is now available, we create the same application from the new template. So, basically, we select the application template we want to migrate to. It can be any version (forward or backward), and obviously we can also stay on the same version. Finally, `roboconf:migrate perform...` should ask for confirmation.
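
A minimal sketch of the template selection step, assuming template versions are dot-separated numbers such as 1.0 or 1.1. `TemplateVersionChoice` is a hypothetical helper that classifies the chosen target (forward, backward or same version) and builds a summary that could be displayed before the final confirmation.

```java
import java.util.Arrays;

/**
 * Hypothetical helper: classifies the template version selected for a
 * migration relative to the version the application currently uses.
 */
final class TemplateVersionChoice {

    enum Kind { UPGRADE, DOWNGRADE, SAME_VERSION }

    private TemplateVersionChoice() {}

    static Kind classify(String current, String target) {
        int cmp = compare(current, target);
        if (cmp < 0) return Kind.UPGRADE;
        if (cmp > 0) return Kind.DOWNGRADE;
        return Kind.SAME_VERSION;
    }

    /** Compares dotted numeric versions; missing parts count as 0 (1.0 equals 1.0.0). */
    static int compare(String a, String b) {
        int[] x = parse(a);
        int[] y = parse(b);
        for (int i = 0; i < Math.max(x.length, y.length); i++) {
            int xi = i < x.length ? x[i] : 0;
            int yi = i < y.length ? y[i] : 0;
            if (xi != yi) return Integer.compare(xi, yi);
        }
        return 0;
    }

    private static int[] parse(String version) {
        return Arrays.stream(version.split("\\.")).mapToInt(Integer::parseInt).toArray();
    }

    /** Text that could be shown before asking for confirmation. */
    static String summary(String app, String template, String current, String target) {
        return app + ": " + template + " " + current + " -> " + target
            + " (" + classify(current, target) + ")";
    }
}
```

For example, `summary("A", "TA", "1.0", "1.1")` yields `A: TA 1.0 -> 1.1 (UPGRADE)`.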

If users change their mind and press Ctrl + C, `roboconf:migration cancel my-app` should be available. Basically, it would do the same thing as `roboconf:migration complete my-app`; only the semantics differ. It is not possible to cancel the process while `perform` is in progress.
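
The rules above (initialize, then any number of perform runs, then complete or cancel, with no cancellation while perform runs) can be sketched as a small state machine. `MigrationSession` is a hypothetical class, assuming one session per application; `cancel()` performs exactly the same transition as `complete()`, since only the meaning differs.

```java
/**
 * Hypothetical state machine for one application's migration session.
 * Neither complete() nor cancel() is accepted while perform is running.
 */
final class MigrationSession {

    enum State { NONE, INITIALIZED, PERFORMING }

    private State state = State.NONE;

    synchronized void initialize() {
        require(State.INITIALIZED == state ? null : State.NONE, "initialize");
        state = State.INITIALIZED;
    }

    synchronized void startPerform() {
        require(State.INITIALIZED, "perform");
        state = State.PERFORMING;
    }

    /** Back to INITIALIZED, so another agent can be migrated afterwards. */
    synchronized void endPerform() {
        require(State.PERFORMING, "end perform");
        state = State.INITIALIZED;
    }

    synchronized void complete() {
        require(State.INITIALIZED, "complete");
        state = State.NONE;
    }

    /** Same transition as complete(): the user changed their mind. */
    synchronized void cancel() {
        require(State.INITIALIZED, "cancel");
        state = State.NONE;
    }

    synchronized State state() {
        return state;
    }

    private void require(State expected, String action) {
        if (state != expected) {
            throw new IllegalStateException("Cannot " + action + " in state " + state);
        }
    }
}
```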

The whole process is described as interactive. However, Karaf allows such actions to be scripted, so they could be scripted to prevent errors in production.

Other migration strategies have been submitted: #259 (quite easy to implement) and #260 (more difficult), and there are others.

vincent-zurczak commented 8 years ago

In the second scenario, migration is handled by the DM, so upgrading the DM makes this option available for new Roboconf versions. This means we do not have to implement it for the next release (0.7).

I created issue #607 for the first scenario (manual switch).

Issue #561 in roboconf/roboconf-platform.