subutai-io / agent

Subutai Agent is a tool that provides a CLI to control Subutai infrastructure and runs as a daemon that receives and performs management commands over secured channels
https://subutai.io

Research on backup/restore of containers #954

Closed dilshat closed 5 years ago

dilshat commented 5 years ago
akarasulu commented 5 years ago

The output of this research spike should be the technical requirements and a plan as a milestone of issues to implement a facility for Subutai that allows for snapshotting / rolling back containers and dumping differentials between snapshots. Later on we can build on this to implement scheduled backup services for Subutai. These are the primitives to enable the more involved backup service, but for now users should be able to use these primitives via the PeerOS and via the Bazaar to manually back up and restore containers.

dilshat commented 5 years ago

To lay the groundwork, I'll describe how we currently use zfs:

For template creation (export):

  1. The container is stopped to quiesce all operations on its filesystems
  2. Fresh snapshots are taken of the container's datasets
  3. Incremental streams are taken between the parent template's snapshots and the container's new snapshots and sent to delta files (see the sketch below)
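
A minimal sketch of these three steps, assuming a dataset layout like `subutai/fs/<name>/<partition>`, a parent template snapshot labeled `@now`, and `lxc-stop` for stopping the container. Names and paths are illustrative, not the agent's actual conventions.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
)

// Partition datasets assumed to exist for every container/template.
var partitions = []string{"rootfs", "home", "var", "opt"}

func run(name string, args ...string) error {
	cmd := exec.Command(name, args...)
	cmd.Stderr = os.Stderr
	return cmd.Run()
}

// exportTemplate mirrors the three export steps for one container.
func exportTemplate(container, parent string) error {
	// 1. Stop the container so its filesystems are quiescent.
	if err := run("lxc-stop", "-n", container); err != nil {
		return err
	}
	for _, p := range partitions {
		ds := fmt.Sprintf("subutai/fs/%s/%s", container, p)
		origin := fmt.Sprintf("subutai/fs/%s/%s@now", parent, p)

		// 2. Take a fresh snapshot of the container's dataset.
		if err := run("zfs", "snapshot", ds+"@now"); err != nil {
			return err
		}

		// 3. Send the increment between the parent template's snapshot
		//    and the new snapshot into a delta file.
		delta, err := os.Create(fmt.Sprintf("/tmp/%s-%s.delta", container, p))
		if err != nil {
			return err
		}
		cmd := exec.Command("zfs", "send", "-i", origin, ds+"@now")
		cmd.Stdout = delta
		cmd.Stderr = os.Stderr
		err = cmd.Run()
		delta.Close()
		if err != nil {
			return err
		}
	}
	return nil
}

func main() {
	if err := exportTemplate("mycontainer", "debian-stretch"); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
}
```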

For template installation (import):

When a template is imported, its parent templates, which contain the needed snapshots, must already be present (installed). The delta files are then received into the template's datasets to bring it to its actual state.
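
A sketch of the receiving side under the same assumptions: the parent template's snapshots are already on the pool, and each delta file was produced by an incremental `zfs send` as above.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
)

// receiveDelta replays a delta file into the target dataset; because the
// stream is incremental against the parent template's snapshot, that
// snapshot must already exist on the pool.
func receiveDelta(deltaPath, targetDataset string) error {
	f, err := os.Open(deltaPath)
	if err != nil {
		return err
	}
	defer f.Close()
	cmd := exec.Command("zfs", "receive", targetDataset)
	cmd.Stdin = f
	cmd.Stderr = os.Stderr
	return cmd.Run()
}

func main() {
	for _, p := range []string{"rootfs", "home", "var", "opt"} {
		delta := fmt.Sprintf("/tmp/mytemplate-%s.delta", p)
		target := fmt.Sprintf("subutai/fs/mytemplate/%s", p)
		if err := receiveDelta(delta, target); err != nil {
			fmt.Fprintln(os.Stderr, err)
			os.Exit(1)
		}
	}
}
```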

For container creation (cloning):

When cloning a container, the parent template's snapshots are cloned into the container's datasets. (These snapshots are created during template installation.)
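
The cloning step itself is a single `zfs clone` per dataset; a sketch under the same assumed layout:

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
)

// cloneContainer creates the container's datasets as writable clones of the
// template's snapshots; only blocks changed later consume additional space.
func cloneContainer(template, container string) error {
	for _, p := range []string{"rootfs", "home", "var", "opt"} {
		src := fmt.Sprintf("subutai/fs/%s/%s@now", template, p)
		dst := fmt.Sprintf("subutai/fs/%s/%s", container, p)
		cmd := exec.Command("zfs", "clone", src, dst)
		cmd.Stderr = os.Stderr
		if err := cmd.Run(); err != nil {
			return err
		}
	}
	return nil
}

func main() {
	if err := cloneContainer("debian-stretch", "mycontainer"); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
}
```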

dilshat commented 5 years ago

A container's filesystem consists of 4 partitions: /root, /var, /opt and /home. Accordingly, 4 snapshots are taken during template export and, along with the accompanying configuration files, compressed into a template archive.

For container backups we need to take snapshots of all 4 partitions (of the 4 corresponding zfs datasets). The snapshots might reside on zfs itself within the system, or might be exported (sent) to a stream or file. Since the container also has an additional lxc configuration file which affects its operation, we need to include it in the backup too. Thus we arrive at the same approach we have with templates: we take incremental snapshots between the parent template and the container itself and dump the deltas to an archive with the config file included.
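
A sketch of that backup primitive, assuming the same dataset layout, a `@backup` snapshot label, an lxc config at `/var/lib/lxc/<name>/config`, and `tar` available on the host; all of these are illustrative assumptions, not the agent's actual conventions.

```go
package main

import (
	"fmt"
	"io"
	"os"
	"os/exec"
	"path/filepath"
)

var partitions = []string{"rootfs", "home", "var", "opt"}

func runTo(out io.Writer, name string, args ...string) error {
	cmd := exec.Command(name, args...)
	if out != nil {
		cmd.Stdout = out
	}
	cmd.Stderr = os.Stderr
	return cmd.Run()
}

// backupContainer snapshots the 4 datasets, dumps increments against the
// parent template, and packs the deltas plus the lxc config into an archive.
func backupContainer(container, parent, outDir string) error {
	if err := os.MkdirAll(outDir, 0o755); err != nil {
		return err
	}
	for _, p := range partitions {
		ds := fmt.Sprintf("subutai/fs/%s/%s", container, p)
		origin := fmt.Sprintf("subutai/fs/%s/%s@now", parent, p)
		if err := runTo(nil, "zfs", "snapshot", ds+"@backup"); err != nil {
			return err
		}
		delta, err := os.Create(filepath.Join(outDir, p+".delta"))
		if err != nil {
			return err
		}
		// Only the blocks changed since the parent template are dumped.
		err = runTo(delta, "zfs", "send", "-i", origin, ds+"@backup")
		delta.Close()
		if err != nil {
			return err
		}
	}
	// Include the lxc config, then pack everything into a moveable archive.
	cfg := fmt.Sprintf("/var/lib/lxc/%s/config", container)
	if err := runTo(nil, "cp", cfg, outDir); err != nil {
		return err
	}
	return runTo(nil, "tar", "-czf", outDir+".tar.gz", "-C", outDir, ".")
}

func main() {
	if err := backupContainer("mycontainer", "debian-stretch", "/tmp/mycontainer-backup"); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
}
```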

Later on we can move this archive to cloud storage if needed.

Restoring a dataset from an incremental stream requires that all earlier snapshots in the chain are present and that the receiving dataset is in exactly the state of the stream's starting point. This adds complexity if we chain backups. Instead, it is better to use the same approach as with templates (described above) and take an incremental snapshot between the parent template and the container's current state. It also depends on whether we want to store backups on the system itself or somewhere else. With the former approach we might not even need to create backup archives, since the snapshots are stored within zfs itself on the system. With the latter approach, we need to create archives as moveable units.

To restore a container from a backup we can take one of the following approaches:

  1. If backups are stored locally as snapshots, we can simply use the built-in zfs rollback command to roll a dataset back to any earlier snapshot.
  2. If backups are stored as archives, the case is a bit more complicated. First we receive the backup deltas into temporary datasets. Then we stop the container to be restored, replace its files (rm + mv) with the files from the temporary datasets, remove the temporary datasets, and start the container again (a sketch follows after this list). We have to replace the files instead of using zfs tooling because, to receive a snapshot with zfs, the target dataset must be in exactly the state of the incremental stream's starting point. In other words, the container would have to hold the same snapshot as its parent (which is not true in the current implementation, to save space) and the container's data must not have changed afterwards. Since containers are writable clones rather than snapshots, these requirements cannot be met.
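
A sketch of the second (archive) case, assuming the parent template's snapshots are still on the pool, the container's partitions are mounted under `/var/lib/lxc/<name>/`, and `rsync` is used for the file replacement; the local case reduces to one `zfs rollback <dataset>@<snapshot>` per dataset and needs no code.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
)

var partitions = []string{"rootfs", "home", "var", "opt"}

func run(name string, args ...string) error {
	cmd := exec.Command(name, args...)
	cmd.Stderr = os.Stderr
	return cmd.Run()
}

func restoreContainer(container, backupDir string) error {
	// 1. Receive each delta into a temporary dataset next to the container.
	for _, p := range partitions {
		tmp := fmt.Sprintf("subutai/fs/%s/restore-%s", container, p)
		delta, err := os.Open(fmt.Sprintf("%s/%s.delta", backupDir, p))
		if err != nil {
			return err
		}
		cmd := exec.Command("zfs", "receive", tmp)
		cmd.Stdin = delta
		cmd.Stderr = os.Stderr
		err = cmd.Run()
		delta.Close()
		if err != nil {
			return err
		}
	}
	// 2. Stop the container so its filesystems are no longer changing.
	if err := run("lxc-stop", "-n", container); err != nil {
		return err
	}
	// 3. Replace the container's files with the restored ones, then
	// 4. drop the temporary datasets.
	for _, p := range partitions {
		live := fmt.Sprintf("/var/lib/lxc/%s/%s", container, p)
		restored := fmt.Sprintf("/var/lib/lxc/%s/restore-%s", container, p)
		if err := run("rsync", "-a", "--delete", restored+"/", live+"/"); err != nil {
			return err
		}
		if err := run("zfs", "destroy", "-r",
			fmt.Sprintf("subutai/fs/%s/restore-%s", container, p)); err != nil {
			return err
		}
	}
	// 5. Start the container again.
	return run("lxc-start", "-n", container, "-d")
}

func main() {
	if err := restoreContainer("mycontainer", "/tmp/mycontainer-backup"); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
}
```
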
akarasulu commented 5 years ago

This is getting into full backups. We just need to expose the ability to take a snapshot of one or more, or all, of a container's partitions and to do dumps. No need to get complex; we can leave that to higher levels.
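
Purely as an illustration of the kind of surface area such primitives might have, a hypothetical sketch follows; these names do not exist in the agent today, and higher layers (PeerOS, Bazaar, a future backup service) would compose them.

```go
package backup

import "io"

// Snapshotter is a hypothetical set of low-level primitives the agent could
// expose; it only outlines snapshot, dump and rollback operations.
type Snapshotter interface {
	// Snapshot takes point-in-time snapshots of the given partitions
	// (e.g. "rootfs", "home", "var", "opt"); empty means all of them.
	Snapshot(container, label string, partitions ...string) error

	// Dump writes a (possibly incremental) stream of the labeled snapshot
	// to w, suitable for storing as a file or shipping elsewhere.
	Dump(container, partition, label string, w io.Writer) error

	// Rollback reverts a partition to an earlier labeled snapshot.
	Rollback(container, partition, label string) error
}
```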

dilshat commented 5 years ago

We should think about what we can do with zfs and how, because even simple steps like taking snapshots and dumping them to backups need a reciprocal operation that restores a container from those backups. Here the low level matters, not the high level. We need to be able to see the whole picture.