**Open** · MatiasVara opened this issue 2 years ago
The code here was merged into xapi-project:xen-api. We can set up a session this week, probably with @edwintorok, to speak about SMAPIv3 and future development.
I have already set up a design session if you would like to talk about this topic.
Hello everyone,

In this issue, I would like to share my experience writing a volume plugin for the ZFS filesystem for SMAPIv3. In this implementation, the plugin represents Storage Repositories (SRs) as ZFS pools and Volumes as ZFS volumes (zvols). A zvol is a dataset that represents a block device and is created in the context of a filesystem. Once created, a volume can be accessed as a raw block device, e.g., `/dev/zvol/pepe`. ZFS supports operations over volumes such as `snapshot`, `clone`, and `promote`. This simplifies the driver, but in some cases the actual implementation becomes a bit tricky, because XAPI performs some operations in an order that cannot be reproduced as-is on a ZFS filesystem. I thought this PoC could help to understand better why those tasks are tricky. This is a summary, and there are still many things that I am not sure about.

First of all, the `xe sr-create` command ends up invoking `zpool create [name]` to create a pool. The only required parameter is the set of block devices on which the ZFS filesystem will be installed. The `xe vdi-create` command ends up invoking `zfs create -V [size] [pool]/[name]`. This creates a new volume in the pool, which can be accessed as a raw block device at `/dev/zvol/[pool]/[name]`.
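As a minimal sketch of the mapping just described, the two `xe` commands could translate to ZFS invocations as below. The function names and the pool/device arguments are hypothetical, for illustration only; they are not the PoC's actual code.

```python
def sr_create_cmd(pool_name, block_devices):
    # `xe sr-create` ends up running: zpool create <name> <devices...>
    # The block devices are the only required parameter.
    return ["zpool", "create", pool_name] + list(block_devices)

def vdi_create_cmd(pool_name, vol_name, size):
    # `xe vdi-create` ends up running: zfs create -V <size> <pool>/<name>
    # The resulting zvol then appears at /dev/zvol/<pool>/<name>.
    return ["zfs", "create", "-V", str(size), f"{pool_name}/{vol_name}"]
```

These return argument vectors that the plugin could hand to something like `subprocess.check_call`; building them separately from execution makes the command construction easy to unit-test.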
In ZFS, snapshots are taken with the `zfs snapshot [volume]@[name]` command. When a new snapshot is created, the new `volume.id` is used as the name of the snapshot. Snapshots are read-only volumes: to access one, we have to either mount it or create a clone from it; otherwise, the snapshot is not accessible. For this reason, the current PoC always creates a clone whenever a snapshot is taken. The clone is named after the snapshot but belongs to the pool in which it is created. You can see this in the following output: when the snapshot `@2` is created, the cloned volume `2` is created too. This cloned volume is important when issuing
`xe vm-copy` (i.e., create-vm-from-snapshot), since that command requires accessing the volume to copy its content into the new VM's VDI, which is not possible if the volume is a snapshot. Another example is the command `xe snapshot-revert`, which reverts the state of a VM from a snapshot. The first step tries to destroy the current VDI. However, this is not possible, since the current VDI has children, e.g., the snapshot. The correct way to do it is to clone directly from the snapshot, promote the new volume, and finally destroy the main VDI. The current PoC only accepts snapshots as input to the `clone` method, and it works around the problem by destroying the parent VDI just after the new volume is promoted. The current implementation of `xe snapshot-revert`
is as follows. Note that this works only if we revert from the latest snapshot; otherwise, the main volume cannot be destroyed, because there are still newer snapshots that cannot be promoted to the new clone.

The current implementation of volume `destroy` relies on the `zfs destroy` command. The method checks whether the target is a snapshot or a volume. If it is a snapshot, the method first builds the correct path to the snapshot and then destroys it. The method also checks whether there is a clone with the same name and destroys it too; this removes the clone that the snapshot command creates. Note that when trying to destroy a volume with children, `zfs destroy` fails, but `xe vdi-destroy` succeeds.

This is an overall summary of the current PoC; I may be missing some chunks. I may release a design document soon that explains all the implementation details and decisions.
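The clone/promote/destroy sequence used for reverting from the latest snapshot can be sketched as follows. Again, the function and the pool/volume names are hypothetical, not the PoC's actual code; the sketch only builds the command sequence.

```python
def snapshot_revert_cmds(pool, parent, snap, new_name):
    """Return the zfs command sequence for reverting from the latest
    snapshot: clone the snapshot, promote the clone so the parent/child
    relationship is inverted, then destroy the old parent volume."""
    snapshot = f"{pool}/{parent}@{snap}"
    clone = f"{pool}/{new_name}"
    return [
        ["zfs", "clone", snapshot, clone],       # make the snapshot writable
        ["zfs", "promote", clone],               # the clone becomes the parent
        ["zfs", "destroy", f"{pool}/{parent}"],  # safe only if no newer snapshots remain
    ]
```

The last step is exactly where the latest-snapshot restriction shows up: after `zfs promote`, snapshots up to the cloned one migrate to the promoted clone, so the old parent can only be destroyed if it holds no newer snapshots.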