yisun-git commented 5 years ago

Crate Name

vm-live-migration

Short Description

The vm-live-migration crate abstracts the components used for live migration, provides interfaces to control the migration, and defines the migration flows.

The upper layer, which responds to control requests, calls the vm-live-migration interface to start a live migration. The user-supplied configuration is passed as parameters.

Live migration is a complex process, and many methods and components have been invented to improve it. In the initial phase, only the methods and components required to make basic live migration work are implemented.

Why is this crate relevant to the rust-vmm project?

Live migration is one of the major benefits of virtualization, providing flexible management of virtual machines (VMs). For energy management, load balancing, server maintenance and fault tolerance in cloud computing, there is strong demand to migrate a VM from one host machine to another with minimal downtime. Live migration is the technology that fulfills this requirement: it moves a running VM from one physical machine to another without disconnecting its applications. It is already a must-have feature in a VMM.

Design

Overview

Live migration involves a source VM and a target VM, and their flows differ. So we need to define a MigrationSender struct and a MigrationReceiver struct to handle the different work. The common work can be abstracted into a MigrationBase struct, which is included in both MigrationSender and MigrationReceiver.

RAM makes up most of the data transferred during live migration, and its flow is somewhat more complex than the others. So a MigrationRam struct is defined to handle RAM data migration.

Besides RAM data, the VM states (including CPU, devices, migration configuration and so on) should be migrated too. So a MigrationState struct is defined to provide interfaces through which outside devices/components register the states that need to be migrated.

Both the source side and the target side should follow a fixed data format so that the transferred data can be recognized. So a MigrationDataFile struct is defined to provide interfaces that handle the data in this fixed format. On the source side, when data is ready or reaches a threshold, MigrationDataFile calls the MigrationTransport interface to send the data to the target. On the target side, received data is parsed according to the format defined by MigrationDataFile.

Live migration may be based on different protocols, e.g. TCP, RDMA and so on. So we need to abstract a MigrationTransport trait that covers the common requirements for data transport. The MigrationTCP struct is the concrete implementation of the MigrationTransport trait. Only TCP is planned to be supported in this initial phase.

The figures below summarize the relationships among the above components.

When the source and target VMs start, the MigrationState instances are registered into MigrationStateArray.

When migration starts, the workflow across all components is as follows.

MigrationTransport

The MigrationTransport trait defines the transport protocol and provides the related interfaces to establish connections, handle packets and so on. In the current phase, only TCP is supported as a concrete implementation.

pub trait MigrationTransport {
    /* Trait methods inherit the trait's visibility, so no `pub` here */
    fn bind(...) -> bool;
    fn connect(...) -> bool;
    fn shutdown(...) -> bool;
    fn write(...) -> i32;
    fn read(...) -> i32;
    ...
}
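
A minimal sketch of how such a trait could look with concrete signatures. Everything beyond the method names is an assumption: the parameters are elided in the proposal, `io::Result` is swapped in for the `bool`/`i32` return codes, `bind` is left out for brevity, and `LoopbackTransport` is a hypothetical in-memory stand-in that only exists to exercise the trait without a network.

```rust
use std::collections::VecDeque;
use std::io;

/// Hypothetical variant of the proposed trait that reports failures through
/// `io::Result` instead of `bool`/`i32` return codes.
pub trait MigrationTransport {
    fn connect(&mut self, addr: &str) -> io::Result<()>;
    fn shutdown(&mut self) -> io::Result<()>;
    fn write(&mut self, buf: &[u8]) -> io::Result<usize>;
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize>;
}

/// In-memory loopback transport (illustration only, not part of the proposal).
#[derive(Default)]
pub struct LoopbackTransport {
    connected: bool,
    queue: VecDeque<u8>,
}

impl MigrationTransport for LoopbackTransport {
    fn connect(&mut self, _addr: &str) -> io::Result<()> {
        self.connected = true;
        Ok(())
    }

    fn shutdown(&mut self) -> io::Result<()> {
        self.connected = false;
        Ok(())
    }

    fn write(&mut self, buf: &[u8]) -> io::Result<usize> {
        if !self.connected {
            return Err(io::Error::new(io::ErrorKind::NotConnected, "not connected"));
        }
        self.queue.extend(buf.iter().copied());
        Ok(buf.len())
    }

    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        // Hand back as many queued bytes as fit into the caller's buffer.
        let n = buf.len().min(self.queue.len());
        for slot in buf.iter_mut().take(n) {
            *slot = self.queue.pop_front().expect("length checked above");
        }
        Ok(n)
    }
}
```

Because the upper layer only sees the trait, a real TCP or RDMA implementation could be substituted for the loopback without changing the callers.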

MigrationTCP

MigrationTCP is the concrete implementation of the MigrationTransport trait. It manages the TCP connection and sends/receives migration data. It relies on the mio crate for the basic socket functions.

When migration starts, a MigrationTCP instance is created if the configured protocol is TCP. It is stored as a MigrationTransport instance in the upper layer.

pub struct MigrationTCP {
    addr: String,
    sock: TcpStream,
    ...
}

impl MigrationTCP {
    pub fn new(addr: String, ...) -> MigrationTCP;
    ...
}

impl MigrationTransport for MigrationTCP {
    ...
}
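
As an illustration of the connection handling described above, here is a sketch that uses blocking sockets from `std::net` rather than mio, purely to stay dependency-free. The `connect`/`accept` split, `send_all`, `recv_exact` and `peer` are hypothetical names, not part of the proposal.

```rust
use std::io::{self, Read, Write};
use std::net::{Shutdown, TcpListener, TcpStream};

/// Sketch of a TCP endpoint for migration traffic (blocking, std-only).
pub struct MigrationTcp {
    addr: String,
    sock: TcpStream,
}

impl MigrationTcp {
    /// Source side: connect to a target that is already listening.
    pub fn connect(addr: &str) -> io::Result<Self> {
        let sock = TcpStream::connect(addr)?;
        // Disable Nagle's algorithm so small control messages are not delayed.
        sock.set_nodelay(true)?;
        Ok(Self { addr: addr.to_string(), sock })
    }

    /// Target side: accept exactly one incoming migration connection.
    pub fn accept(listener: &TcpListener) -> io::Result<Self> {
        let (sock, peer) = listener.accept()?;
        Ok(Self { addr: peer.to_string(), sock })
    }

    /// Address of the remote end as recorded at connection time.
    pub fn peer(&self) -> &str {
        &self.addr
    }

    /// Send the whole buffer or fail.
    pub fn send_all(&mut self, buf: &[u8]) -> io::Result<()> {
        self.sock.write_all(buf)
    }

    /// Block until `buf` is completely filled or the stream ends.
    pub fn recv_exact(&mut self, buf: &mut [u8]) -> io::Result<()> {
        self.sock.read_exact(buf)
    }

    pub fn shutdown(&self) -> io::Result<()> {
        self.sock.shutdown(Shutdown::Both)
    }
}
```

A non-blocking variant built on mio would register the socket with a poller instead of blocking in `recv_exact`; that part is left out here.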

MigrationDataFile

MigrationDataFile defines the mechanism for handling the data. It provides a series of interfaces to put data into and get data from the buffer.

pub struct MigrationDataFile {
    trans: Box<dyn MigrationTransport>,
    buf_pos: u32,
    buf_size: u32,
    buf: [u8; 32768],
    iovcnt: u32,
    /* Part of buf[] is saved into one iov[] member */
    iov: [IoVec; 64],
}

impl MigrationDataFile {
    pub fn new(trans: Box<dyn MigrationTransport>, ...) -> MigrationDataFile;
    /* put_*/get_* advance buf_pos, so they take &mut self */
    pub fn put_byte(&mut self, v: u32);
    pub fn put_be16(&mut self, v: u32);
    pub fn put_be32(&mut self, v: u32);
    pub fn put_be64(&mut self, v: u64);
    pub fn put_buffer(&mut self, buf: &[u8], size: usize);
    pub fn get_byte(&mut self) -> u32;
    pub fn get_be16(&mut self) -> u32;
    pub fn get_be32(&mut self) -> u32;
    pub fn get_be64(&mut self) -> u64;
    pub fn get_buffer(&mut self, buf: &mut [u8], size: usize);
    /* Call MigrationTransport interface to send data to target */
    pub fn flush_data(...);
    ...
}
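
The put/get interfaces can be sketched as a buffered big-endian writer and a matching reader. This is a simplified illustration under stated assumptions: the writer is generic over `std::io::Write` instead of holding a `Box<dyn MigrationTransport>`, the `iov` array is omitted, and `DataFileWriter`/`DataFileReader` are hypothetical names. The 32768-byte threshold is taken from the `buf` size in the proposal.

```rust
use std::io::{self, Read, Write};

const BUF_SIZE: usize = 32768;

/// Source side: stage data in a buffer and flush it when the threshold is hit.
pub struct DataFileWriter<W: Write> {
    out: W,
    buf: Vec<u8>,
}

impl<W: Write> DataFileWriter<W> {
    pub fn new(out: W) -> Self {
        Self { out, buf: Vec::with_capacity(BUF_SIZE) }
    }

    pub fn put_buffer(&mut self, data: &[u8]) -> io::Result<()> {
        for chunk in data.chunks(BUF_SIZE) {
            // Flush first if this chunk would overflow the staging buffer.
            if self.buf.len() + chunk.len() > BUF_SIZE {
                self.flush_data()?;
            }
            self.buf.extend_from_slice(chunk);
        }
        Ok(())
    }

    pub fn put_byte(&mut self, v: u8) -> io::Result<()> {
        self.put_buffer(&[v])
    }
    pub fn put_be16(&mut self, v: u16) -> io::Result<()> {
        self.put_buffer(&v.to_be_bytes())
    }
    pub fn put_be32(&mut self, v: u32) -> io::Result<()> {
        self.put_buffer(&v.to_be_bytes())
    }
    pub fn put_be64(&mut self, v: u64) -> io::Result<()> {
        self.put_buffer(&v.to_be_bytes())
    }

    /// Push everything staged so far to the underlying writer.
    pub fn flush_data(&mut self) -> io::Result<()> {
        self.out.write_all(&self.buf)?;
        self.buf.clear();
        self.out.flush()
    }

    pub fn into_inner(self) -> W {
        self.out
    }
}

/// Target side: read values back in the same big-endian layout.
pub struct DataFileReader<R: Read> {
    inp: R,
}

impl<R: Read> DataFileReader<R> {
    pub fn new(inp: R) -> Self {
        Self { inp }
    }

    pub fn get_buffer(&mut self, buf: &mut [u8]) -> io::Result<()> {
        self.inp.read_exact(buf)
    }
    pub fn get_byte(&mut self) -> io::Result<u8> {
        let mut b = [0u8; 1];
        self.get_buffer(&mut b)?;
        Ok(b[0])
    }
    pub fn get_be16(&mut self) -> io::Result<u16> {
        let mut b = [0u8; 2];
        self.get_buffer(&mut b)?;
        Ok(u16::from_be_bytes(b))
    }
    pub fn get_be32(&mut self) -> io::Result<u32> {
        let mut b = [0u8; 4];
        self.get_buffer(&mut b)?;
        Ok(u32::from_be_bytes(b))
    }
    pub fn get_be64(&mut self) -> io::Result<u64> {
        let mut b = [0u8; 8];
        self.get_buffer(&mut b)?;
        Ok(u64::from_be_bytes(b))
    }
}
```

Using exact-width integer types (`u8`, `u16`) for the narrow accessors, instead of `u32` everywhere, lets the compiler reject out-of-range values; that is a design choice of this sketch, not of the proposal.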

MigrationDataFileFlag

So that the source VM and target VM can identify the content of the transferred data, we should define data flags that mark what the following data is. Considering the possibility of making live migration work between rust-vmm and QEMU, we follow the QEMU data transfer format.

So the flags below are defined. They are inserted into MigrationDataFile as data headers/footers or flags. The migration flow implementation does this work.

pub const MIGRATION_VM_FILE_MAGIC: u32          = 0x5145564d;
pub const MIGRATION_VM_FILE_VERSION_COMPAT: u32 = 0x00000002;
pub const MIGRATION_VM_FILE_VERSION: u32        = 0x00000003;

pub const MIGRATION_VM_EOF: u8                  = 0x00;
pub const MIGRATION_VM_SECTION_START: u8        = 0x01;
pub const MIGRATION_VM_SECTION_PART: u8         = 0x02;
pub const MIGRATION_VM_SECTION_END: u8          = 0x03;
pub const MIGRATION_VM_SECTION_FULL: u8         = 0x04;
pub const MIGRATION_VM_SUBSECTION: u8           = 0x05;
pub const MIGRATION_VM_VMDESCRIPTION: u8        = 0x06;
pub const MIGRATION_VM_CONFIGURATION: u8        = 0x07;
pub const MIGRATION_VM_COMMAND: u8              = 0x08;
pub const MIGRATION_VM_SECTION_FOOTER: u8       = 0x7e;

pub const MIGRATION_RAM_SAVE_FLAG_ZERO: u8      = 0x02;
pub const MIGRATION_RAM_SAVE_FLAG_MEM_SIZE: u8  = 0x04;
pub const MIGRATION_RAM_SAVE_FLAG_PAGE: u8      = 0x08;
pub const MIGRATION_RAM_SAVE_FLAG_EOS: u8       = 0x10;
pub const MIGRATION_RAM_SAVE_FLAG_CONTINUE: u8  = 0x20;
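
To show how these flags could frame a stream, here is a sketch that writes the file magic and version, one full section, a footer and the EOF marker, and then validates the header on the receiving side. The exact section layout used here (section id, one-byte idstr length, idstr, payload) is an assumption loosely modeled on QEMU and is not specified by the proposal; `encode_stream` and `check_header` are hypothetical helpers.

```rust
pub const MIGRATION_VM_FILE_MAGIC: u32 = 0x5145564d;
pub const MIGRATION_VM_FILE_VERSION: u32 = 0x00000003;
pub const MIGRATION_VM_EOF: u8 = 0x00;
pub const MIGRATION_VM_SECTION_FULL: u8 = 0x04;
pub const MIGRATION_VM_SECTION_FOOTER: u8 = 0x7e;

/// Frame one payload as: magic, version, SECTION_FULL, id, idstr, payload,
/// SECTION_FOOTER, id, EOF. All integers are big-endian.
pub fn encode_stream(section_id: u32, idstr: &str, payload: &[u8]) -> Vec<u8> {
    assert!(idstr.len() < 256, "idstr length must fit in one byte");
    let mut out = Vec::new();
    out.extend_from_slice(&MIGRATION_VM_FILE_MAGIC.to_be_bytes());
    out.extend_from_slice(&MIGRATION_VM_FILE_VERSION.to_be_bytes());
    out.push(MIGRATION_VM_SECTION_FULL);
    out.extend_from_slice(&section_id.to_be_bytes());
    out.push(idstr.len() as u8);
    out.extend_from_slice(idstr.as_bytes());
    out.extend_from_slice(payload);
    out.push(MIGRATION_VM_SECTION_FOOTER);
    out.extend_from_slice(&section_id.to_be_bytes());
    out.push(MIGRATION_VM_EOF);
    out
}

/// Validate magic and version, returning the bytes that follow the header.
pub fn check_header(stream: &[u8]) -> Result<&[u8], String> {
    if stream.len() < 8 {
        return Err("stream too short for header".to_string());
    }
    let magic = u32::from_be_bytes([stream[0], stream[1], stream[2], stream[3]]);
    let version = u32::from_be_bytes([stream[4], stream[5], stream[6], stream[7]]);
    if magic != MIGRATION_VM_FILE_MAGIC {
        return Err(format!("bad magic {:#010x}", magic));
    }
    if version != MIGRATION_VM_FILE_VERSION {
        return Err(format!("unsupported version {}", version));
    }
    Ok(&stream[8..])
}
```

The magic value 0x5145564d is the ASCII string "QEVM" when written big-endian, which makes a stream easy to recognize in a hex dump.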

MigrationRam

RAM makes up most of the data transferred during live migration, and its flow is somewhat more complex than the others. It needs to interact with the memory management component to get dirty pages, send them and restore them. Many methods have been invented to reduce the time and size of the RAM transfer, so we make RAM migration handling a standalone component. If time allows, compression may be adopted as one of the features of the first development phase.

A prerequisite is that the memory management component provides interfaces to get dirty memory information.

pub struct DirtyRegion {
    /* The dirty memory region */
    region: MmapRegion,
    /* Dirty bitmap to record which page is dirty in this memory block */
    bit_map: Vec<u64>,
    num_dirty_pages: u64,
}

pub struct MigrationRam {
    mig_data: MigrationDataFile,
    /* Is Arc necessary? */
    dirty_regions: Arc<Vec<DirtyRegion>>,
    last_sent_region: DirtyRegion,
    ...
}

impl MigrationRam {
    pub fn new(...) -> MigrationRam;
    pub fn migrate_ram_send_prepare(...);
    pub fn migrate_ram_receive_prepare(...);
    /* Call MigrationDataFile interface to insert dirty ram */
    pub fn migrate_ram_send(...);
    /* Call MigrationDataFile interface to get dirty ram */
    pub fn migrate_ram_receive(...);
    pub fn migrate_ram_iterate(...);
    pub fn migrate_dirty_bitmap_sync(...);
    pub fn migrate_find_dirty_region(...);
    ...
}
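
The `bit_map: Vec<u64>` bookkeeping in DirtyRegion can be illustrated with a small stand-alone bitmap. This is only a sketch: `DirtyBitmap` and its methods are hypothetical names, one bit represents one page, and the interaction with MmapRegion and the hypervisor's dirty log is left out.

```rust
/// One bit per guest page; bit `n` of word `n / 64` tracks page `n`.
pub struct DirtyBitmap {
    words: Vec<u64>,
    num_pages: usize,
}

impl DirtyBitmap {
    pub fn new(num_pages: usize) -> Self {
        Self { words: vec![0; (num_pages + 63) / 64], num_pages }
    }

    /// Mark a single page as dirty.
    pub fn set(&mut self, page: usize) {
        assert!(page < self.num_pages, "page index out of range");
        self.words[page / 64] |= 1u64 << (page % 64);
    }

    /// Used at the start of migration, when every page must be sent once.
    pub fn mark_all_dirty(&mut self) {
        for page in 0..self.num_pages {
            self.set(page);
        }
    }

    pub fn num_dirty_pages(&self) -> u64 {
        self.words.iter().map(|w| u64::from(w.count_ones())).sum()
    }

    /// Return the dirty page indexes in ascending order and clear the bitmap,
    /// so that pages dirtied afterwards show up in the next iteration.
    pub fn drain_dirty(&mut self) -> Vec<usize> {
        let mut pages = Vec::new();
        for (i, word) in self.words.iter_mut().enumerate() {
            let mut w = std::mem::take(word);
            while w != 0 {
                let bit = w.trailing_zeros() as usize;
                pages.push(i * 64 + bit);
                w &= w - 1; // clear the lowest set bit
            }
        }
        pages
    }
}
```

Scanning by `trailing_zeros` skips clean pages a word at a time, which matters when only a small fraction of a large guest's memory is dirty.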

MigrationState

MigrationState defines the data structure used by any component that needs to migrate its state, e.g. cpu/virtio-device. MigrationStateField defines the format of an attribute to be migrated. The caller should set the attribute type so that we know how to parse it. The value itself is converted to a u8 vector to avoid a complex implementation, because the type of the value is not known in advance. Furthermore, the caller should register save_fn/load_fn for its specific work.

When the VMM starts on both the source side and the target side, each component that needs to migrate state should call migration_state_push() to register its own MigrationState instance into the global vector MigrationStateArray.

When live migration starts, the source side migration process iterates over MigrationStateArray to save the states into the MigrationDataFile buffer. When the target side receives the MigrationDataFile, it parses out the state section and iterates over MigrationStateArray to find the matching MigrationState instance according to idstr and instance_id. Then it copies the content of the MigrationDataFile into the MigrationState/MigrationStateField entries in sequence. Finally, it calls the load_fn callback to ask the component to load the migrated state value.

/* Field types */
const FIELD_TYPE_BYTE: u32   = 1 << 1;
const FIELD_TYPE_BOOL: u32   = 1 << 2;
const FIELD_TYPE_U16: u32    = 1 << 3;
const FIELD_TYPE_U32: u32    = 1 << 4;
const FIELD_TYPE_U64: u32    = 1 << 5;
/* E.g. U16 array can be "FIELD_TYPE_U16 | FIELD_TYPE_ARRAY" */
const FIELD_TYPE_ARRAY: u32  = 1 << 6;
...

pub struct MigrationStateField {
    value: Vec<u8>,
    version: i32,
    /* According to the field type, call the corresponding MigrationDataFile interface to handle data.
     * Named field_type because `type` is a reserved keyword in Rust. */
    field_type: u32,
    size: u32,
    /* Number of array members */
    num_array: u32,
}

/* Corresponding to Qemu "SaveStateEntry + VMStateDescription" */
pub struct MigrationState {
    /* The identifier to match MigrationDataFile with the correct handler */
    idstr: String,
    instance_id: i32,
    version: i32,
    fields: Vec<MigrationStateField>,
    /*
     * Devices should register their save/load callback functions when create
     * MigrationState instance. They are used for device specific purpose.
     */
    save_fn: Box<dyn FnMut()>,
    load_fn: Box<dyn FnMut()>,
}

impl MigrationState {
    pub fn new(...) -> MigrationState {...};
    pub fn send_state(...) {...};
    pub fn receive_state(...) {...};
    ...
}

lazy_static! {
    static ref MigrationStateArray: Mutex<Vec<MigrationState>> = Mutex::new(vec![]);
}

pub fn migration_state_push(state: MigrationState, ...) {
    MigrationStateArray.lock().unwrap().push(state);
    ...
}

pub fn migration_state_pop(...) -> Option<MigrationState> {
    ...
    /* Vec::pop() returns None when the array is empty */
    MigrationStateArray.lock().unwrap().pop()
}
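
The register, save and load flow described above can be sketched as follows. Several things here are assumptions made for a compact example: the registry is an explicit value rather than a `lazy_static` global, the callbacks exchange raw bytes (the proposal leaves their signatures open), the field list is omitted, and `save_all`/`load` are hypothetical helper names.

```rust
/// Reduced MigrationState: identity plus save/load callbacks.
pub struct MigrationState {
    pub idstr: String,
    pub instance_id: i32,
    pub version: i32,
    /// Assumed signature: produce the serialized state of the component.
    pub save_fn: Box<dyn FnMut() -> Vec<u8> + Send>,
    /// Assumed signature: restore the component from serialized state.
    pub load_fn: Box<dyn FnMut(&[u8]) + Send>,
}

/// Explicit registry standing in for the global MigrationStateArray.
#[derive(Default)]
pub struct MigrationStateArray {
    states: Vec<MigrationState>,
}

impl MigrationStateArray {
    /// Called by each component at VMM start-up, on both sides.
    pub fn push(&mut self, state: MigrationState) {
        self.states.push(state);
    }

    /// Source side: collect (idstr, instance_id, data) for every component.
    pub fn save_all(&mut self) -> Vec<(String, i32, Vec<u8>)> {
        self.states
            .iter_mut()
            .map(|s| (s.idstr.clone(), s.instance_id, (s.save_fn)()))
            .collect()
    }

    /// Target side: find the matching entry and hand it the received data.
    pub fn load(&mut self, idstr: &str, instance_id: i32, data: &[u8]) -> Result<(), String> {
        let state = self
            .states
            .iter_mut()
            .find(|s| s.idstr == idstr && s.instance_id == instance_id)
            .ok_or_else(|| format!("no handler for {}/{}", idstr, instance_id))?;
        (state.load_fn)(data);
        Ok(())
    }
}
```

A device would typically capture a shared handle to its own state (for example an `Arc<Mutex<...>>`) inside both closures when it registers itself.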

MigrationBase

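The MigrationBase struct is the common base that manages migration status and provides the interfaces to control the migration process. Rust has no class inheritance, so it is embedded as a field in MigrationSender and MigrationReceiver.

MigrationStatus is used to track the live migration status.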

pub enum MigrationStatus {
    Launch,
    Init,
    Active,
    Cancel,
    Complete,
    Fail,
}

pub enum MigrationProtocol {
    Tcp,
    ...
}

pub struct MigrationBase {
    status: MigrationStatus,
    protocol: MigrationProtocol,
    data: MigrationDataFile,
    ram: MigrationRam,
    trans_thread: thread::JoinHandle<()>,
    ...
}

impl MigrationBase {
    pub fn new() -> Result<Self> {...};
    pub fn set_status() -> Result<()> {...};
    pub fn start_transport_thread() -> thread::JoinHandle<()> {...};
    ...
}
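
Since set_status() returns a Result, one plausible use is to reject invalid status changes. The proposal does not define which transitions are legal, so the table below is purely an assumption for illustration, as are the shortened variant names and the `StatusTracker` type.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum MigrationStatus {
    Launch,
    Init,
    Active,
    Cancel,
    Complete,
    Fail,
}

impl MigrationStatus {
    /// Assumed transition table: Launch -> Init -> Active -> Complete, with
    /// Cancel and Fail reachable from any state that is not yet terminal.
    fn can_move_to(self, next: MigrationStatus) -> bool {
        use MigrationStatus::*;
        matches!(
            (self, next),
            (Launch, Init)
                | (Init, Active)
                | (Active, Complete)
                | (Launch, Cancel)
                | (Init, Cancel)
                | (Active, Cancel)
                | (Launch, Fail)
                | (Init, Fail)
                | (Active, Fail)
        )
    }
}

/// Minimal stand-in for the status-handling part of MigrationBase.
pub struct StatusTracker {
    status: MigrationStatus,
}

impl StatusTracker {
    pub fn new() -> Self {
        Self { status: MigrationStatus::Launch }
    }

    pub fn status(&self) -> MigrationStatus {
        self.status
    }

    /// Apply the change only if the transition table allows it.
    pub fn set_status(&mut self, next: MigrationStatus) -> Result<(), String> {
        if self.status.can_move_to(next) {
            self.status = next;
            Ok(())
        } else {
            Err(format!("invalid transition {:?} -> {:?}", self.status, next))
        }
    }
}
```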

MigrationSender

The MigrationSender struct includes a MigrationBase struct object and implements the sender-specific data and operations.

pub struct MigrationSender {
    mig: MigrationBase,
    ...
}

impl MigrationSender {
    pub fn new() -> Result<Self> {...};
    pub fn start() -> Result<()> {...};
    pub fn cancel() -> Result<()> {...};
    pub fn send_configure() -> Result<()> {...};
    pub fn migrate_iterate() -> Result<()> {...};
    ...
}

The above functions manage live migration on the source side.

start(): Called when the user issues the command on the source side to start the live migration process. The main flow of live migration is as follows.
Stage 1: Mark all VM RAM as dirty so that all memory is transferred to the destination to initialize the target VM.
Stage 2: Keep sending the RAM pages dirtied since the last iteration until a convergence criterion is met.
Stage 3: Suspend the VM and transfer the remaining dirty RAM, CPU states and device states to the destination. Furthermore, redirect the source VM's network traffic to the target VM.
cancel(): Called when the user issues the command to cancel the live migration process.
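
The three stages can be sketched as a small simulation that only counts pages. The convergence criterion used here (remaining dirty pages at or below a threshold, or a round limit reached) is an assumption, since the proposal does not fix one; `run_precopy`, `PrecopyReport` and the `dirtied_while_sending` callback are hypothetical.

```rust
/// Page counts produced by one simulated migration.
#[derive(Debug, PartialEq)]
pub struct PrecopyReport {
    pub rounds: u32,
    pub pages_sent_live: u64,
    pub pages_sent_stopped: u64,
}

/// `dirtied_while_sending(n)` models how many pages the guest dirties while
/// `n` pages are being transferred.
pub fn run_precopy(
    total_pages: u64,
    downtime_threshold: u64,
    max_rounds: u32,
    mut dirtied_while_sending: impl FnMut(u64) -> u64,
) -> PrecopyReport {
    // Stage 1: treat every page as dirty so the first round sends all RAM.
    let mut dirty = total_pages;
    let mut report = PrecopyReport { rounds: 0, pages_sent_live: 0, pages_sent_stopped: 0 };

    // Stage 2: keep sending while the VM runs, until the remaining set is
    // small enough or the round limit is reached.
    while dirty > downtime_threshold && report.rounds < max_rounds {
        report.pages_sent_live += dirty;
        report.rounds += 1;
        dirty = dirtied_while_sending(dirty).min(total_pages);
    }

    // Stage 3: the VM is suspended, so the remaining pages cannot change.
    report.pages_sent_stopped = dirty;
    report
}
```

With 1000 pages, a threshold of 10 and a guest that dirties a quarter of what was just sent, the dirty set shrinks 1000, 250, 62, 15, 3, so four live rounds are followed by a stop-and-copy of 3 pages. A guest that dirties pages faster than they can be sent never converges, which is why the round limit is needed.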

MigrationReceiver

The MigrationReceiver struct includes a MigrationBase struct object and implements the receiver-specific data and operations.

pub struct MigrationReceiver {
    mig: MigrationBase,
    ...
}

impl MigrationReceiver {
    pub fn new() -> Result<Self> {...};
    pub fn apply_config() -> Result<()> {...};
    pub fn receive_loop() -> Result<()> {...};
    ...
}

The above functions manage live migration on the target side.

receive_loop(): When the user issues the command on the target side to start the live migration process, a MigrationReceiver instance is created to do some preparation. Then this function is called to wait for messages from the source side. When a message arrives, it is parsed and dispatched to the appropriate flow according to its section header.
apply_config(): Applies the configuration data received from the source side.
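
The dispatch on the section header can be sketched as below. The record framing (one type byte, a big-endian u16 length, then the payload) is a simplification invented for this example, and mapping START/PART/END to RAM and FULL to device state is likewise an assumption. The function returns a log of what would have been handled instead of calling real handlers.

```rust
pub const MIGRATION_VM_EOF: u8 = 0x00;
pub const MIGRATION_VM_SECTION_START: u8 = 0x01;
pub const MIGRATION_VM_SECTION_PART: u8 = 0x02;
pub const MIGRATION_VM_SECTION_END: u8 = 0x03;
pub const MIGRATION_VM_SECTION_FULL: u8 = 0x04;
pub const MIGRATION_VM_CONFIGURATION: u8 = 0x07;

/// Walk the stream record by record until the EOF marker is seen.
pub fn receive_loop(mut stream: &[u8]) -> Result<Vec<String>, String> {
    let mut log = Vec::new();
    loop {
        let (&kind, rest) = stream.split_first().ok_or("unexpected end of stream")?;
        if kind == MIGRATION_VM_EOF {
            return Ok(log);
        }
        if rest.len() < 2 {
            return Err("truncated length field".to_string());
        }
        let len = u16::from_be_bytes([rest[0], rest[1]]) as usize;
        let rest = &rest[2..];
        if rest.len() < len {
            return Err("truncated payload".to_string());
        }
        let (payload, tail) = rest.split_at(len);

        // Dispatch to a different flow depending on the section header.
        match kind {
            MIGRATION_VM_CONFIGURATION => log.push(format!("config:{}", payload.len())),
            MIGRATION_VM_SECTION_START
            | MIGRATION_VM_SECTION_PART
            | MIGRATION_VM_SECTION_END => log.push(format!("ram:{}", payload.len())),
            MIGRATION_VM_SECTION_FULL => log.push(format!("state:{}", payload.len())),
            other => return Err(format!("unknown section type {:#04x}", other)),
        }
        stream = tail;
    }
}
```

In a real receiver the `config` arm would call apply_config() and the other arms would hand the payload to MigrationRam or to the matching MigrationState entry.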

Note

This design may change in response to good comments or better solutions found during implementation. I may update it later. Thanks!

bonzini commented 5 years ago

This design vaguely :) reminds me of QEMU. While I think that QEMU's overall migration format is fine, what about transmitting the device's migration data using Serde? See for example https://crates.io/crates/serde_asn1_der (not sure how mature that crate is).

jiangliu commented 5 years ago

This design vaguely :) reminds me of QEMU. While I think that QEMU's overall migration format is fine, what about transmitting the device's migration data using Serde? See for example https://crates.io/crates/serde_asn1_der (not sure how mature that crate is).

I prefer to use the Rust serde framework and avoid reinventing the wheel, but JSON adds some overhead :) Another question is whether or not to reuse the QEMU live migration wire protocol.

jiangliu commented 5 years ago

How about using rust tokio/futures + serde for the live migration?

andreeaflorescu commented 5 years ago

@jiangliu tokio has been a big pain point for Firecracker because it pulls in so many dependencies that they are simply unmanageable. We ended up pinning a fixed version of tokio and are still struggling to get rid of that dependency. I don't think we should re-invent the wheel, but maybe we should put some more thought into this.

andreeaflorescu commented 5 years ago

BTW, serde has support for multiple data formats, we don't necessarily need to use JSON if we end up using serde.

https://serde.rs/#data-formats

jiangliu commented 5 years ago

@jiangliu tokio has been a big pain point for Firecracker because it pulls in so many dependencies that are simply unmanageable. We ended up with a fixed version of tokio and still struggling to get rid of that dependency. I don't think we should re-invent the wheel, but maybe we should put some more thought into this.

Good point :) It seems that the futures crate is lightweight, but the hyper crate is really heavyweight, and tokio is moderate depending on the features used.

yisun-git commented 5 years ago

This design vaguely :) reminds me of QEMU. While I think that QEMU's overall migration format is fine, what about transmitting the device's migration data using Serde? See for example https://crates.io/crates/serde_asn1_der (not sure how mature that crate is).

I prefer to use the rust serde framework and avoid reinvent the wheel, but json is a little overhead:) Another question is whether or not to reuse qemu live migration wire protocol.

Hi all, thanks for the comments! I am thinking of using serde to serialize/deserialize the migration data. My thought is that it should be a tool used by MigrationData to assemble the migration data according to the format defined by MigrationDataFileFlag. Then we can keep the ability to migrate between QEMU and rust-vmm. What do you think?

jiangliu commented 5 years ago

This design vaguely :) reminds me of QEMU. While I think that QEMU's overall migration format is fine, what about transmitting the device's migration data using Serde? See for example https://crates.io/crates/serde_asn1_der (not sure how mature that crate is).

I prefer to use the rust serde framework and avoid reinvent the wheel, but json is a little overhead:) Another question is whether or not to reuse qemu live migration wire protocol.

Hi, all, thanks for the comment! I am thinking to use serde to serialize/deserialize migration data. Per my thought, it should be a tool used by MigrationData to assemble the migration data according to the format defined by MigrationDataFileFlag. Then, we can keep the ability to do migration between Qemu and rust-vmm. How do you think?

I have some concerns about the goal of migrating VMs between firecracker/cloudhypervisor and QEMU. It may need a big effort or even be impossible.

yisun-git commented 5 years ago

This design vaguely :) reminds me of QEMU. While I think that QEMU's overall migration format is fine, what about transmitting the device's migration data using Serde? See for example https://crates.io/crates/serde_asn1_der (not sure how mature that crate is).

I prefer to use the rust serde framework and avoid reinvent the wheel, but json is a little overhead:) Another question is whether or not to reuse qemu live migration wire protocol.

Hi, all, thanks for the comment! I am thinking to use serde to serialize/deserialize migration data. Per my thought, it should be a tool used by MigrationData to assemble the migration data according to the format defined by MigrationDataFileFlag. Then, we can keep the ability to do migration between Qemu and rust-vmm. How do you think?

Have some concerns about the goal to migration vms between firecracker/cloudhypervisor and qemu. It may needs big effort or even impossible.

Yes, maybe. In fact, I am not very sure about it either. @bonzini @andreeaflorescu, what are your opinions? Thanks!

yisun-git commented 5 years ago

@jiangliu tokio has been a big pain point for Firecracker because it pulls in so many dependencies that are simply unmanageable. We ended up with a fixed version of tokio and still struggling to get rid of that dependency. I don't think we should re-invent the wheel, but maybe we should put some more thought into this.

Good point:) Seems that the future crate is lightweight, by the hyper crate is really heavyweight, tokio is moderate depending on the features used.

How about the fibers crate, which is based on futures and mio, to support asynchronous tasks? It looks more lightweight than tokio.

richardwyang commented 5 years ago

Hmm... one thing confuses me. Why do we plan to separate MigrationState and MigrationRam?

If we take QEMU as the reference, RAM and other devices are registered in savevm_state as the same type, SaveStateEntry. They are treated the same way during each iteration and the completion stage, but with different handlers. This is how they are handled differently.

I agree that RAM is special and needs some tricks to handle. But from a higher-level point of view, it could be the same as the others. For the implementation details, it is reasonable to have unique behaviors.

In case I missed something, just let me know.

bonzini commented 5 years ago

Live migration between QEMU and rust-vmm is not really a practical goal. For any but the simplest devices, migration includes implementation details such as timers and device names that are going to differ too much.

yisun-git commented 5 years ago

Hmm... I got one confusion. Why we plan to separate MigrationState and MigrationRam.

In case we leverage QEMU, we see RAM and other devices are registered on savevm_state as the same type SaveStateEntry. They are treated as the same during each iteration and completion stage, but with different handlers. This is how they are handled differently.

I agree that RAM is special and need some tricks to handle. While from higher level point of view, it could be the same as others. For implementation detail, it is reasonable to have unique behaviors.

In case I missed something, just let me know.

Hi Richard, because RAM is really different from the states, I do not want to mix them together behind unified handlers. That may cause complexity that does not seem necessary.

yisun-git commented 5 years ago

Live migration between QEMU and rust-vmm is not really a practical goal. For any but the simplest devices, migration includes implementation details such as timers that are going to differ too much.

Thank you! This is a very important comment. So I may drop migration between QEMU and rust-vmm from consideration. That may simplify things.

vmsearch commented 4 years ago

@yisun-git is the migration work in progress?

yisun-git commented 4 years ago

@yisun-git the migration is in progressing?

Yes, I am working on draft code.

vmsearch commented 4 years ago

@yisun-git the migration is in progressing?

Yes, working on draft codes.

Thanks for your reply. Looking forward to your good news!

sboeuf commented 4 years ago

@yisun-git this is a very complete proposal :)

Regarding the name of the crate, shouldn't this be vm-migration instead of vm-live-migration? Including live in the name sounds a bit reductive since I think we can achieve both regular and live migration with this crate.

A bit similar to my first comment, I'm wondering if we shouldn't start with regular migration instead of live migration. I know live migration is the end goal for production environment, but regular migration would allow us to focus on VM data and states which need to be saved. I might be missing something (since I have no experience in VM migration) but it feels like being able to take a proper snapshot of a VM and save it to some files, so that we could start another VM with those files as inputs would be the first step.

Adding the whole communication mechanism to perform the live migration seems like an orthogonal task and might be something that gets added later.

Comments more than welcome since I might be completely off track :)

andreeaflorescu commented 4 years ago

My honest opinion is that it's too early to plan for live-migration when we don't even have the minimal crates in rust-vmm to create a dummy VMM. I also believe that live-migration depends (or better said) should depend on the implementation of devices and other components in rust-vmm. At this moment we don't have a high level design of how future and current rust-vmm components will interact. We also don't have the basic components in rust-vmm like common virtio implementations.

Is the plan to have something merge-able in the near future? Or is the point of this issue to just discuss about how live-migration might be implemented by multiple VMMs using rust-vmm components? I think the second option could make sense since I am assuming that multiple parties are currently working on implementing live migration for their own version of Rust VMM, so people can share ideas. But if what we want to achieve is a vm-migration crate in rust-vmm in the following months before we have an idea of how rust-vmm will look in the future, I think that is not the best we can do right now for the future of this project.

yisun-git commented 4 years ago

Thanks for the comment!

@yisun-git this is a very complete proposal :)

Regarding the name of the crate, shouldn't this be vm-migration instead of vm-live-migration? Including live in the name sounds a bit reductive since I think we can achieve both regular and live migration with this crate.

A bit similar to my first comment, I'm wondering if we shouldn't start with regular migration instead of live migration. I know live migration is the end goal for production environment, but regular migration would allow us to focus on VM data and states which need to be saved. I might be missing something (since I have no experience in VM migration) but it feels like being able to take a proper snapshot of a VM and save it to some files, so that we could start another VM with those files as inputs would be the first step.

Yes, a snapshot can be treated as a special case of live migration. At least, most of the code could be the same.

Adding the whole communication mechanism to perform the live migration seems like an orthogonal task and might be something that gets added later.

Yes, the communication layer is independent. Although I have implemented the basic communication code, I will consider your suggestions. Thanks!

Comments more than welcome since I might be completely off track :)

yisun-git commented 4 years ago

My honest opinion is that it's too early to plan for live-migration when we don't even have the minimal crates in rust-vmm to create a dummy VMM. I also believe that live-migration depends (or better said) should depend on the implementation of devices and other components in rust-vmm. At this moment we don't have a high level design of how future and current rust-vmm components will interact. We also don't have the basic components in rust-vmm like common virtio implementations.

Live migration does depend on many components, e.g. the vcpu/devices/memory/etc., and rust-vmm does not have most of them now. Furthermore, rust-vmm currently lacks basic components and is missing a high-level design. Shall we consider this and make a roadmap ASAP? I think live migration is a good starting point to trigger that consideration and discussion.

Is the plan to have something merge-able in the near future? Or is the point of this issue to just discuss about how live-migration might be implemented by multiple VMMs using rust-vmm components? I think the second option could make sense since I am assuming that multiple parties are currently working on implementing live migration for their own version of Rust VMM, so people can share ideas. But if what we want to achieve is a vm-migration crate in rust-vmm in the following months before we have an idea of how rust-vmm will look in the future, I think that is not the best we can do right now for the future of this project.

Because rust-vmm lacks many components, I do not plan to get the whole vm-migration crate merged in a short time. But I want to share my ideas and even the draft code (implemented on a workable Rust VMM) here to get helpful comments and suggestions. Furthermore, I think some basic migration mechanisms may be merged once the code is mature enough, so that the users of rust-vmm can benefit from them.

andreeaflorescu commented 4 years ago

My honest opinion is that it's too early to plan for live-migration when we don't even have the minimal crates in rust-vmm to create a dummy VMM. I also believe that live-migration depends (or better said) should depend on the implementation of devices and other components in rust-vmm. At this moment we don't have a high level design of how future and current rust-vmm components will interact. We also don't have the basic components in rust-vmm like common virtio implementations.

Live migration does depend on many components, e.g. the vcpu/devices/memory/etc. But rust-vmm does not have most of them now. Furthermore, rust-vmm does not have basic components now and miss a high level design. Shall we consider it and make a road-map ASAP? I think live migration should be a good start to trigger the consideration and discussion.

Yes, I think we should start working on the roadmap and the high-level design. That should speed up the development process. It would also be a great way to prioritize the work and make sure we are focusing on the right components. I will send an email about this, as it has been on my mind for quite some time now.

rust-vmm / community

Crate Addition Request - vm-live-migration #67
