nutanix / libvfio-user

framework for emulating devices in userspace
BSD 3-Clause "New" or "Revised" License

live migration: skip identical state transitions #679

Open tmakatos opened 2 years ago

tmakatos commented 2 years ago

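@jlevon noticed that during live migration (LM), QEMU might tell us to switch to the migration state the device is already in. Here's an example from NVMf/vfio-user (note the stop-and-copy to stop-and-copy transition near the end):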

[2022-05-11 16:27:35.610634] vfio_user.c:2843:vfio_user_log: *DEBUG*: /var/run: migration: transitioning from state pre-copy to state stop-and-copy
[2022-05-11 16:27:35.610644] vfio_user.c:3510:vfio_user_migration_device_state_transition: *DEBUG*: /var/run controller state 3, migration state 2
[2022-05-11 16:27:35.610694] vfio_user.c:3031:vfio_user_ctrlr_dump_migr_data: *NOTICE*: Dump SAVE
[2022-05-11 16:27:35.610732] vfio_user.c:3036:vfio_user_ctrlr_dump_migr_data: *NOTICE*: Registers
[2022-05-11 16:27:35.610749] vfio_user.c:3037:vfio_user_ctrlr_dump_migr_data: *NOTICE*: CSTS 0x1
[2022-05-11 16:27:35.610766] vfio_user.c:3038:vfio_user_ctrlr_dump_migr_data: *NOTICE*: CAP  0x201e0100ff
[2022-05-11 16:27:35.610782] vfio_user.c:3039:vfio_user_ctrlr_dump_migr_data: *NOTICE*: VS   0x10300
[2022-05-11 16:27:35.610808] vfio_user.c:3040:vfio_user_ctrlr_dump_migr_data: *NOTICE*: CC   0x460001
[2022-05-11 16:27:35.610835] vfio_user.c:3041:vfio_user_ctrlr_dump_migr_data: *NOTICE*: AQA  0xff003f
[2022-05-11 16:27:35.610859] vfio_user.c:3042:vfio_user_ctrlr_dump_migr_data: *NOTICE*: ASQ  0xbffdd000
[2022-05-11 16:27:35.610878] vfio_user.c:3043:vfio_user_ctrlr_dump_migr_data: *NOTICE*: ACQ  0xbffde000
[2022-05-11 16:27:35.610892] vfio_user.c:3045:vfio_user_ctrlr_dump_migr_data: *NOTICE*: Number of IO Queues 1
[2022-05-11 16:27:35.610910] vfio_user.c:3059:vfio_user_ctrlr_dump_migr_data: *NOTICE*: sqid:0, bar0_doorbell:35
[2022-05-11 16:27:35.610934] vfio_user.c:3066:vfio_user_ctrlr_dump_migr_data: *NOTICE*: SQ sqid:0, cqid:0, sqhead:35, size:64, dma_addr:0xbffdd000
[2022-05-11 16:27:35.610952] vfio_user.c:3071:vfio_user_ctrlr_dump_migr_data: *NOTICE*: cqid:0, bar0_doorbell:35
[2022-05-11 16:27:35.610970] vfio_user.c:3078:vfio_user_ctrlr_dump_migr_data: *NOTICE*: CQ cqid:0, phase:1, cqtail:35, size:256, iv:0, ien:1, dma_addr:0xbffde000
[2022-05-11 16:27:35.610986] vfio_user.c:3059:vfio_user_ctrlr_dump_migr_data: *NOTICE*: sqid:1, bar0_doorbell:1
[2022-05-11 16:27:35.611009] vfio_user.c:3066:vfio_user_ctrlr_dump_migr_data: *NOTICE*: SQ sqid:1, cqid:1, sqhead:1, size:256, dma_addr:0xbffd8000
[2022-05-11 16:27:35.611025] vfio_user.c:3071:vfio_user_ctrlr_dump_migr_data: *NOTICE*: cqid:1, bar0_doorbell:1
[2022-05-11 16:27:35.611040] vfio_user.c:3078:vfio_user_ctrlr_dump_migr_data: *NOTICE*: CQ cqid:1, phase:1, cqtail:1, size:256, iv:0, ien:0, dma_addr:0xbffdc000
[2022-05-11 16:27:35.611059] vfio_user.c:3083:vfio_user_ctrlr_dump_migr_data: *NOTICE*: SAVE Dump Done
[2022-05-11 16:27:35.611078] vfio_user.c:2843:vfio_user_log: *DEBUG*: /var/run: migration: transitioned from state pre-copy to state stop-and-copy
[2022-05-11 16:27:35.611090] vfio_user.c:2843:vfio_user_log: *DEBUG*: /var/run: region9: wrote 0x2 to (0:4)
[2022-05-11 16:27:35.611106] vfio_user.c:2843:vfio_user_log: *DEBUG*: /var/run: device unquiesced
[2022-05-11 16:27:35.611116] vfio_user.c:2946:vfio_user_dev_quiesce_done: *DEBUG*: /var/run is in MIGRATION state
[2022-05-11 16:27:35.611568] vfio_user.c:2843:vfio_user_log: *DEBUG*: /var/run: dirty pages: get [(nil), 0xa0000), 0 dirty pages
[2022-05-11 16:27:35.612568] vfio_user.c:2843:vfio_user_log: *DEBUG*: /var/run: dirty pages: get [0xc0000, 0xcb000), 0 dirty pages
[2022-05-11 16:27:35.613571] vfio_user.c:2843:vfio_user_log: *DEBUG*: /var/run: dirty pages: get [0xcb000, 0xce000), 0 dirty pages
[2022-05-11 16:27:35.614565] vfio_user.c:2843:vfio_user_log: *DEBUG*: /var/run: dirty pages: get [0xce000, 0xe8000), 0 dirty pages
[2022-05-11 16:27:35.615567] vfio_user.c:2843:vfio_user_log: *DEBUG*: /var/run: dirty pages: get [0xe8000, 0xf0000), 0 dirty pages
[2022-05-11 16:27:35.616566] vfio_user.c:2843:vfio_user_log: *DEBUG*: /var/run: dirty pages: get [0xf0000, 0x100000), 0 dirty pages
[2022-05-11 16:27:35.618360] vfio_user.c:2843:vfio_user_log: *DEBUG*: /var/run: dirty pages: get [0x100000, 0xc0000000), 0 dirty pages
[2022-05-11 16:27:35.618821] vfio_user.c:2843:vfio_user_log: *DEBUG*: /var/run: dirty pages: get [0x100000000, 0x140000000), 0 dirty pages
[2022-05-11 16:27:35.619567] vfio_user.c:2843:vfio_user_log: *DEBUG*: /var/run: region9: read 0x2 from (0:4)
[2022-05-11 16:27:35.620565] vfio_user.c:2843:vfio_user_log: *DEBUG*: /var/run: quiescing device
[2022-05-11 16:27:35.620582] vfio_user.c:2979:vfio_user_dev_quiesce_cb: *DEBUG*: /var/run starts to quiesce
[2022-05-11 16:27:35.620591] vfio_user.c:2843:vfio_user_log: *DEBUG*: /var/run: device quiesced immediately
[2022-05-11 16:27:35.620602] vfio_user.c:2843:vfio_user_log: *DEBUG*: /var/run: migration: transitioning from state stop-and-copy to state stop-and-copy
[2022-05-11 16:27:35.620611] vfio_user.c:3510:vfio_user_migration_device_state_transition: *DEBUG*: /var/run controller state 5, migration state 2
[2022-05-11 16:27:35.620627] vfio_user.c:3031:vfio_user_ctrlr_dump_migr_data: *NOTICE*: Dump SAVE
[2022-05-11 16:27:35.620651] vfio_user.c:3036:vfio_user_ctrlr_dump_migr_data: *NOTICE*: Registers
[2022-05-11 16:27:35.620670] vfio_user.c:3037:vfio_user_ctrlr_dump_migr_data: *NOTICE*: CSTS 0x1
[2022-05-11 16:27:35.620686] vfio_user.c:3038:vfio_user_ctrlr_dump_migr_data: *NOTICE*: CAP  0x201e0100ff
[2022-05-11 16:27:35.620703] vfio_user.c:3039:vfio_user_ctrlr_dump_migr_data: *NOTICE*: VS   0x10300
[2022-05-11 16:27:35.620721] vfio_user.c:3040:vfio_user_ctrlr_dump_migr_data: *NOTICE*: CC   0x460001
[2022-05-11 16:27:35.620738] vfio_user.c:3041:vfio_user_ctrlr_dump_migr_data: *NOTICE*: AQA  0xff003f
[2022-05-11 16:27:35.620753] vfio_user.c:3042:vfio_user_ctrlr_dump_migr_data: *NOTICE*: ASQ  0xbffdd000
[2022-05-11 16:27:35.620778] vfio_user.c:3043:vfio_user_ctrlr_dump_migr_data: *NOTICE*: ACQ  0xbffde000
[2022-05-11 16:27:35.620808] vfio_user.c:3045:vfio_user_ctrlr_dump_migr_data: *NOTICE*: Number of IO Queues 1
[2022-05-11 16:27:35.620835] vfio_user.c:3059:vfio_user_ctrlr_dump_migr_data: *NOTICE*: sqid:0, bar0_doorbell:35
[2022-05-11 16:27:35.620858] vfio_user.c:3066:vfio_user_ctrlr_dump_migr_data: *NOTICE*: SQ sqid:0, cqid:0, sqhead:35, size:64, dma_addr:0xbffdd000
[2022-05-11 16:27:35.620874] vfio_user.c:3071:vfio_user_ctrlr_dump_migr_data: *NOTICE*: cqid:0, bar0_doorbell:35
[2022-05-11 16:27:35.620892] vfio_user.c:3078:vfio_user_ctrlr_dump_migr_data: *NOTICE*: CQ cqid:0, phase:1, cqtail:35, size:256, iv:0, ien:1, dma_addr:0xbffde000
[2022-05-11 16:27:35.620907] vfio_user.c:3059:vfio_user_ctrlr_dump_migr_data: *NOTICE*: sqid:1, bar0_doorbell:1
[2022-05-11 16:27:35.620931] vfio_user.c:3066:vfio_user_ctrlr_dump_migr_data: *NOTICE*: SQ sqid:1, cqid:1, sqhead:1, size:256, dma_addr:0xbffd8000
[2022-05-11 16:27:35.620954] vfio_user.c:3071:vfio_user_ctrlr_dump_migr_data: *NOTICE*: cqid:1, bar0_doorbell:1
[2022-05-11 16:27:35.620977] vfio_user.c:3078:vfio_user_ctrlr_dump_migr_data: *NOTICE*: CQ cqid:1, phase:1, cqtail:1, size:256, iv:0, ien:0, dma_addr:0xbffdc000

That's a bit weird; I'm not sure why QEMU does this. @jraman567 @john-johnson-git, any ideas? Regardless, we should skip such identical transitions in libvfio-user.

jraman567 commented 2 years ago

@tmakatos I do recall receiving transitions from and to the same state; I skipped such requests in the server.