vexxhost / migratekit

Near-live migration toolkit for VMware to OpenStack
Apache License 2.0

Error: Get "http://169.254.169.254/openstack/latest/meta_data.json": dial tcp 169.254.169.254:80: i/o timeout #9

Closed LamNguy closed 1 day ago

LamNguy commented 1 month ago

Hi, I get this error but I cannot work out the root cause. I run it from a bastion host that has two interfaces: one can reach VMware, the other can reach the OpenStack cluster.

[root@bastion vmware-vix-disklib]# docker run -it --rm --privileged -v /dev:/dev -v /usr/lib64/vmware-vix-disklib/:/usr/lib64/vmware-vix-disklib:ro --env-file <(env | grep OS_) registry.atmosphere.dev/library/migratekit:latest cutover --vmware-endpoint xxxx --vmware-username xxxxx --vmware-password xxxxxx --vmware-path /svtechhn/vm/cloudvm/lam.ndvm/lam.nd_bootstrap_kolla --flavor a8fde5f6-56c6-452b-b07d-a40b38141fff --network-mapping mac=00:50:56:8e:b2:73,network-id=05ba5267-e72d-4f08-9006-05f058ec8df4,subnet-id=b5f7c194-ebae-4759-b47c-4f593581be49,ip=10.1.30.245
Emulate Docker CLI using podman. Create /etc/containers/nodocker to quiet msg.
INFO[0000] Setting Disk Bus: virtio
INFO[0000] Ensuring OpenStack resources exist
INFO[0000] Flavor exists, ensuring network resources exist flavor=small
INFO[0000] Port already exists port=263a9a1e-e097-41d0-a86a-5b3f974f0052
INFO[0000] Starting migration cycle
Creating snapshot 100% [======================================================================================================================================================================] (100/100) [0s:0s]
DEBU[0001] Running command: /usr/sbin/nbdkit --exit-with-parent --readonly --foreground --unix=/tmp/migratekit-1716472452/nbdkit.sock --pidfile=/tmp/migratekit-1716472452/nbdkit.pid vddk server=xxxxxx user=xxxxxx password=xxxx thumbprint=FB:9D:25:5D:9C:2B:B2:F5:16:12:D5:3E:DA:36:A7:AE:67:CD:F3:2C compression=skipz vm=moref=vm-3121 snapshot=snapshot-14067 [10.1.0.21_ssd02] lam.nd_bootstrap_kolla/lam.nd_bootstrap_kolla.vmdk
INFO[0001] Data does not exist, full copy needed
INFO[0002] Creating new volume
INFO[0002] Volume created, setting to bootable volume_id=041975bc-236e-485d-be8c-87992032f5c8
INFO[0002] Setting volume to be UEFI volume_id=041975bc-236e-485d-be8c-87992032f5c8
INFO[0002] Attaching volume volume_id=041975bc-236e-485d-be8c-87992032f5c8
Removing snapshot 100% [======================================================================================================================================================================] (100/100) [0s:0s]
Error: Get "http://169.254.169.254/openstack/latest/meta_data.json": dial tcp 169.254.169.254:80: i/o timeout

mnaser commented 1 month ago

@LamNguy: it seems that the metadata service is not reachable on your network. Do you know why? Is it possible you have static IPs configured and lost access to DHCP?

LamNguy commented 1 month ago

@mnaser Hi, I think the metadata service is working well, since I can create a new VM with the correct IP. I wonder how the container reaches the metadata service. Can you let me know where I should run the container? Thanks.

LamNguy commented 1 month ago

In my environment, I run the container on the bastion node, which can reach both OpenStack and VMware.

mnaser commented 1 month ago

Can you run curl http://169.254.169.254/openstack/latest/meta_data.json successfully? Also, I wonder if this is because it's podman. Can you add --network host to the command?
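The reachability check suggested above can also be scripted. The following is a minimal Python sketch, not part of migratekit: the function name `fetch_metadata` and the 5-second timeout are assumptions chosen for illustration. Only the URL comes from the error message in this thread.

```python
import json
import urllib.error
import urllib.request

# URL taken from the error message in this issue.
METADATA_URL = "http://169.254.169.254/openstack/latest/meta_data.json"


def fetch_metadata(url=METADATA_URL, timeout=5.0):
    """Return the parsed metadata document, or None if it cannot be fetched.

    `fetch_metadata` is a hypothetical helper for this sketch.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return json.load(resp)
    except (urllib.error.URLError, OSError, ValueError):
        # Covers timeouts, refused connections, and non-JSON responses.
        return None
```

Calling `fetch_metadata()` on the host that runs the container and getting `None` back would point to the same problem as the `i/o timeout` above: the host cannot reach the metadata service. On an OpenStack instance, the returned document normally includes the instance's `uuid`.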

mnaser commented 3 weeks ago

@LamNguy did it end up working with --network host ?

LamNguy commented 3 weeks ago

@mnaser Hi, let me try again with it, thank you

LamNguy commented 3 weeks ago

`[root@registry cloud]# podman run -it --privileged --network host \
-v /dev:/dev \
-v /home/cloud/vmware-vix-disklib-distrib/:/usr/lib64/vmware-vix-disklib:ro \
--env-file <(env | grep OS_) \
registry.atmosphere.dev/library/migratekit:latest \
migrate \
--vmware-endpoint 10.1.0.23 \
--vmware-username lam.nd@vsphere.local \
--vmware-password SVTcoimo@23 \
--vmware-path /svtechhn/vm/cloudvm/lam.ndvm/ubuntu
INFO[0000] Setting Disk Bus: virtio
Creating snapshot 100% [==========================================================================================================================] (100/100) [0s:0s]
DEBU[0000] Running command: /usr/sbin/nbdkit --exit-with-parent --readonly --foreground --unix=/tmp/migratekit-49414492/nbdkit.sock --pidfile=/tmp/migratekit-49414492/nbdkit.pid vddk server=10.1.0.23 user=lam.nd@vsphere.local password=SVTcoimo@23 thumbprint=FB:9D:25:5D:9C:2B:B2:F5:16:12:D5:3E:DA:36:A7:AE:67:CD:F3:2C compression=skipz vm=moref=vm-14090 snapshot=snapshot-14181 [10.1.0.21_ssd03_nvme] ubuntu_4/ubuntu.vmdk
WARN[0001] Change ID mismatch, full copy needed currentChangeId= snapshotChangeId="52 c3 51 5e e8 93 8f b1-0a bd bd 60 2d 55 8b a5/5"
INFO[0001] Attaching volume volume_id=1b65f25f-43dc-4d5c-8393-5be787ef2ab1
Removing snapshot 100% [==========================================================================================================================] (100/100) [0s:0s]
Error: Get "http://169.254.169.254/openstack/latest/meta_data.json": dial tcp 169.254.169.254:80: i/o timeout
Usage: migratekit migrate [flags]

Flags:
  -h, --help   help for migrate

Global Flags:
  --availability-zone string      Openstack availability zone for blockdevice & server
  --disk-bus-type disk-bus-type   Specifies the type of disk controller to attach disk devices to. (default virtio)
  --vmware-endpoint string        VMware endpoint (hostname or IP only)
  --vmware-password string        VMware password
  --vmware-path string            VMware VM path (e.g. '/Datacenter/vm/VM')
  --vmware-username string        VMware username
  --volume-type string            Openstack volume type`

LamNguy commented 3 weeks ago

I tried with the --network host option but got the same error. I think the problem is that the container tries to reach the metadata service, but the host where I run the container cannot reach it. @mnaser So I will describe my environment again:

mnaser commented 3 weeks ago

@LamNguy is this virtual machine running on OpenStack?

LamNguy commented 3 weeks ago

No, this is a VMware virtual machine. So the VM must be an OpenStack VM, right?

LamNguy commented 3 weeks ago

@mnaser Hi, I changed the VM to an OpenStack one and that fixed the dial tcp 169.254.169.254 error. I ran the command again and found some problems.

Flags: -h, --help help for migrat

The second try:

[root@rhel ~]# docker run -it --privileged --network host -v /dev:/dev -v /root/vmware-vix-disklib-distrib/:/usr/lib64/vmware-vix-disklib:ro --env-file <(env | grep OS_) registry.atmosphere.dev/library/migratekit:latest migrate --vmware-endpoint 10.1.0.23 --vmware-username lam.nd@vsphere.local --vmware-password SVTcoimo@23 --vmware-path /svtechhn/vm/cloudvm/lam.ndvm/ubuntu
INFO[0000] Setting Disk Bus: virtio
Creating snapshot 100% [======================================================================================================================================================================] (100/100) [0s:0s]
DEBU[0000] Running command: /usr/sbin/nbdkit --exit-with-parent --readonly --foreground --unix=/tmp/migratekit-561067734/nbdkit.sock --pidfile=/tmp/migratekit-561067734/nbdkit.pid vddk server=10.1.0.23 user=lam.nd@vsphere.local password=SVTcoimo@23 thumbprint=FB:9D:25:5D:9C:2B:B2:F5:16:12:D5:3E:DA:36:A7:AE:67:CD:F3:2C compression=skipz vm=moref=vm-14090 snapshot=snapshot-14191 [10.1.0.21_ssd03_nvme] ubuntu_4/ubuntu.vmdk
WARN[0001] Change ID mismatch, full copy needed currentChangeId= snapshotChangeId="52 c3 51 5e e8 93 8f b1-0a bd bd 60 2d 55 8b a5/41"
INFO[0001] Attaching volume volume_id=be03d993-7a17-4729-9675-38e05e829db8
INFO[0002] Detected instance UUID, attaching volume... instance_uuid=ab243f15-2a37-4663-85c0-ce434d9a7c1a
INFO[0003] Device for volume not found, checking again... volume_id=be03d993-7a17-4729-9675-38e05e829db8
INFO[0004] Device for volume not found, checking again... volume_id=be03d993-7a17-4729-9675-38e05e829db8
INFO[0005] Device for volume not found, checking again... volume_id=be03d993-7a17-4729-9675-38e05e829db8
INFO[0006] Device found device=/dev/vdb volume_id=be03d993-7a17-4729-9675-38e05e829db8
INFO[0006] Starting full copy disk="[10.1.0.21_ssd03_nvme] ubuntu_4/ubuntu.vmdk" vm=ubuntu
DEBU[0006] Running command: /usr/bin/nbdcopy --progress=3 nbd+unix:///?socket=/tmp/migratekit-561067734/nbdkit.sock /dev/vdb destination=/dev/vdb source="nbd+unix:///?socket=/tmp/migratekit-561067734/nbdkit.sock"
munmap_chunk(): invalid pointer
nbdcopy: nbd+unix:///?socket=/tmp/migratekit-561067734/nbdkit.sock: nbd_connect_uri: recv: server disconnected unexpectedly
Removing snapshot 100% [======================================================================================================================================================================] (100/100) [0s:0s]
Error: exit status 1
Usage:

mnaser commented 2 weeks ago

Odd, I've never seen that error. To be honest, we've not really tested with podman, so I wonder if there are SELinux or other odd issues at play here.

Is it easily possible to try to see if it works with an Ubuntu (or Docker) based environment?

LamNguy commented 2 weeks ago

Hi @mnaser, thank you so much for helping me. Following your advice, I switched to Ubuntu 24 + Docker and tried again:

mnaser commented 2 weeks ago

Hi @mnaser, thank you so much for helping me. Following your advice, I switched to Ubuntu 24 + Docker and tried again:

  • The issue where the installer does not wait for the volume to change from creating to available status still occurs, so I need to re-run the command. However, the second nbdcopy run is a full copy, whereas the first nbdcopy run used --destination-is-zero, which is much faster. I tried several times and can get past this issue if I am lucky ==> could you update the installer to have a timeout option?

https://github.com/vexxhost/migratekit/pull/11 should address this, once it merges it should wait 60 seconds for the volume to become available.

  • I ran the migrate command first and then the cutover command, but virt-v2v-inplace fails with an error (see the attached image). Have you seen this issue, or do you have any idea? Screenshot 2024-08-26 224019
  • I have a question: if the source VM already has the virtio drivers and cloud-init installed, is it fine to use the option --virt-v2v=false for the migration?

Yes, virt-v2v is really mostly important for installing drivers (more specifically for Windows systems), if you've got those, you're good to go. I'm working through that bug though.
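The wait described for PR #11 (up to 60 seconds for the volume to become available) follows a common poll-with-deadline pattern. Below is a minimal Python sketch of that pattern, not migratekit's actual implementation: `wait_for_status`, the `get_status` callback, and the 2-second interval are hypothetical names and values chosen for illustration.

```python
import time


def wait_for_status(get_status, wanted="available", timeout=60.0, interval=2.0,
                    sleep=time.sleep, clock=time.monotonic):
    """Poll get_status() until it returns `wanted`.

    Raises RuntimeError if the resource reports "error", and TimeoutError
    if the deadline passes first. `get_status` is a caller-supplied function
    (hypothetical here) that returns the current volume status as a string.
    """
    deadline = clock() + timeout
    while True:
        status = get_status()
        if status == wanted:
            return status
        if status == "error":
            raise RuntimeError("volume entered error state")
        if clock() >= deadline:
            raise TimeoutError(f"volume still {status!r} after {timeout} seconds")
        sleep(interval)
```

The status is checked before the deadline, so a volume that becomes available on the final poll is still accepted.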

LamNguy commented 2 weeks ago

Hi @mnaser, thanks for updating the code. Can you give me a suggestion about the virt-v2v-inplace error?

LamNguy commented 2 weeks ago

Hi @mnaser, with your expertise, can I ask you a question about migrating a VM to OpenStack using virt-v2v? For example, my VMDK is a 100 GB thin-provisioned disk. When I convert the volume and upload it to OpenStack, it processes the full 100 GB, but the actual size of the volume is about 10 GB. I need to run a dd command to make it sparse:

dd if=/var/tmp/ubuntu-sda bs=16M conv=sparse iflag=fullblock | ssh root@192.168.30.25 "dd of=/dev/openstack/volume-ee6c1ed7-6841-4b6a-96f6-1d0e91736112 bs=16M conv=sparse oflag=direct"

Is there a better approach?
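For readers unfamiliar with `conv=sparse`: dd seeks over output blocks that are entirely zero instead of writing them. The Python sketch below illustrates that idea for a regular file target. It is an assumption-laden illustration, not a replacement for the dd pipeline above: `sparse_copy` is a hypothetical helper, and it does not handle block devices or remote targets.

```python
import os


def sparse_copy(src_path, dst_path, block_size=16 * 1024 * 1024):
    """Copy src to dst, seeking over all-zero blocks instead of writing them.

    Returns the number of bytes actually written. Intended for a regular
    file destination; a block device would need the skipped regions to
    already read as zeros.
    """
    written = 0
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        while True:
            block = src.read(block_size)
            if not block:
                break
            if block.count(0) == len(block):
                # Entire block is zero: leave a hole rather than writing it.
                dst.seek(len(block), os.SEEK_CUR)
            else:
                dst.write(block)
                written += len(block)
        # Extend the file to its full length if it ends in a skipped block.
        dst.truncate(dst.tell())
    return written
```

Whether the holes actually save space depends on the destination filesystem supporting sparse files.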

LamNguy commented 2 weeks ago

Hi, for a VM that runs UEFI, OpenStack only supports this metadata on an image: openstack image set --property hw_firmware_type=uefi $IMAGE. So if I export the VM directly to an OpenStack volume, I can't find any way to run it with the hw_firmware_type option.

mnaser commented 1 week ago

Sorry, I can only help you with this project here. I am trying to investigate the passt issue which I've seen with a few of our customers but you can skip it if you don't need the drivers?

LamNguy commented 1 week ago

@mnaser Sure, I will wait for the code update that resolves this issue, since manual installation of the virtio driver is tricky. For example, I installed the virtio driver on a Windows VM on VMware, but after migration the disk does not seem to be recognized; maybe I missed a step.

LamNguy commented 1 week ago

@mnaser I read #14; my VMware VDDK version is also 8.0.3.

mnaser commented 1 week ago

OK, the root cause for this passt issue is Ubuntu 24.04

https://discourse.ubuntu.com/t/ubuntu-24-04-lts-noble-numbat-release-notes/39890#unprivileged-user-namespace-restrictions

This is why I have not run into this issue: my development uses 22.04.
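The linked release-notes section covers Ubuntu's restriction on unprivileged user namespaces, which is exposed through the `kernel.apparmor_restrict_unprivileged_userns` sysctl. A minimal Python sketch for checking whether a host has it enabled follows. The helper name `userns_restricted` is hypothetical, and the thread does not confirm that this sysctl alone explains the passt failure.

```python
from pathlib import Path

# procfs path for the kernel.apparmor_restrict_unprivileged_userns sysctl.
USERNS_SYSCTL = Path("/proc/sys/kernel/apparmor_restrict_unprivileged_userns")


def userns_restricted(sysctl_path=USERNS_SYSCTL):
    """Return True if the restriction is on, False if off, None if absent.

    `userns_restricted` is a hypothetical helper for this sketch; a missing
    file simply means the kernel does not expose this sysctl.
    """
    try:
        return Path(sysctl_path).read_text().strip() == "1"
    except OSError:
        return None
```

A result of `True` on the host running the container would be consistent with the Ubuntu 24.04 behaviour described above.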

LamNguy commented 1 week ago

Ok, I will try again with ubuntu 22.04

mnaser commented 1 week ago

@LamNguy did things work for you with 22.04 ?

LamNguy commented 1 week ago

Sure, with Ubuntu 22.04 the passt issue is solved. I can see the volume conversion succeed, but at the end the process fails and no instance is launched on OpenStack. I tested with Ubuntu and RHEL 8, and both have the same issue. Below are some logs from the end of the process.

Screenshot 2024-09-07 213113 Screenshot 2024-09-07 213141 Screenshot 2024-09-08 083342

mnaser commented 5 days ago

@LamNguy it seems that this is because virt-v2v is taking a long time, so by the time we are shutting down the old nbdkit servers, we are getting NotAuthenticated -- it's almost like a session is timing out ..

mnaser commented 5 days ago

This is related to https://github.com/vmware/govmomi/issues/224

mnaser commented 5 days ago

https://github.com/vexxhost/migratekit/pull/20 should help with timeouts, you can try with the new image once that is merged.

It's wild that your virt-v2v run takes almost 2 hours, which is timing out the session. It seems the SELinux relabeling is taking a really long time; you can probably get away with skipping it.

mnaser commented 5 days ago

https://github.com/vexxhost/migratekit/pull/21 should help avoid doing the SELinux relabel, I don't really see the point of it, it will cut 86 minutes from your virt-v2v run..

mnaser commented 5 days ago

I think another issue you might be seeing is the lack of nested virtualization which is making things slower (perhaps!)

LamNguy commented 5 days ago

Sure, I will try again later. Our OpenStack runs with nested virtualization, built on top of VMware. I will check it.

LamNguy commented 4 days ago

Hi, thank you very much. With the new container image (which bypasses the SELinux relabel), I tested migration successfully with Ubuntu, RHEL, and Windows. For UEFI Windows I will look at other cases. I have some questions:

mnaser commented 4 days ago

Skipping the SELinux relabel shouldn't affect it because we're not changing formats; we're operating at the block level.

There's no need to install anything, virt-v2v will inject the correct drivers for Windows.

Network mapping can't easily be overwritten: if we remove it, the system will be confused, because how will Migratekit know which network to create the port on?
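For reference, the `--network-mapping` value used in the first command of this thread is a comma-separated list of `key=value` pairs (`mac`, `network-id`, `subnet-id`, `ip`). The Python sketch below parses that format for illustration only. `parse_network_mapping` is a hypothetical helper, and treating `mac` as required is an assumption, not migratekit's documented validation.

```python
def parse_network_mapping(value):
    """Parse 'mac=...,network-id=...,subnet-id=...,ip=...' into a dict.

    Hypothetical helper for illustration; requiring `mac` is an assumption
    based on the mapping being tied to a source NIC.
    """
    fields = {}
    for item in value.split(","):
        key, sep, val = item.partition("=")
        key, val = key.strip(), val.strip()
        if not sep or not key or not val:
            raise ValueError(f"malformed item: {item!r}")
        if key in fields:
            raise ValueError(f"duplicate key: {key!r}")
        fields[key] = val
    if "mac" not in fields:
        raise ValueError("missing 'mac' key")
    return fields
```

Each mapping ties one source NIC (by MAC address) to a destination network and subnet, which is the information needed to decide where to create the port.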

LamNguy commented 1 day ago

I understand, thank you.