libguestfs / virt-v2v

Virt-v2v converts guests from foreign hypervisors to run on KVM
GNU General Public License v2.0

Clustername not working when on same Data Center, sharing same storage domains #46

Closed Schamane187 closed 6 months ago

Schamane187 commented 6 months ago

Hi guys,

I created a data center SiteA and made two clusters:

SiteA (3 nodes ovna1..3.example.com) and SiteA_UX (3 nodes ovnuxa1..3.example.com)

In the data center they share the same storage domains, like aa1_ovna_1.

Now I import VMs like this:

virt-v2v -i vmx /nfs/testmigvm01/testmigvm01.vmx -o ovirt-upload -oc https://admin@ovirt@internalsso@engine-1.example.com/ovirt-engine/api -os aa1_ovna_1 -op /home/migrate/testpw -oo rhv-cafile=/root/ansible-stuff/ca.pem -oo rhv-cluster=SiteA -of qcow2

It gives me an error (verbose mode isn't giving me more):

virt-v2v: error: internal error: invalid argument: /tmp/v2v.CtVpBL/v2vtransfer.json: JSON parse error: end of file expected near 'e'

I managed to copy the v2vtransfer.json file before it gets deleted, and I don't understand why it says "end of file expected near 'e'". (BTW, is there a way to avoid the deletion of the temporary files?)

{"transfer_id": "bddd91a5-4a15-4d21-a3c1-882c056c794c", "destination_url": "https://ovnuxa01.example.com:54322/images/09e03d5b-542a-49ff-904c-369d17cd2423", "is_ovirt_host": false}

So it gives me a destination URL of a node in the wrong cluster.

As I have no VM running yet in the UX cluster, I set all 3 of its nodes into maintenance. After doing so it works, because the destination URL is attached to a node in the right cluster.

{"transfer_id": "cd0ea4d4-d7d1-49f1-9d1c-e3e9cd86cf13", "destination_url": "https://ovna01.example.com:54322/images/f85c50a4-d85b-4866-8e21-79330aea96cf", "is_ovirt_host": false}

Maybe this is an oVirt bug and not a virt-v2v one; I'm not sure whether oVirt returns the wrong URL or virt-v2v does not send the parameter as it should.

On top of that, changing it to rhv-cluster=SiteA_UX does not work either, and for testing I am not able to set all SiteA nodes to maintenance.

I hope you understand what I mean; if not, I am happy to give more details.

rwmjones commented 6 months ago

Please use virt-v2v -vx and capture the full log (which may be very long). Either attach it here or send it to me directly at rjones@redhat.com if you don't feel like sharing it in public.

Schamane187 commented 6 months ago

Hi, I just sent you a mail, but as I said, at least for me there was nothing useful in it.

rwmjones commented 6 months ago

(Adding @nirs as this is a RHV / -o rhv-upload thing)

So the actual error is caused by this virt-v2v Python code: https://github.com/libguestfs/virt-v2v/blob/a659b334f5b72b8a9843b1813a97114adc62a543/output/rhv-upload-transfer.py#L41-L49

In the log:

cannot read /etc/vdsm/vdsm.id, using any host: [Errno 2] No such file or directory: '/etc/vdsm/vdsm.id'

I'm not exactly sure why that would be, or whether the node in the wrong cluster symptom is related. Maybe Nir will know more.

nirs commented 6 months ago

> (Adding @nirs as this is a RHV / -o rhv-upload thing)
>
> So the actual error is caused by this virt-v2v Python code:
>
> https://github.com/libguestfs/virt-v2v/blob/a659b334f5b72b8a9843b1813a97114adc62a543/output/rhv-upload-transfer.py#L41-L49
>
> In the log:
>
> cannot read /etc/vdsm/vdsm.id, using any host: [Errno 2] No such file or directory: '/etc/vdsm/vdsm.id'
>
> I'm not exactly sure why that would be, or whether the node in the wrong cluster symptom is related. Maybe Nir will know more.

This is not an error but an expected condition, meaning that virt-v2v is not running on an oVirt host.
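The logic behind that message can be sketched in a few lines of Python. This is a paraphrase of the linked rhv-upload-transfer.py lines, not the actual code; `find_host_id` is an invented name for illustration:

```python
def find_host_id(vdsm_id_path="/etc/vdsm/vdsm.id"):
    """Return the local host's VDSM hardware id, or None when this
    machine is not an oVirt host (any host may then run the transfer)."""
    try:
        with open(vdsm_id_path) as f:
            return f.read().strip()
    except OSError as e:
        # Corresponds to the "cannot read ..., using any host" log line.
        print(f"cannot read {vdsm_id_path}, using any host: {e}")
        return None
```

When the file is missing, the function logs the message seen above and falls back to letting the engine pick any suitable host.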

nirs commented 6 months ago

> Hi guys,
>
> I created a data center SiteA and made two clusters:
>
> SiteA (3 nodes ovna1..3.example.com) and SiteA_UX (3 nodes ovnuxa1..3.example.com)

The cluster names are not consistent, and this can lead to bugs when using the oVirt engine API. When using search, a query can match both SiteA and SiteA_UX, since both start with SiteA. To avoid such errors you should rename cluster SiteA (e.g. to SiteA_xxx).
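The ambiguity can be illustrated without the engine API at all. This is not the engine's actual search implementation, just the prefix-matching behavior it resembles:

```python
clusters = ["SiteA", "SiteA_UX"]

def prefix_search(name):
    # A prefix-style search matches every cluster whose name starts
    # with the query, so "SiteA" matches both clusters here.
    return [c for c in clusters if c.startswith(name)]

def exact_search(name):
    # An exact-name comparison is unambiguous.
    return [c for c in clusters if c == name]

print(prefix_search("SiteA"))  # ['SiteA', 'SiteA_UX']
print(exact_search("SiteA"))   # ['SiteA']
```

Renaming SiteA so that neither cluster name is a prefix of the other removes the ambiguity for any search style.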

> in the data center, they share the same storage domains

Sure, this is how oVirt works: storage belongs to the data center, not to clusters.

> now I import VMs like this: virt-v2v -i vmx /nfs/testmigvm01/testmigvm01.vmx -o ovirt-upload -oc https://admin@ovirt@internalsso@engine-1.example.com/ovirt-engine/api -os aa1_ovna_1 -op /home/migrate/testpw -oo rhv-cafile=/root/ansible-stuff/ca.pem -oo rhv-cluster=SiteA -of qcow2

Looks right

> It gives me an error (verbose isn't giving me more): virt-v2v: error: internal error: invalid argument: /tmp/v2v.CtVpBL/v2vtransfer.json: JSON parse error: end of file expected near 'e'

This is a bug in virt-v2v - it should print a full traceback for such errors so we can know where the error happened.

output/rhv-upload-transfer.py could work around this limitation by catching the JSON parsing error and adding more context.

This should be the failing line:

https://github.com/libguestfs/virt-v2v/blob/a659b334f5b72b8a9843b1813a97114adc62a543/output/rhv-upload-transfer.py#L252

To make it easier to debug, you can try to build virt-v2v with this patch:

diff --git a/output/rhv-upload-transfer.py b/output/rhv-upload-transfer.py
index 626eff77..4f9084dd 100644
--- a/output/rhv-upload-transfer.py
+++ b/output/rhv-upload-transfer.py
@@ -247,11 +247,16 @@ params = None
 if len(sys.argv) != 2:
     raise RuntimeError("incorrect number of parameters")

 # Parameters are passed in via a JSON document.
 with open(sys.argv[1], 'r') as fp:
-    params = json.load(fp)
+    data = fp.read()
+
+try:
+    params = json.loads(data)
+except ValueError as e:
+    raise RuntimeError(f"Cannot parse params {data!r}: {e}")

 # What is passed in is a password file, read the actual password.
 with open(params['output_password'], 'r') as fp:
     output_password = fp.read()
 output_password = output_password.rstrip()

This will make it clear why parsing fails and help to find the root cause.

> I managed to get the v2vtransfer.json file copied before it gets deleted and don't understand why it is saying "end of file expected near 'e'" (btw is there a way to avoid the deletion of the temporary files)
>
> {"transfer_id": "bddd91a5-4a15-4d21-a3c1-882c056c794c", "destination_url": "https://ovnuxa01.example.com:54322/images/09e03d5b-542a-49ff-904c-369d17cd2423", "is_ovirt_host": false}

This is not the right JSON - this is the JSON generated by rhv-upload-transfer.py, not the JSON it reads.

Since you got this JSON file, it means that parsing the JSON input to rhv-upload-transfer.py did work. Maybe this JSON came from another transfer?

See https://github.com/libguestfs/virt-v2v/blob/a659b334f5b72b8a9843b1813a97114adc62a543/output/rhv-upload-transfer.py#L282

I think the file name should be out.params{N}.json, where N is the disk number (probably 0 for the first disk).

See https://github.com/libguestfs/virt-v2v/blob/a659b334f5b72b8a9843b1813a97114adc62a543/output/output_rhv_upload.ml#L388

> So it gives me a destination URL of a node in the wrong cluster

No, this is expected behavior. The upload can be on any available host in the data center. Clusters are virt concepts which do not exist in oVirt storage code.

If you share the complete verbose log of the failing import I may be able to help more. You can send to nsoffer@redhat.com.

nirs commented 6 months ago

@Schamane187 you can try to reproduce with #47, no need to apply the patch suggested in https://github.com/libguestfs/virt-v2v/issues/46#issuecomment-2040366940

nirs commented 6 months ago

@Schamane187 you are also invited to join the libguestfs mailing list, or on IRC channel #guestfs on Libera Chat.

rwmjones commented 6 months ago

As discussed on IRC, if the JSON output from the Python script is getting corrupted, then the JSON parse error would come from here: https://github.com/libguestfs/libguestfs-common/blob/0330ebe40cb645df311bab25888c5ca6cc179efe/mltools/JSON_parser-c.c#L159 However, that isn't very helpful because it's just a wrapper around jansson (the JSON parsing library), and we don't know exactly what fails.

The original comment quotes the JSON as:

{"transfer_id": "bddd91a5-4a15-4d21-a3c1-882c056c794c", "destination_url": "https://ovnuxa01.example.com:54322/images/09e03d5b-542a-49ff-904c-369d17cd2423", "is_ovirt_host": false}

but that appears valid to me, unless there's something I'm missing or some other part we're not seeing.
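A quick standalone check confirms that the quoted fragment parses cleanly on its own (just a sanity check, not part of virt-v2v):

```python
import json

# The JSON document exactly as quoted in the original comment.
doc = ('{"transfer_id": "bddd91a5-4a15-4d21-a3c1-882c056c794c", '
       '"destination_url": "https://ovnuxa01.example.com:54322/images/'
       '09e03d5b-542a-49ff-904c-369d17cd2423", '
       '"is_ovirt_host": false}')

params = json.loads(doc)  # no exception raised: the quoted text is valid JSON
print(params["transfer_id"])
```

So the corruption must lie in whatever extra bytes surrounded this document on disk, not in the document itself.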

rwmjones commented 6 months ago

We pushed a few commits which improve debugging. If you compile virt-v2v from source (don't install it!) then from the build directory do:

./run virt-v2v -vx [etc]

it should produce more useful debugging.

nirs commented 6 months ago

@rwmjones please take a look at #48 which can make debugging easier.

rwmjones commented 6 months ago

Thanks for providing the updated log file. The JSON corruption is really strange:

transfer output before parsing:
{"transfer_id": "3ff1fc5f-3828-4e0a-afa1-6e828f17460f",
"destination_url": "https://ovna01.example.com:54322/images/27f41b83-4b87-4480-951c-9a2f321090f3",
"is_ovirt_host": false}e}

Notice the extra e}, which shouldn't be there; I can't quite understand right now where that could come from.
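One classic way such trailing garbage appears (purely a hypothesis for illustration, not a confirmed diagnosis of this bug) is rewriting an existing file from the start without truncating it, so the tail of an earlier, longer write survives:

```python
import json
import tempfile

path = tempfile.mkstemp()[1]

# First write: a longer JSON document.
with open(path, "w") as f:
    f.write('{"is_ovirt_host": false, "x": 1}')

# Second write: shorter content, rewritten in place
# with "r+" and no truncate() call.
with open(path, "r+") as f:
    f.write('{"is_ovirt_host": false}')

data = open(path).read()
print(data)  # stale bytes from the first write remain after the closing '}'

try:
    json.loads(data)
except ValueError as e:
    print("JSON parse error:", e)  # parser rejects the leftover tail
```

A strict parser like jansson reports exactly this kind of failure as "end of file expected" at the first stray character after the valid document.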

rwmjones commented 6 months ago

It's a long-shot, but does this change help? [patch deleted, we found the problem elsewhere]

rwmjones commented 5 months ago

Bug to fix this in RHEL 9.5: https://issues.redhat.com/browse/RHEL-32105

nirs commented 5 months ago

It would be nice to have a release with this fix so users can consume it without building virt-v2v manually.

rwmjones commented 5 months ago

Virt-v2v 2.5.3 has been released with the patch. I'll put it in Fedora shortly.