Open andrewtchin opened 6 years ago
The result of this is that the user cannot use the H5 plugin. Marking as high; reset priority after triage.
Seems like this has to do with OVA network. The UI is getting the IP address from the vim25 API as shown below:
https://github.com/vmware/vic-ui/blob/master/h5c/vic-service/src/main/java/com/vmware/vic/PropFetcher.java#L430
https://github.com/vmware/vic-ui/blob/master/h5c/vic-service/src/main/java/com/vmware/vic/model/VicApplianceVm.java
@andrewtchin does this belong in vic-product? It looks like it to me.
What's the root cause? Are you saying the OVA does not have an IP displayed in vCenter at the time this is run?
I believe that's what is happening. Somehow it's not finding an IP. @jooskim ^^
I think the UI installer or something should retry or fail if it sees null as the appliance IP. I'm not clear on where the IP is coming from and at what stage this is being queried. When the appliance is first booted the IP doesn't show up in vSphere for a few seconds, but that's the only time I can think that would be the case since the toolbox reports the IP before you can even access the webserver on the appliance.
@gigawhitlocks can you try this again with our latest OVA? I'm not quite sure how to reproduce this.
Where is the IP that is showing up as null coming from?
It's the appliance IP. It's that whole IP lookup mechanism that checks for the vic tagged vm that @jooskim and @jzt worked on. That's why I was thinking that the OVA seeing this issue maybe didn't have all of the pieces for that in place at that point.
But doesn't that mean that the UI plugin is checking vSphere for the VM that has that tag? If so, that code should retry if it receives a null value and fail if it keeps getting null
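A minimal sketch of the "retry, then fail" behavior suggested above, written in Go to match the snippet quoted in this thread. The names `fetchIPWithRetry` and `errNoApplianceIP` are hypothetical and the `fetch` callback stands in for whatever lookup the plugin really performs; this is not the plugin's actual code.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// errNoApplianceIP is a hypothetical sentinel error for this sketch.
var errNoApplianceIP = errors.New("appliance IP is still empty after all attempts")

// fetchIPWithRetry calls fetch up to attempts times, sleeping delay between
// tries, and returns the first non-empty IP. It gives up with an error
// instead of handing an empty ("null") value back to the caller.
func fetchIPWithRetry(fetch func() (string, error), attempts int, delay time.Duration) (string, error) {
	var lastErr error
	for i := 0; i < attempts; i++ {
		ip, err := fetch()
		if err == nil && ip != "" {
			return ip, nil
		}
		lastErr = err
		if i < attempts-1 {
			time.Sleep(delay)
		}
	}
	if lastErr != nil {
		return "", fmt.Errorf("%w: last error: %v", errNoApplianceIP, lastErr)
	}
	return "", errNoApplianceIP
}

func main() {
	calls := 0
	// Stand-in for the real lookup: reports no IP for the first two calls.
	fetch := func() (string, error) {
		calls++
		if calls < 3 {
			return "", nil
		}
		return "192.0.2.10", nil
	}
	ip, err := fetchIPWithRetry(fetch, 5, 10*time.Millisecond)
	fmt.Println(ip, err, calls)
}
```

Bounding the number of attempts keeps the "appliance just booted" case working while still surfacing a clear error if the IP never appears.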
From https://github.com/vmware/vic/blob/master/lib/install/ova/configure.go#L87:
func getOvaVMByTag(ctx context.Context, sess *session.Session, u string) (*vm.VirtualMachine, error) {
	ovaURL, err := url.Parse(u)
	if err != nil {
		return nil, err
	}

	host := ovaURL.Hostname()
	log.Debugf("Looking up host %s", host)
	ips, err := net.LookupIP(host)
	if err != nil {
		return nil, errors.Errorf("IP lookup failed: %s", err)
	}

	log.Debugf("found %d IP(s) from hostname lookup on %s:", len(ips), host)
	var ip string
	for _, i := range ips {
		log.Debugf(i.String())
		if i.To4() != nil {
			ip = i.String()
		}
	}
	if ip == "" {
		return nil, errors.Errorf("IPV6 support not yet implemented")
	}

	vms, err := admiral.DefaultDiscovery.Discover(ctx, sess)
	if err != nil {
		return nil, errors.Errorf("failed to discover OVA vm(s): %s", err)
	}

	log.Infof("Found %d VM(s) tagged as OVA", len(vms))
	for i, v := range vms {
		log.Debugf("Checking IP for %s", v.Reference().Value)
		vmIP, err := v.WaitForIP(ctx)
		if err != nil && i == len(vms)-1 {
			return nil, errors.Errorf("failed to get VM IP: %s", err)
		}

		// verify the tagged vm has the IP we expect
		if vmIP == ip {
			log.Debugf("Found OVA with matching IP: %s", ip)
			return v, nil
		}
	}

	return nil, errors.Errorf("no VM(s) found with OVA tag")
}
It looks like it's sitting there hanging in the call to WaitForIP, which lives down in the govmomi layer. That it occurs both with IP and FQDN suggests to me that it could be something on the VC side (that particular VC?). In all my tests, I have never seen the WaitForIP step take more than a second or so.
We should probably change the logging in there to Info or possibly add a few more lines to inform the user what is going on.
EDIT: Specifically changing the log.Debugf("Checking IP for %s", v.Reference().Value) to Info level.
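One way the hang described above could be made visible and bounded, as a sketch only. The `ipWaiter` interface is a hypothetical stand-in that mirrors the `WaitForIP(ctx)` call in the quoted snippet, `stuckVM` and the name `vm-42` are made up for the demo, and the sketch assumes the underlying call returns once its context is cancelled.

```go
package main

import (
	"context"
	"fmt"
	"log"
	"time"
)

// ipWaiter is a hypothetical interface standing in for the VM type; it only
// mirrors the WaitForIP(ctx) call used in the snippet quoted in this thread.
type ipWaiter interface {
	WaitForIP(ctx context.Context) (string, error)
}

// waitForIPWithTimeout logs at a user-visible level before blocking and
// bounds the wait with a deadline so the caller cannot hang indefinitely.
func waitForIPWithTimeout(parent context.Context, name string, vm ipWaiter, timeout time.Duration) (string, error) {
	ctx, cancel := context.WithTimeout(parent, timeout)
	defer cancel()

	log.Printf("INFO: waiting up to %s for VM %s to report an IP", timeout, name)
	ip, err := vm.WaitForIP(ctx)
	if err != nil {
		return "", fmt.Errorf("VM %s did not report an IP within %s: %w", name, timeout, err)
	}
	return ip, nil
}

// stuckVM simulates a VM that never reports an IP; it returns only when the
// context ends.
type stuckVM struct{}

func (stuckVM) WaitForIP(ctx context.Context) (string, error) {
	<-ctx.Done()
	return "", ctx.Err()
}

// runDemo shows the timeout path against the simulated stuck VM.
func runDemo() error {
	_, err := waitForIPWithTimeout(context.Background(), "vm-42", stuckVM{}, 50*time.Millisecond)
	return err
}

func main() {
	fmt.Println(runDemo())
}
```

The timeout value is arbitrary here; a real limit would need to allow for the few seconds after first boot when the IP is not yet reported.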
@andrewtchin if we don't have the IP by the time the user interacts with the plugin something went awry and there is no point in continually retrying from plugin side. That's not something we do anyway typically because it's continual network requests and generally slows the browser way down. As @jzt stated it seems to be further upstream and could use some better error handling.
Can you figure out who this should be assigned to, @andrewtchin?
Added https://github.com/vmware/vic-ui/issues/213 to track handling getting a null value on the client side and displaying appropriate error messaging.
@gigawhitlocks can you confirm you were not seeing this after RC3? If it's still an issue it needs to be investigated on OVA side. All we can do on client is intercept the null and display an error.
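For illustration, intercepting the null before it turns into a link like https://null:8443 could look roughly like this. The sketch is in Go to match the snippet quoted earlier rather than the plugin's own client code, and `applianceURL` is a hypothetical helper, not an existing function.

```go
package main

import (
	"fmt"
	"net"
	"net/url"
	"strconv"
	"strings"
)

// applianceURL builds the https://HOST:PORT link for the appliance and
// rejects a host that is empty or the literal string "null", which is what
// an unset value can become once it is stringified.
func applianceURL(host string, port int) (string, error) {
	h := strings.TrimSpace(host)
	switch strings.ToLower(h) {
	case "", "null":
		return "", fmt.Errorf("appliance address is not available (got %q)", host)
	}
	u := url.URL{Scheme: "https", Host: net.JoinHostPort(h, strconv.Itoa(port))}
	return u.String(), nil
}

func main() {
	for _, host := range []string{"vch.theknown.net", "null", ""} {
		link, err := applianceURL(host, 8443)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Println("link:", link)
	}
}
```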
We will add a log for this, but the env is gone and we haven't been able to repro it. If we see it in the future we should check to see if the appliance reports an IP in vCenter from toolbox.
In RC3 this error does not happen. Instead I get the normal failure for being unable to verify the cert (because the FQDN of the VCH is not used, and the cert is signed for the FQDN, not the IP) and I have to click through to accept the certificate, despite using a signed certificate. That said, it finds the IP correctly and I can accept the cert and click 'refresh' and that all works as it should.
This downgrades the issue from being a show-stopper to one that is a minor annoyance.
And the user is prompted so that they know to click to accept the cert? Also, as a note, the cause of that could be that someone doesn't trust the LE root or that we're not performing cert validation correctly.
Yes, it alerts the user.
And no, the reason that the cert shows as untrusted is because the UI provides a link to access it as https://IP_ADDRESS and it should be https://DOMAIN because that's what the certificate is issued for. The certificate doesn't contain IP SANs because the VCH is assigned a dynamic IP, and so the certificate is only valid for the FQDN and not the IP of the server.
This is the same installation. Notice that beautiful green lock and no little warning saying I've saved this certificate as trusted. That's because I'm accessing via the domain name.
Now accessing via the IP address, the warning symbol is there, to indicate that I overrode the warning:
The IP and FQDN for a deployment are not interchangeable and the product shouldn't treat them as though they are. The biggest issue that all of this brings up is that in some places FQDNs get translated into IPs and then those IPs are stored and used instead of FQDNs later on. This will cause problems with access if, e.g., a customer provides an FQDN because the IP of the component being accessed may change.
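The certificate behavior described above can be reproduced with a small Go sketch. It generates a throwaway self-signed certificate whose only SAN is a DNS name and checks it against both the name and an IP. The hostname `vch.example.test` and the address `192.0.2.10` are placeholders, and the helper names are hypothetical.

```go
package main

import (
	"crypto/ecdsa"
	"crypto/elliptic"
	"crypto/rand"
	"crypto/x509"
	"crypto/x509/pkix"
	"fmt"
	"math/big"
	"time"
)

// certForDNSName creates a throwaway self-signed certificate whose only SAN
// is the given DNS name (no IP SANs), mimicking the deployment described above.
func certForDNSName(name string) (*x509.Certificate, error) {
	key, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	if err != nil {
		return nil, err
	}
	tmpl := &x509.Certificate{
		SerialNumber: big.NewInt(1),
		Subject:      pkix.Name{CommonName: name},
		NotBefore:    time.Now().Add(-time.Hour),
		NotAfter:     time.Now().Add(time.Hour),
		DNSNames:     []string{name},
	}
	der, err := x509.CreateCertificate(rand.Reader, tmpl, tmpl, &key.PublicKey, key)
	if err != nil {
		return nil, err
	}
	return x509.ParseCertificate(der)
}

// hostnameChecks returns the hostname verification result for the FQDN and
// for the IP against a certificate issued only for the FQDN.
func hostnameChecks(fqdn, ip string) (fqdnErr, ipErr, err error) {
	cert, err := certForDNSName(fqdn)
	if err != nil {
		return nil, nil, err
	}
	return cert.VerifyHostname(fqdn), cert.VerifyHostname(ip), nil
}

func main() {
	fqdnErr, ipErr, err := hostnameChecks("vch.example.test", "192.0.2.10")
	if err != nil {
		panic(err)
	}
	fmt.Println("FQDN check:", fqdnErr)
	fmt.Println("IP check:  ", ipErr)
}
```

The FQDN check passes and the IP check fails, which is why a link built from the IP triggers the browser warning even though the certificate itself is fine.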
I got the same issue, but in my case it's due to the fact that the requirement to open port 8443 for the Cloud Admin is not documented anywhere.
From what the error says, it seems that 8443 should be reachable from the machine the Cloud Admin is using to access the vSphere web client.
Suggested fix(es):
1.) document the requirement for port 8443
2.) build a proxy function into the vSphere plugin so the connection to 8443 is established from vCenter (static) rather than from the Cloud Admin machine (variable)
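Until the requirement is documented, a quick way to tell whether port 8443 is reachable from a given machine is a plain TCP probe. This is only a sketch: `checkReachable` and `selfTest` are hypothetical helpers, and the self-test uses a temporary local listener rather than a real appliance.

```go
package main

import (
	"fmt"
	"net"
	"strconv"
	"time"
)

// checkReachable reports whether a TCP connection to host:port can be opened
// within timeout. It does not speak TLS or HTTP; it only tests connectivity.
func checkReachable(host string, port int, timeout time.Duration) error {
	addr := net.JoinHostPort(host, strconv.Itoa(port))
	conn, err := net.DialTimeout("tcp", addr, timeout)
	if err != nil {
		return fmt.Errorf("cannot reach %s: %w", addr, err)
	}
	return conn.Close()
}

// selfTest exercises checkReachable against a temporary local listener,
// first while it is open and then after it has been closed.
func selfTest() (openErr, closedErr, err error) {
	ln, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return nil, nil, err
	}
	port := ln.Addr().(*net.TCPAddr).Port
	openErr = checkReachable("127.0.0.1", port, time.Second)
	ln.Close()
	closedErr = checkReachable("127.0.0.1", port, time.Second)
	return openErr, closedErr, nil
}

func main() {
	openErr, closedErr, err := selfTest()
	if err != nil {
		panic(err)
	}
	fmt.Println("listener open:  ", openErr)
	fmt.Println("listener closed:", closedErr)
}
```

Against a real deployment the call would be something like `checkReachable(applianceHost, 8443, 3*time.Second)`, run from the machine that opens the vSphere web client.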
@m451 Thanks for your report. I opened https://github.com/vmware/vic-product/issues/1517 for updating the diagram. You can also see the networking requirements here: https://vmware.github.io/vic-product/assets/files/html/1.3/vic_vsphere_admin/security_reference.html
In my case, I discovered that the root cause of my error was having an old, powered off OVA as a part of my vSphere inventory following upgrade. It seems unlikely that this is always the cause for this error message, but is certainly something we should better handle.
@gigawhitlocks commented on Tue Dec 05 2017
Performed a VIC UI plugin install as such:
After install, this is seen:
The error provides a link to "fix" the issue, but that opens https://null:8443 which clearly doesn't work 😆
@gigawhitlocks commented on Tue Dec 05 2017
It's possible this issue is not related to the FQDN, but it may need to be noted that my VIC OVA also has an FQDN in this deployment, vch.theknown.net .
@gigawhitlocks commented on Tue Dec 05 2017
I have tried this again and provided an IP instead of an FQDN and got the same error; have updated the title of the issue. Will try a full reinstall of VIC OVA w/o a FQDN provided and I will see if I get a different result.