Closed cormachogan closed 3 years ago
I wonder if this is related to https://github.com/vmware-tanzu/tce/issues/754
I don't think so. I'm not seeing anything created on my cluster. I did try with a default StorageClass, but I still don't see any attempt to create Prometheus objects on my cluster. I'm redeploying a new cluster now just to confirm the same behaviour.
Did a fresh cluster deployment - same behaviour unfortunately.
Found the issue - Prometheus is relying on an Ingress. Once I deployed the Contour package, Prometheus began to deploy. Probably worth adding some pre-req section to the docs stating that this is a requirement.
Nice work tracking that down!
There is some design work going on right now to be able to define package dependencies. So basically having something like yum or apt where asking to install one package can be smart enough to know that it needs to pull in other packages to actually get something workable.
That work is happening in vmware-tanzu/tanzu-framework. Just mentioning it here as a breadcrumb for us to track that down and verify it meets the needs to address the issue called out here.
Yeah - that would be very useful @stmcginnis
Very useful thread! Thanks for tracking this down
cc: @LukeWinikates @akodali18 @hillrw3
FYI, Contour Ingress has a requirement on a LoadBalancer service, so you will also need to have something like the NSX ALB integrated with the workload cluster for this to work.
So the dependencies are Prometheus -> Ingress -> Load Balancer Service.
It would be useful to have this linked in some way, or at least prompts to say that there are dependencies.
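A minimal sketch of how the chain above (Prometheus -> Ingress -> Load Balancer Service) could be turned into an install order. It uses the standard `tsort` utility and is not how the tanzu CLI resolves packages. The labels `load-balancer`, `contour` and `prometheus` are illustrative placeholders, not real package names.

```shell
#!/usr/bin/env bash
# Each input line is "dependency dependent": the left item must be installed
# before the right one. The labels are placeholders, not real package names.
install_order() {
  tsort <<'EOF'
load-balancer contour
contour prometheus
EOF
}

# Print the packages in an order that satisfies every dependency.
install_order
```

For this linear chain the output lists the load balancer first, then the ingress, then Prometheus.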
We are not currently planning on supporting NSX ALB at the MVP release. We are working to get MetalLB included.
That will make things much easier - thanks for the update
@cormachogan if you set `ingress.enabled` to `false` when installing the prometheus package, does the install then succeed for you?
In theory, the dependency on an ingress is "soft": if you opt out of the ingress, you shouldn't need any of the ingress-related dependencies. If you don't need to access prometheus from outside your cluster, it should be fine to opt out of the ingress.
The questions in my mind are:
We are planning to open a new PR that will:
`port-forward` for any ad-hoc prometheus ui access needs.
I think setting it to false is a good idea, @LukeWinikates, as the current behaviour of just failing silently is not a good user experience. Making it optional and then adding the port-forward instructions is a good idea as well.
I would add as much info as possible into the configuration file as well, ideally linking to the official docs at GA.
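A rough sketch of the opt-out and port-forward approach discussed above. It assumes the package reads a YAML values file in which `ingress.enabled` is a nested key, as the setting name suggests. The namespace, service name and ports passed to `forward_prometheus` are hypothetical; check the real ones with `kubectl get svc` in the namespace the package uses. How the values file is handed to the package install is left out on purpose.

```shell
#!/usr/bin/env bash
# Assumption: the package accepts a YAML values file with a nested
# ingress.enabled key. Writes that fragment to the path given as $1.
write_values() {
  cat > "$1" <<'EOF'
ingress:
  enabled: false
EOF
}

# Allow the kubectl binary to be overridden (useful for dry runs).
KUBECTL="${KUBECTL:-kubectl}"

# Forward a local port to the Prometheus service for ad-hoc UI access.
# Arguments: namespace, service name, local port, service port.
# All four values are placeholders that depend on the actual deployment.
forward_prometheus() {
  local namespace="$1" service="$2" local_port="${3:-9090}" remote_port="${4:-80}"
  "$KUBECTL" port-forward --namespace "$namespace" "service/$service" \
    "${local_port}:${remote_port}"
}
```

With the ingress disabled, something like `forward_prometheus <namespace> <service>` would make the UI reachable on `localhost:9090` for as long as the command runs.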
Thanks @cormachogan for opening this issue and thanks @jpmcb for @-ing us so that we saw it in a timely fashion.
The one thing that I think would still benefit from some attention is @cormachogan's description of how the errors preventing the deployment from succeeding aren't surfaced in any obvious way. I've heard that feedback about `kapp-controller` before. I wonder if it would make sense for the `tanzu package install` command to do something like:
`--watch` or `--follow`)
`kubectl apply -f` to update a deployment and then `kubectl rollout status` to watch it happen.
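For reference, the `kubectl` pattern mentioned above can be sketched as follows. This is only the analogy being drawn, not a proposal for how `tanzu package install` should be implemented. The manifest path, deployment name, namespace and the 120s timeout are placeholder assumptions.

```shell
#!/usr/bin/env bash
# Allow the kubectl binary to be overridden (useful for dry runs).
KUBECTL="${KUBECTL:-kubectl}"

# Apply a manifest, then block until the named deployment has rolled out
# or the timeout expires. Arguments: manifest path, deployment name,
# namespace (defaults to "default").
apply_and_wait() {
  local manifest="$1" deployment="$2" namespace="${3:-default}"
  "$KUBECTL" apply --namespace "$namespace" -f "$manifest" &&
    "$KUBECTL" rollout status --namespace "$namespace" \
      "deployment/$deployment" --timeout=120s
}
```

The second command exits non-zero if the rollout does not complete in time, which is the kind of visible failure signal the comment above is asking for.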
Bug Report
Working my way through the various TCE package installations, I wanted to go through the steps to set up Prometheus. I deployed a management cluster on vSphere and then created a workload cluster (1 cp, 1 worker).
First tried installing Prometheus without changing any of the configuration. No errors reported, but no objects (namespace, pods, svcs) were created in my cluster.
I then exported the Prometheus config to review it. It had the required namespace and replica entries as per the docs - https://quirky-franklin-8969be.netlify.app/docs/latest/prometheus-config/. The only thing I noticed was that StorageClassName was not set, so I tried to deploy Prometheus with the default settings and again with StorageClassName set to "default". Neither attempt resulted in any Prometheus objects getting created in the cluster.
To verify that the cluster was working successfully, I installed the fluent-bit package. This successfully created objects on the cluster so (a) I am possibly missing a pre-req step for Prometheus to successfully deploy (which might mean a doc update) or (b) there is an issue with the Prometheus package deployment.
As an aside, there seems to be no way to monitor a package deployment. We really need to have some way of monitoring what is happening during a package install to make troubleshooting possible.
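One possible way to get more visibility, sketched under assumptions: kapp-controller records each deployment as an App custom resource (`apps.kappctrl.k14s.io`), and inspecting it with `kubectl` may show status that never reaches the CLI output. Whether the package install in this environment creates such a resource, and under which name and namespace, is an assumption to verify on the cluster.

```shell
#!/usr/bin/env bash
# Allow the kubectl binary to be overridden (useful for dry runs).
KUBECTL="${KUBECTL:-kubectl}"

# List every kapp-controller App resource in the cluster.
list_apps() {
  "$KUBECTL" get apps.kappctrl.k14s.io --all-namespaces
}

# Show the detailed status of one App.
# Arguments: namespace, app name (both depend on the environment).
describe_app() {
  local namespace="$1" name="$2"
  "$KUBECTL" describe apps.kappctrl.k14s.io "$name" --namespace "$namespace"
}
```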
Expected Behavior
I expected objects related to prometheus including alert manager to appear in my workload cluster.
I monitored the kapp-controller logs during the deployment, but I could see nothing out of the ordinary, nor were any errors displayed.
Steps to Reproduce the Bug
On a vSphere based workload cluster, run:
tanzu package install prometheus.tce.vmware.com
Environment Details