helm/charts

[stable/prometheus-operator] Validation fails because of missing CRDs even though they have been created #9241

Closed vsliouniaev closed 5 years ago

vsliouniaev commented 6 years ago

This is a BUG REPORT

Version of Helm and Kubernetes:

Which chart: stable/prometheus-operator. Probably all versions of the chart, but observed on 0.1.7, 0.1.21, and 4.0.0

What happened: Under some circumstances - apparently just the Helm and Kubernetes versions above - installing with the default values fails with:

Error: validation failed: [unable to recognize "": no matches for kind "Alertmanager" in version
"monitoring.coreos.com/v1", unable to recognize "": no matches for kind "Prometheus" in version 
"monitoring.coreos.com/v1", unable to recognize "": no matches for kind "PrometheusRule" in version 
. . .
. . .

There appears to be a race condition: running kubectl get crd | grep coreos a few times shows the resources gradually appearing (4 in total), but only well after the error has already occurred and the chart install has failed.

This is reproducible multiple times if the resources are deleted and the chart installation is attempted again.

What you expected to happen: The crd-install hook used to create the 4 CRDs in this chart should either succeed, with the CRDs available before the rest of the chart is validated, or fail outright.

How to reproduce it: Attempt to install the chart on a cluster without the CoreOS CRDs; the install fails.
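
A minimal reproduction sketch (helm 2 syntax; the release name and namespace are placeholders):

# on a cluster that does not yet have the monitoring.coreos.com CRDs
helm install stable/prometheus-operator --name prometheus-operator --namespace monitoring
# the install fails with the validation error above; repeating this shows the CRDs appearing shortly afterwards
kubectl get crd | grep coreos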

UPDATE: A proposed fix for this issue can be seen here: https://github.com/helm/helm/pull/5112

kcmartin commented 6 years ago

Note: I encountered this same bug on a newly created cluster on the first helm install of this chart (with default values). K8s: 1.11.3 Helm: 2.11

shangjin92 commented 6 years ago

I also encountered the same bug; helm install always fails.

K8s: 1.11.2 Helm: 2.11

Error: validation failed: [
unable to recognize "": no matches for kind "Prometheus" in version "monitoring.coreos.com/v1", 
unable to recognize "": no matches for kind "PrometheusRule" in version "monitoring.coreos.com/v1", 
unable to recognize "": no matches for kind "ServiceMonitor" in version "monitoring.coreos.com/v1", 
unable to recognize "": no matches for kind "ServiceMonitor" in version "monitoring.coreos.com/v1", 
unable to recognize "": no matches for kind "ServiceMonitor" in version "monitoring.coreos.com/v1", 
unable to recognize "": no matches for kind "ServiceMonitor" in version "monitoring.coreos.com/v1", 
unable to recognize "": no matches for kind "ServiceMonitor" in version "monitoring.coreos.com/v1", 
unable to recognize "": no matches for kind "ServiceMonitor" in version "monitoring.coreos.com/v1", 
unable to recognize "": no matches for kind "ServiceMonitor" in version "monitoring.coreos.com/v1", 
unable to recognize "": no matches for kind "ServiceMonitor" in version "monitoring.coreos.com/v1", 
unable to recognize "": no matches for kind "ServiceMonitor" in version "monitoring.coreos.com/v1", 
unable to recognize "": no matches for kind "ServiceMonitor" in version "monitoring.coreos.com/v1", 
unable to recognize "": no matches for kind "ServiceMonitor" in version "monitoring.coreos.com/v1"]

rwaffen commented 5 years ago

Is there a workaround or a fix?

I have the same problem with: Kubernetes: 1.11 Helm: 2.11

Repeated execution shows the error that the CRDs are already there...

vsliouniaev commented 5 years ago

@rwaffen would you be so kind as to provide tiller logs for when this is happening?

Edit: or anyone else for that matter? I have lost access to a cluster where I could reproduce this myself. Providing this on https://github.com/helm/helm/issues/4925 is the best chance of making progress

rwaffen commented 5 years ago

I will post it to the other issue.

vsliouniaev commented 5 years ago

Thanks a lot @rwaffen. If you haven't figured out how to work around this issue yourself: you basically want to create the CRDs first and then use prometheusOperator.createCustomResource=false. If you're experiencing the same thing as in this ticket, you just need to let the chart fail once, then set this value and try a second time.
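
A sketch of that two-step sequence (helm 2 syntax; release name and namespace are placeholders):

# first attempt: validation fails, but the crd-install hook still registers the CRDs
helm install stable/prometheus-operator --name prometheus-operator --namespace monitoring
# confirm the four CRDs have appeared
kubectl get crd | grep monitoring.coreos.com
# remove the failed release; CRDs created by the crd-install hook are not deleted with it
helm delete --purge prometheus-operator
# second attempt, with the chart's own CRD creation disabled
helm install stable/prometheus-operator --name prometheus-operator --namespace monitoring \
  --set prometheusOperator.createCustomResource=false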

rwaffen commented 5 years ago

Hmm, okay, sounds strange, but I will test this now.

vsliouniaev commented 5 years ago

@rwaffen If you're sure that the issue you're having is the same as the one I described then this should work for you

rwaffen commented 5 years ago

I don't know what changed, but today it worked out of the box. I'm very confused, and happy that it is working now. The cluster version, kubectl, and helm are unchanged; maybe the chart was somehow updated.

vsliouniaev commented 5 years ago

The chart has not been updated in any way to compensate for this issue.

paskal commented 5 years ago

Am I mistaken, or is it reproducible only on a fresh install? In that case a second install will work fine, but if you delete the release and delete the CRDs manually, you'll hit the same error on the next install.

rwaffen commented 5 years ago

Until today it was reproducible: fresh cluster, I run the helm install, it fails; I remove the CRDs and it fails again and again. But something was different today. I'll have to check it again on Monday, because I couldn't reproduce it today, and since it was working, new problems filled my day, so I forgot about the now somehow-fixed problem :D

rwaffen commented 5 years ago

Ah okay... last time I just got lucky. After trying to reinstall, I ran into the race condition again... I will now try the trick of running it twice with the extra config setting.

huxiaoliang commented 5 years ago

I sometimes encounter this issue (unable to recognize "": no matches for kind) on a fresh install of my own chart, but I can't reproduce it reliably. I install the CRDs via the crd-install annotation and create CR instances within the same chart, but it seems the CR creation fails because the CRDs are not ready yet - the same race condition.
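
One way to mitigate this race in a chart pipeline is to block until the CRDs report the Established condition before creating any CR instances, for example in a CI step between applying the CRDs and installing the rest. A sketch (the CRD names below are the prometheus-operator ones; adjust for your own chart):

# wait until the API server has established each CRD (kubectl 1.11+)
kubectl wait --for=condition=established --timeout=60s \
  crd/prometheuses.monitoring.coreos.com \
  crd/alertmanagers.monitoring.coreos.com \
  crd/prometheusrules.monitoring.coreos.com \
  crd/servicemonitors.monitoring.coreos.com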

jhohertz commented 5 years ago

Just wanted to note that I recently ran into this, and that it was working just fine on k8s 1.10.x. It's only since testing 1.11.x that I am seeing this issue arise.

kcmartin commented 5 years ago

Still running into this.

I mistakenly thought that upgrading to Helm 2.12.1 would allow installation of this chart to happen in spite of the CRDs issue; however I've tried launching several new clusters, and I get the same error each time.

K8s: 1.11.5 Helm: 2.12.1

Error: Error: validation failed: [unable to recognize "": no matches for kind "Alertmanager" in version "monitoring.coreos.com/v1", unable to recognize "": no matches for kind "Prometheus" in version "monitoring.coreos.com/v1", unable to recognize "": no matches for kind "PrometheusRule" in version "monitoring.coreos.com/v1", unable to recognize "": no matches for kind "PrometheusRule" in version "monitoring.coreos.com/v1", unable to recognize "": no matches for kind "ServiceMonitor" in version "monitoring.coreos.com/v1", unable to recognize "": no matches for kind "ServiceMonitor" in version "monitoring.coreos.com/v1", unable to recognize "": no matches for kind "ServiceMonitor" in version "monitoring.coreos.com/v1", unable to recognize "": no matches for kind "ServiceMonitor" in version "monitoring.coreos.com/v1", unable to recognize "": no matches for kind "ServiceMonitor" in version "monitoring.coreos.com/v1", unable to recognize "": no matches for kind "ServiceMonitor" in version "monitoring.coreos.com/v1", unable to recognize "": no matches for kind "ServiceMonitor" in version "monitoring.coreos.com/v1", unable to recognize "": no matches for kind "ServiceMonitor" in version "monitoring.coreos.com/v1", unable to recognize "": no matches for kind "ServiceMonitor" in version "monitoring.coreos.com/v1"]

We are trying to add the Prometheus Operator to our automated cluster creation - is there a workaround I can use without a bunch of manual provisioning/deletion?

jhohertz commented 5 years ago

This workaround is the most reliable method I've seen yet: https://github.com/helm/charts/issues/9941#issuecomment-447844259
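
For automated setups, the general pattern behind the workarounds in this thread is to create the CRDs yourself before installing the chart and to disable the chart's own CRD creation. A sketch of that approach (the local manifest path, release name and namespace are placeholders; the CRD manifests can be taken from the prometheus-operator project for the matching version):

# apply the four CRD manifests (Alertmanager, Prometheus, PrometheusRule, ServiceMonitor) yourself
kubectl apply -f ./prometheus-operator-crds/
# verify they are registered
kubectl get crd | grep monitoring.coreos.com
# install with the chart's own CRD creation disabled
helm install stable/prometheus-operator --name prometheus-operator --namespace monitoring \
  --set prometheusOperator.createCustomResource=false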

It looks like the real fix will come in helm 2.13 via: https://github.com/helm/helm/pull/5112

kcmartin commented 5 years ago

Thanks @jhohertz -- will try that workaround.

stale[bot] commented 5 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Any further update will cause the issue/pull request to no longer be considered stale. Thank you for your contributions.

vsliouniaev commented 5 years ago

/remove-lifecycle stale

vsliouniaev commented 5 years ago

/add-lifecycle frozen

vsliouniaev commented 5 years ago

/lifecycle frozen

RickS-C137 commented 5 years ago

I still have the problem, after updating to helm 2.13.0 today.

jhohertz commented 5 years ago

It's also been my experience that 2.13 is still not working right with CRDs and this chart.

issac-lim commented 5 years ago

Same for me (helm 2.13 & helm 2.12.3).

https://github.com/helm/helm/pull/5112 is needed.

stale[bot] commented 5 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Any further update will cause the issue/pull request to no longer be considered stale. Thank you for your contributions.

vsliouniaev commented 5 years ago

Still an issue

vsliouniaev commented 5 years ago

Due to be fixed in the next Helm release

MPV commented 5 years ago

https://github.com/helm/helm/pull/5112 is now merged to master in Helm

stale[bot] commented 5 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Any further update will cause the issue/pull request to no longer be considered stale. Thank you for your contributions.

vsliouniaev commented 5 years ago

I have heard reports that the issue persists even with Helm 2.14. Until this is confirmed resolved I think this should remain open

stale[bot] commented 5 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Any further update will cause the issue/pull request to no longer be considered stale. Thank you for your contributions.

aperullo commented 5 years ago

This issue does still indeed persist in 2.14

JayThomason commented 5 years ago

I seem to have just run into this issue while using helm upgrade --install, but when I went back to plain helm install it worked. It's possible I was using upgrade --install incorrectly, but it definitely seems weird to me that they don't behave the same.

vsliouniaev commented 5 years ago

@JayThomason it's not a specific command - the installation will simply fail sporadically

JayThomason commented 5 years ago

Right. Previously I was wondering if under the hood helm might be doing something slightly different when calling upgrade --install vs. install that would exacerbate this issue.

Now that I read https://github.com/helm/helm/pull/5112 it does look like that should be the authoritative fix.

I'm currently on v2.14.1 so I will see if I can replicate.

stale[bot] commented 5 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Any further update will cause the issue/pull request to no longer be considered stale. Thank you for your contributions.

wu0407 commented 5 years ago

Same issue with helm 3.0.0 when installing prometheus-operator.

allamand commented 5 years ago

I had the same problem when using a custom crdApiGroup:

helm install --namespace monitoring-seb --name prometheus-monitoring-seb -f samples/prometheus-values.yaml stable/prometheus-operator --set prometheusOperator.crdApiGroup=monitoring.seb

Then, if I reapply with createCustomResource set to false, it works:

helm install --namespace monitoring-seb --name prometheus-monitoring-seb -f samples/prometheus-values.yaml stable/prometheus-operator --set prometheusOperator.crdApiGroup=monitoring.seb --set prometheusOperator.createCustomResource=false

stale[bot] commented 5 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Any further update will cause the issue/pull request to no longer be considered stale. Thank you for your contributions.

vsliouniaev commented 5 years ago

Still an issue. This appears to now be affecting the stable/prometheus-operator chart during CI

aperullo commented 5 years ago

For us this issue was actually fixed by upgrading helm. It turns out our server version of helm was older than the release that introduced the fix. Since upgrading we've had no more problems with CRDs.
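
For anyone checking the same thing: with helm 2 the client and the server (Tiller) versions can differ, so it's worth confirming both before assuming the fix is present, e.g.:

# prints both the client and the Tiller (server) version
helm version --short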

vsliouniaev commented 5 years ago

Agreed! This issue has been fixed in helm 2.14, but the CI process in this repo is using an old helm version from the helm/chart-testing repo. I have opened a few tickets about this today in various places to get the newer version integrated.

jhohertz commented 5 years ago

So far, I have not seen any issues since moving to helm 2.14.3. I think, maybe, this longstanding issue is now fully resolved?

aperullo commented 5 years ago

@jhohertz Well, there was the person a few posts up who was having the same issue with Helm 3.0.0, so I'd say no.

jhohertz commented 5 years ago

@aperullo I can't speak to their experience or the 3.0.0 series; however, I do know the fix wasn't ported to 3.0.0 until the later betas, so maybe they just need to update their client?

vsliouniaev commented 5 years ago

Fixed in Helm3 with helm/helm#6332

jwalton commented 5 years ago

I just ran into this trying to install on EKS with Helm v3.0.0-beta.5, so it's maybe not as fixed as all that. :/

vsliouniaev commented 5 years ago

This PR fixes it for CI in this repo https://github.com/helm/charts/pull/18538

abdennour commented 4 years ago

Hello @jwalton

I just ran into this trying to install on EKS with Helm v3.0.0-beta.5, so it's maybe not as fixed as all that. :/

I've explained how to install prometheus-operator in this EKS course. Good luck!