redhat-iot / iot-assettracking-demo

IoT Asset Tracking Demo
Eclipse Public License 1.0

Pods are not started #4

Closed: ganesh3 closed this issue 6 years ago

ganesh3 commented 7 years ago

NAME                      READY     STATUS    RESTARTS   AGE
dashboard-1-build         0/1       Error     0          19m
datastore-1-deploy        0/1       Error     0          4m
datastore-proxy-1-build   0/1       Error     0          19m
elasticsearch-1-deploy    0/1       Error     0          19m
kapua-api-1-deploy        0/1       Error     0          19m
kapua-broker-1-deploy     0/1       Error     0          19m
kapua-console-1-deploy    0/1       Error     0          19m
simulator-1-deploy        0/1       Error     0          19m
sql-1-deploy              0/1       Error     0          19m
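For anyone triaging a similar wall of Error pods, the usual next step is to pull the logs and events for the failing build and deploy pods; a minimal sketch (pod names here are the ones from the listing above, adjust to your own):

# Stream the build log through the BuildConfig for a failed *-build pod
oc logs -f bc/dashboard

# Describe a failed *-deploy pod to see why the deployment never completed
oc describe pod datastore-1-deploy

# Recent events in the project often show image pull, quota, or scheduling problems
oc get events -n redhat-iot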

oc status -v
In project Red Hat IoT Demo (redhat-iot) on server https://192.168.225.169:8443

http://dashboard-redhat-iot.192.168.225.169.xip.io to pod port 8080-tcp (svc/dashboard)
  dc/dashboard deploys istag/dashboard:latest <-
    bc/dashboard source builds https://github.com/redhat-iot/summit2017#master on openshift/nodejs:4
      build #1 failed 19 minutes ago
    deployment #1 waiting on image or update

svc/datastore-hotrod - 172.30.4.196:11333
  dc/datastore deploys openshift/jboss-datagrid65-openshift:1.2
    deployment #1 failed 5 minutes ago: image change

http://datastore-proxy-redhat-iot.192.168.225.169.xip.io to pod port 8080-tcp (svc/datastore-proxy)
  dc/datastore-proxy deploys istag/datastore-proxy:latest <-
    bc/datastore-proxy source builds https://github.com/redhat-iot/summit2017#master on openshift/wildfly:10.1
      build #1 failed 19 minutes ago
    deployment #1 waiting on image or update

http://search-redhat-iot.192.168.225.169.xip.io to pod port http (svc/elasticsearch)
  dc/elasticsearch deploys docker.io/library/elasticsearch:2.4
    deployment #1 failed 20 minutes ago: config change

http://api-redhat-iot.192.168.225.169.xip.io to pod port http (svc/kapua-api)
  dc/kapua-api deploys docker.io/redhatiot/kapua-api-jetty:2017-04-08
    deployment #1 failed 20 minutes ago: config change

http://broker-redhat-iot.192.168.225.169.xip.io to pod port mqtt-websocket-tcp (svc/kapua-broker)
  dc/kapua-broker deploys docker.io/redhatiot/kapua-broker:2017-04-08
    deployment #1 failed 20 minutes ago: config change

http://console-redhat-iot.192.168.225.169.xip.io to pod port http (svc/kapua-console)
  dc/kapua-console deploys docker.io/redhatiot/kapua-console-jetty:2017-04-08
    deployment #1 failed 20 minutes ago: config change

svc/sql - 172.30.56.47 ports 3306, 8181
  dc/sql deploys docker.io/redhatiot/kapua-sql:2017-04-08
    deployment #1 failed 20 minutes ago: config change

dc/simulator deploys docker.io/redhatiot/kura-simulator:2017-04-08
  deployment #1 failed 20 minutes ago: config change

Errors:

Warnings:

Info:

View details with 'oc describe <resource>/<name>' or list everything with 'oc get all'.

Errors:

oc logs -f bc/dashboard
error: cannot connect to the server: open /var/run/secrets/kubernetes.io/serviceaccount/token: no such file or directory

ganesh@ganesh-Lenovo-ideapad-100-14IBD:~/summit2017$ oc logs -f bc/datastore-proxy
error: cannot connect to the server: open /var/run/secrets/kubernetes.io/serviceaccount/token: no such file or directory

oc adm router -o yaml
error: router could not be created; service account "router" is not allowed to access the host network on nodes, grant access with oadm policy add-scc-to-user hostnetwork -z router
apiVersion: v1
items:
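Both failures above are client/cluster-side rather than application-side: the serviceaccount/token message is what oc prints when it has no usable login context and falls back to in-cluster credentials, and the router error names its own fix. A hedged sketch of the follow-up, assuming a local "oc cluster up" style setup where system:admin is reachable:

# Re-establish a working client context first (URL taken from the output above)
oc login https://192.168.225.169:8443 -u system:admin

# Grant the SCC the error message asks for; the router service account lives in the default project
oc adm policy add-scc-to-user hostnetwork -z router -n default

# Then retry creating the router
oc adm router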

ccustine commented 7 years ago

What is your environment like? Are you deploying to an existing OpenShift cluster, or are you using Minishift + OpenShift?

It would be helpful to post the output from:
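(The exact commands were presumably the standard client and cluster diagnostics, judging from the reply that follows:)

oc version
docker version
oc adm diagnostics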

Post this info and we'll go from there!

ganesh3 commented 7 years ago

I am only using oc on Ubuntu 16.10. Details below:

Version

oc version

oc v1.5.0+031cbe4
kubernetes v1.5.2+43a9be4
features: Basic-Auth GSSAPI Kerberos SPNEGO

docker version

Client:
 Version:      17.03.1-ce
 API version:  1.24 (downgraded from 1.27)
 Go version:   go1.7.5
 Git commit:   c6d412e
 Built:        Mon Mar 27 17:17:43 2017
 OS/Arch:      linux/amd64
Error response from daemon: client is newer than server (client API version: 1.24, server API version: 1.23)

Additional Information

oc adm diagnostics
[Note] Determining if client configuration exists for client/cluster diagnostics
Info:  Successfully read a client config file at '/home/ganesh/.kube/config'
[Note] Could not configure a client, so client diagnostics are limited to testing configuration and connection
[Note] Could not configure a client with cluster-admin permissions for the current server, so cluster diagnostics will be skipped

[Note] Running diagnostic: ConfigContexts[redhat-iot/192-168-225-169:8443/system:admin]
       Description: Validate client config context is complete and has connectivity

ERROR: [DCli0015 from diagnostic ConfigContexts@openshift/origin/pkg/diagnostics/client/config_contexts.go:285]
       The current client config context is 'redhat-iot/192-168-225-169:8443/system:admin':
       The server URL is 'https://192.168.225.169:8443'
       The user authentication is 'system:admin/192-168-225-169:8443'
       The current project is 'redhat-iot'
       (*url.Error) Get https://192.168.225.169:8443/api: dial tcp 192.168.225.169:8443: getsockopt: connection refused
       Diagnostics does not have an explanation for what this means. Please report this error so one can be added.

[Note] Running diagnostic: ConfigContexts[/192-168-225-169:8443/developer]
       Description: Validate client config context is complete and has connectivity

ERROR: [DCli0015 from diagnostic ConfigContexts@openshift/origin/pkg/diagnostics/client/config_contexts.go:285]
       For client config context '/192-168-225-169:8443/developer':
       The server URL is 'https://192.168.225.169:8443'
       The user authentication is 'developer/192-168-225-169:8443'
       The current project is 'default'
       (*url.Error) Get https://192.168.225.169:8443/api: dial tcp 192.168.225.169:8443: getsockopt: connection refused
       Diagnostics does not have an explanation for what this means. Please report this error so one can be added.

ccustine commented 7 years ago

Can you run oc version and oc adm diagnostics while you are connected to your server? These messages aren't showing the server info which is mostly what I need.
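In other words, log in against the running master first so that both commands can also report the server-side versions; for example (the placeholder address below stands for your actual master URL):

oc login https://<master-ip>:8443 -u developer
oc version
oc adm diagnostics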

ganesh3 commented 7 years ago

oc version
oc v1.5.0+031cbe4
kubernetes v1.5.2+43a9be4
features: Basic-Auth GSSAPI Kerberos SPNEGO

Server https://10.65.5.186:8443

oc adm diagnostics
[Note] Determining if client configuration exists for client/cluster diagnostics
Info:  Successfully read a client config file at '/home/ganesh/.kube/config'
Info:  Using context for cluster-admin access: 'default/10-65-5-186:8443/system:admin'

[Note] Running diagnostic: ConfigContexts[redhat-iot/192-168-225-169:8443/developer]
       Description: Validate client config context is complete and has connectivity

ERROR: [DCli0010 from diagnostic ConfigContexts@openshift/origin/pkg/diagnostics/client/config_contexts.go:285]
       For client config context 'redhat-iot/192-168-225-169:8443/developer':
       The server URL is 'https://192.168.225.169:8443'
       The user authentication is 'developer/192-168-225-169:8443'
       The current project is 'redhat-iot'
       (*url.Error) Get https://192.168.225.169:8443/api: dial tcp 192.168.225.169:8443: i/o timeout

   This means that when we tried to connect to the master API server,
   we could not reach the host at all.
   * You may have specified the wrong host address.
   * This could mean the host is completely unavailable (down).
   * This could indicate a routing problem or a firewall that simply
     drops requests rather than responding by resetting the connection.
   * It does not generally mean that DNS name resolution failed (which
     would be a different error) though the problem could be that it
     gave the wrong address.

[Note] Running diagnostic: ConfigContexts[redhat-iot/192-168-225-169:8443/system:admin]
       Description: Validate client config context is complete and has connectivity

ERROR: [DCli0010 from diagnostic ConfigContexts@openshift/origin/pkg/diagnostics/client/config_contexts.go:285]
       For client config context 'redhat-iot/192-168-225-169:8443/system:admin':
       The server URL is 'https://192.168.225.169:8443'
       The user authentication is 'system:admin/192-168-225-169:8443'
       The current project is 'redhat-iot'
       (*url.Error) Get https://192.168.225.169:8443/api: dial tcp 192.168.225.169:8443: i/o timeout

   This means that when we tried to connect to the master API server,
   we could not reach the host at all.
   * You may have specified the wrong host address.
   * This could mean the host is completely unavailable (down).
   * This could indicate a routing problem or a firewall that simply
     drops requests rather than responding by resetting the connection.
   * It does not generally mean that DNS name resolution failed (which
     would be a different error) though the problem could be that it
     gave the wrong address.

[Note] Running diagnostic: ConfigContexts[/10-65-5-186:8443/developer]
       Description: Validate client config context is complete and has connectivity

Info:  For client config context '/10-65-5-186:8443/developer':
       The server URL is 'https://10.65.5.186:8443'
       The user authentication is 'developer/10-65-5-186:8443'
       The current project is 'default'
       Successfully requested project list; has access to project(s): [myproject]

[Note] Running diagnostic: ConfigContexts[default/10-65-5-186:8443/system:admin]
       Description: Validate client config context is complete and has connectivity

Info:  For client config context 'default/10-65-5-186:8443/system:admin':
       The server URL is 'https://10.65.5.186:8443'
       The user authentication is 'system:admin/10-65-5-186:8443'
       The current project is 'default'
       Successfully requested project list; has access to project(s): [default kube-system myproject openshift openshift-infra]

[Note] Running diagnostic: DiagnosticPod
       Description: Create a pod to run diagnostics from the application standpoint

WARN:  [DCli2006 from diagnostic DiagnosticPod@openshift/origin/pkg/diagnostics/client/run_diagnostics_pod.go:134]
       Timed out preparing diagnostic pod logs for streaming, so this diagnostic cannot run.
       It is likely that the image 'openshift/origin-deployer:v1.5.0' was not pulled and running yet.
       Last error: (*errors.StatusError) container "pod-diagnostics" in pod "pod-diagnostic-test-xcnp3" is waiting to start: ContainerCreating

[Note] Running diagnostic: NetworkCheck
       Description: Create a pod on all schedulable nodes and run network diagnostics from the application standpoint

ERROR: [DNet2001 from diagnostic NetworkCheck@openshift/origin/pkg/diagnostics/network/run_pod.go:77]
       Checking network plugin failed. Error: User "developer" cannot get clusternetworks at the cluster scope

[Note] Skipping diagnostic: AggregatedLogging
       Description: Check aggregated logging integration for proper configuration
       Because: No master config file was provided

[Note] Running diagnostic: ClusterRegistry
       Description: Check that there is a working Docker registry

ERROR: [DClu1006 from diagnostic ClusterRegistry@openshift/origin/pkg/diagnostics/cluster/registry.go:209]
       The "docker-registry" service exists but has no associated pods, so it is not available.
       Builds and deployments that use the registry will fail.

[Note] Running diagnostic: ClusterRoleBindings
       Description: Check that the default ClusterRoleBindings are present and contain the expected subjects

Info:  clusterrolebinding/cluster-admins has more subjects than expected.

   Use the `oadm policy reconcile-cluster-role-bindings` command to update the role binding to remove extra subjects.

Info: clusterrolebinding/cluster-admins has extra subject {ServiceAccount default pvinstaller }.

[Note] Running diagnostic: ClusterRoles
       Description: Check that the default ClusterRoles are present and contain the expected permissions

[Note] Running diagnostic: ClusterRouterName
       Description: Check there is a working router

ERROR: [DClu2007 from diagnostic ClusterRouter@openshift/origin/pkg/diagnostics/cluster/router.go:156]
       The "router" DeploymentConfig exists but has no running pods, so it is not available.
       Apps will not be externally accessible via the router.

[Note] Running diagnostic: MasterNode
       Description: Check if master is also running node (for Open vSwitch)

Info:  Found a node with same IP as master: 10.65.5.186

[Note] Skipping diagnostic: MetricsApiProxy
       Description: Check the integrated heapster metrics can be reached via the API proxy
       Because: The heapster service does not exist in the openshift-infra project at this time, so it is not available for the Horizontal Pod Autoscaler to use as a source of metrics.

[Note] Running diagnostic: NodeDefinitions
       Description: Check node records on master

[Note] Skipping diagnostic: ServiceExternalIPs
       Description: Check for existing services with ExternalIPs that are disallowed by master config
       Because: No master config file was detected

[Note] Summary of diagnostics execution (version v1.5.0+031cbe4):
[Note] Warnings seen: 1
[Note] Errors seen: 5

openshift v1.5.0+031cbe4
kubernetes v1.5.2+43a9be4
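The two remaining cluster-level errors (DClu1006 and DClu2007) both say that the infrastructure components in the default project have no running pods. Before going further, it is probably worth checking those directly, along these lines:

# The registry and router pods should be Running in the default project
oc get pods -n default

# Inspect their deployment configs for failed rollouts or image pull problems
oc get dc docker-registry router -n default
oc describe dc docker-registry -n default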

vap0rtranz commented 7 years ago

Looks like I reproduced at least some of Ganesh's issue.

The first Pod to fail is dashboard, so I captured this:

$ oc logs -f bc/dashboard
Cloning "https://github.com/redhat-iot/summit2017" ...
    Commit: cf4d1ed6777ef3dac5bd6e5adfd73e958e543c47 (enable dynamic resetting of alerts)
    Author: jamesfalkner <schtool@gmail.com>
    Date:   Wed Apr 26 13:28:15 2017 -0400

---> Installing application source ...
---> Building your Node application from source

> iot-cargo-demo@2.0.0 postinstall /opt/app-root/src
> bower install
...
Pushing image 172.30.117.172:5000/redhat-iot/dashboard:latest ...
Registry server Address: 
Registry server User Name: serviceaccount
Registry server Email: serviceaccount@example.org
Registry server Password: <<non-empty>>
error: build error: Failed to push image: Get https://172.30.117.172:5000/v1/_ping: dial tcp 172.30.117.172:5000: getsockopt: connection refused

Looks like this is just unauthorized access to the registry? I'm trying this in a fresh install of our CDKv3.4.
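The push is actually failing at the TCP level (connection refused) rather than on credentials, which lines up with the earlier "docker-registry service exists but has no associated pods" diagnostic. A rough way to confirm, assuming the integrated registry lives in the default project as usual:

oc get svc docker-registry -n default
oc get pods -n default | grep docker-registry
oc logs dc/docker-registry -n default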

$ oc version
oc v3.4.0.40
kubernetes v1.4.0+776c994
features: Basic-Auth GSSAPI Kerberos SPNEGO

Server https://127.0.0.1:8443
openshift v3.4.0.40
kubernetes v1.4.0+776c994
vap0rtranz commented 7 years ago

I typo'd the CDK version above, but the oc version was right. I've upgraded to CDK v3 (really v3 this time) and got past the original errors that occurred with v2.4.

[jupittma@jupittma summit2017]$ oc version
oc v3.5.5.8
kubernetes v1.5.2+43a9be4
features: Basic-Auth GSSAPI Kerberos SPNEGO

Server https://192.168.42.96:8443
openshift v3.5.5.8
kubernetes v1.5.2+43a9be4
[jupittma@jupittma summit2017]$ oc adm diagnostics
[Note] Determining if client configuration exists for client/cluster diagnostics
Info:  Successfully read a client config file at '/home/jupittma/.kube/config'
Info:  Using context for cluster-admin access: 'redhat-iot/192-168-42-96:8443/system:admin'
...
ERROR: [DCli0014 from diagnostic ConfigContexts@openshift/origin/pkg/diagnostics/client/config_contexts.go:285]
       For client config context '/192-168-42-96:8443/developer':
       The server URL is 'https://192.168.42.96:8443'
       The user authentication is 'developer/192-168-42-96:8443'
       The current project is 'default'
       (*errors.StatusError) the server has asked for the client to provide credentials (get projects)

       This means that when we tried to make a request to the master API
       server, the request required credentials that were not presented. This
       can happen with an expired or invalid authentication token. Try logging
       in with this user again.

Info:  Output from the diagnostic pod (image registry.access.redhat.com/openshift3/ose-deployer:v3.5.5.8):
       [Note] Running diagnostic: PodCheckAuth
              Description: Check that service account credentials authenticate as expected

       Info:  Service account token successfully authenticated to master
       Info:  Service account token was authenticated by the integrated registry.
       [Note] Running diagnostic: PodCheckDns
              Description: Check that DNS within a pod works as expected

       [Note] Summary of diagnostics execution (version v3.5.5.8):
       [Note] Completed with no errors or warnings seen.
... 
WARN:  [DNet2002 from diagnostic NetworkCheck@openshift/origin/pkg/diagnostics/network/run_pod.go:81]
       Skipping network diagnostics check. Reason: Not using openshift network plugin.
...     
[Note] Summary of diagnostics execution (version v3.5.5.8):
[Note] Warnings seen: 1
[Note] Errors seen: 1

Now the build fails in datastore-proxy while resolving the wildfly image stream:

error instantiating Build from BuildConfig redhat-iot/datastore-proxy: Error resolving ImageStreamTag wildfly:10.1 in namespace openshift: imagestreams "wildfly" not found

So perhaps there are hard requirements, like a Minishift-based install? CDK v2 wasn't based on Minishift.
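Before assuming a hard Minishift requirement, it may just be that the openshift namespace is missing the builder image streams the demo templates expect; a quick check along these lines would tell, and importing the missing streams (as done in the next comment) is then the fix:

oc get is -n openshift | grep -E 'wildfly|nodejs'
oc get istag wildfly:10.1 -n openshift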

BTW: my host is RHEL, not Ubuntu.

vap0rtranz commented 7 years ago

Thinking images were simply missing, like wildfly, I ran the recommended import:

[jupittma@jupittma summit2017]$ oc create -n openshift -f https://raw.githubusercontent.com/jboss-openshift/application-templates/master/jboss-image-streams.json
imagestream "jboss-datagrid65-client-openshift" created
imagestream "jboss-datavirt63-driver-openshift" created
imagestream "redhat-sso71-openshift" created
imagestream "redhat-openjdk18-openshift" created
Error from server (AlreadyExists): imagestreams "jboss-webserver30-tomcat7-openshift" already exists
Error from server (AlreadyExists): imagestreams "jboss-webserver30-tomcat8-openshift" already exists
Error from server (AlreadyExists): imagestreams "jboss-eap64-openshift" already exists
Error from server (AlreadyExists): imagestreams "jboss-eap70-openshift" already exists
Error from server (AlreadyExists): imagestreams "jboss-decisionserver62-openshift" already exists
Error from server (AlreadyExists): imagestreams "jboss-decisionserver63-openshift" already exists
Error from server (AlreadyExists): imagestreams "jboss-processserver63-openshift" already exists
Error from server (AlreadyExists): imagestreams "jboss-datagrid65-openshift" already exists
Error from server (AlreadyExists): imagestreams "jboss-datavirt63-openshift" already exists
Error from server (AlreadyExists): imagestreams "jboss-amq-62" already exists
Error from server (AlreadyExists): imagestreams "redhat-sso70-openshift" already exists
[jupittma@jupittma summit2017]$ oc create -n openshift -f https://raw.githubusercontent.com/openshift/origin/master/examples/image-streams/image-streams-centos7.json
imagestream "wildfly" created
Error from server (AlreadyExists): imagestreams "ruby" already exists
Error from server (AlreadyExists): imagestreams "nodejs" already exists
Error from server (AlreadyExists): imagestreams "perl" already exists
Error from server (AlreadyExists): imagestreams "php" already exists
Error from server (AlreadyExists): imagestreams "python" already exists
Error from server (AlreadyExists): imagestreams "mysql" already exists
Error from server (AlreadyExists): imagestreams "mariadb" already exists
Error from server (AlreadyExists): imagestreams "postgresql" already exists
Error from server (AlreadyExists): imagestreams "mongodb" already exists
Error from server (AlreadyExists): imagestreams "redis" already exists
Error from server (AlreadyExists): imagestreams "jenkins" already exists

Then redeploying looks a bit better.
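("Redeploying" here presumably just means retriggering the failed builds and rollouts once the image streams exist, roughly along these lines:)

oc start-build dashboard -n redhat-iot --follow
oc start-build datastore-proxy -n redhat-iot
oc rollout latest dc/datastore -n redhat-iot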

vap0rtranz commented 7 years ago

Redeploy looks good. Case closed for me by upgrading to CDKv3 / minishift.

NOTE: I also doubled my VM memory (from 4 to 8 GiB) at the same time. I never saw memory-specific errors but wanted to call out the change.
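(For reference, on Minishift / CDK v3 the VM memory is fixed when the VM is created; something like the following, noting that older Minishift releases take the value in MB while newer ones also accept a unit suffix:)

minishift config set memory 8GB    # persisted; takes effect when the VM is (re)created
minishift start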

[jupittma@jupittma summit2017]$ oc status -v
In project Red Hat IoT Demo 2 (redhat-iot) on server https://192.168.42.11:8443

http://dashboard-redhat-iot.192.168.42.11.nip.io to pod port 8080-tcp (svc/dashboard)
  dc/dashboard deploys istag/dashboard:latest <-
    bc/dashboard source builds https://github.com/redhat-iot/summit2017#master on openshift/nodejs:4 
    deployment #1 deployed about a minute ago - 1 pod

svc/datastore-hotrod - 172.30.24.159:11333
  dc/datastore deploys openshift/jboss-datagrid65-openshift:1.2 
    deployment #1 deployed 7 minutes ago - 1 pod

http://datastore-proxy-redhat-iot.192.168.42.11.nip.io to pod port 8080-tcp (svc/datastore-proxy)
  dc/datastore-proxy deploys istag/datastore-proxy:latest <-
    bc/datastore-proxy source builds https://github.com/redhat-iot/summit2017#master on openshift/wildfly:10.1 
    deployment #1 deployed 2 minutes ago - 1 pod

http://search-redhat-iot.192.168.42.11.nip.io to pod port http (svc/elasticsearch)
  dc/elasticsearch deploys docker.io/library/elasticsearch:2.4 
    deployment #1 deployed 7 minutes ago - 1 pod

http://api-redhat-iot.192.168.42.11.nip.io to pod port http (svc/kapua-api)
  dc/kapua-api deploys docker.io/redhatiot/kapua-api-jetty:2017-04-08 
    deployment #1 deployed 7 minutes ago - 1 pod

http://broker-redhat-iot.192.168.42.11.nip.io to pod port mqtt-websocket-tcp (svc/kapua-broker)
  dc/kapua-broker deploys docker.io/redhatiot/kapua-broker:2017-04-08 
    deployment #1 deployed 7 minutes ago - 1 pod

http://console-redhat-iot.192.168.42.11.nip.io to pod port http (svc/kapua-console)
  dc/kapua-console deploys docker.io/redhatiot/kapua-console-jetty:2017-04-08 
    deployment #1 deployed 7 minutes ago - 1 pod

svc/sql - 172.30.68.210 ports 3306, 8181
  dc/sql deploys docker.io/redhatiot/kapua-sql:2017-04-08 
    deployment #1 deployed 7 minutes ago - 1 pod

dc/simulator deploys docker.io/redhatiot/kura-simulator:2017-04-08 
  deployment #1 deployed 7 minutes ago - 1 pod (warning: 4 restarts)

Warnings:
  * pod/simulator-1-z3mkg has restarted within the last 10 minutes

Info:
  * pod/dashboard-1-build has no liveness probe to verify pods are still running.
    try: oc set probe pod/dashboard-1-build --liveness ...
  * pod/datastore-proxy-1-build has no liveness probe to verify pods are still running.
    try: oc set probe pod/datastore-proxy-1-build --liveness ...
  * dc/datastore has no readiness probe to verify pods are ready to accept traffic or ensure deployment is successful.
    try: oc set probe dc/datastore --readiness ...
  * dc/datastore has no liveness probe to verify pods are still running.
    try: oc set probe dc/datastore --liveness ...
  * dc/elasticsearch has no liveness probe to verify pods are still running.
    try: oc set probe dc/elasticsearch --liveness ...
  * dc/kapua-api has no liveness probe to verify pods are still running.
    try: oc set probe dc/kapua-api --liveness ...
  * dc/kapua-broker has no liveness probe to verify pods are still running.
    try: oc set probe dc/kapua-broker --liveness ...
  * dc/kapua-console has no liveness probe to verify pods are still running.
    try: oc set probe dc/kapua-console --liveness ...
  * dc/simulator has no readiness probe to verify pods are ready to accept traffic or ensure deployment is successful.
    try: oc set probe dc/simulator --readiness ...
  * dc/simulator has no liveness probe to verify pods are still running.
    try: oc set probe dc/simulator --liveness ...
  * dc/sql has no liveness probe to verify pods are still running.
    try: oc set probe dc/sql --liveness ...
ccustine commented 6 years ago

I think the originally reported issues were some kind of network problem or another issue affecting the OpenShift cluster. I am going to close this for now.