spring-cloud / spring-cloud-dataflow

A microservices-based Streaming and Batch data processing in Cloud Foundry and Kubernetes
https://dataflow.spring.io
Apache License 2.0

scdf server initialization on plain k8s: No valid primary TaskPlatform configured #3439

Closed: historus closed this issue 5 years ago

historus commented 5 years ago

Hi,

I have set up the SCDF according to the documentation here https://dataflow.spring.io/docs/installation/kubernetes/kubectl/. When starting the scdf pod I get the following error:



Spring Cloud Data Flow Server (v2.2.0.RELEASE)

2019-08-15 12:19:06.707 INFO 1 --- [ main] b.c.PropertySourceBootstrapConfiguration : Located property source: CompositePropertySource {name='composite-configmap', propertySources=[ConfigMapPropertySource {name='configmap.scdf-server.ns-team-cpx-pgc-stage-va6'}]}
2019-08-15 12:19:16.726 INFO 1 --- [ main] b.c.PropertySourceBootstrapConfiguration : Located property source: SecretsPropertySource {name='secrets.mysql.ns-team-cpx-pgc-stage-va6'}
2019-08-15 12:19:16.770 INFO 1 --- [ main] o.s.c.d.s.s.DataFlowServerApplication : The following profiles are active: kubernetes
2019-08-15 12:19:17.870 INFO 1 --- [ main] .s.d.r.c.RepositoryConfigurationDelegate : Multiple Spring Data modules found, entering strict repository configuration mode!
2019-08-15 12:19:17.873 INFO 1 --- [ main] .s.d.r.c.RepositoryConfigurationDelegate : Bootstrapping Spring Data repositories in DEFAULT mode.
2019-08-15 12:19:17.891 INFO 1 --- [ main] .s.d.r.c.RepositoryConfigurationDelegate : Finished Spring Data repository scanning in 10ms. Found 0 repository interfaces.
2019-08-15 12:19:17.919 INFO 1 --- [ main] .s.d.r.c.RepositoryConfigurationDelegate : Multiple Spring Data modules found, entering strict repository configuration mode!
2019-08-15 12:19:17.920 INFO 1 --- [ main] .s.d.r.c.RepositoryConfigurationDelegate : Bootstrapping Spring Data repositories in DEFAULT mode.
2019-08-15 12:19:18.067 INFO 1 --- [ main] .s.d.r.c.RepositoryConfigurationDelegate : Finished Spring Data repository scanning in 143ms. Found 1 repository interfaces.
2019-08-15 12:19:18.200 INFO 1 --- [ main] .s.d.r.c.RepositoryConfigurationDelegate : Multiple Spring Data modules found, entering strict repository configuration mode!
2019-08-15 12:19:18.201 INFO 1 --- [ main] .s.d.r.c.RepositoryConfigurationDelegate : Bootstrapping Spring Data repositories in DEFAULT mode.
2019-08-15 12:19:18.240 INFO 1 --- [ main] .s.d.r.c.RepositoryConfigurationDelegate : Finished Spring Data repository scanning in 39ms. Found 5 repository interfaces.
2019-08-15 12:19:18.531 INFO 1 --- [ main] o.s.cloud.context.scope.GenericScope : BeanFactory id=dfc710d8-2aef-355a-85ea-5f8eb6bd96e3
2019-08-15 12:19:19.028 INFO 1 --- [ main] o.s.b.w.embedded.tomcat.TomcatWebServer : Tomcat initialized with port(s): 80 (http)
2019-08-15 12:19:19.044 INFO 1 --- [ main] o.a.coyote.http11.Http11NioProtocol : Initializing ProtocolHandler ["http-nio-80"]
2019-08-15 12:19:19.054 INFO 1 --- [ main] o.apache.catalina.core.StandardService : Starting service [Tomcat]
2019-08-15 12:19:19.055 INFO 1 --- [ main] org.apache.catalina.core.StandardEngine : Starting Servlet engine: [Apache Tomcat/9.0.21]
2019-08-15 12:19:19.130 INFO 1 --- [ main] o.a.c.c.C.[Tomcat].[localhost].[/] : Initializing Spring embedded WebApplicationContext
2019-08-15 12:19:19.632 INFO 1 --- [ main] o.s.c.d.s.config.web.WebConfiguration : Start Embedded H2
2019-08-15 12:19:19.632 INFO 1 --- [ main] o.s.c.d.s.config.web.WebConfiguration : Starting H2 Server with URL: jdbc:h2:tcp://localhost:19092/mem:dataflow
2019-08-15 12:19:19.949 INFO 1 --- [ main] com.zaxxer.hikari.HikariDataSource : HikariPool-1 - Starting...
2019-08-15 12:19:19.969 INFO 1 --- [ main] com.zaxxer.hikari.HikariDataSource : HikariPool-1 - Start completed.
2019-08-15 12:19:19.999 INFO 1 --- [ main] o.f.c.internal.license.VersionPrinter : Flyway Community Edition 5.2.4 by Boxfuse
2019-08-15 12:19:20.009 INFO 1 --- [ main] o.f.c.internal.database.DatabaseFactory : Database: jdbc:h2:tcp://localhost:19092/mem:dataflow (H2 1.4)
2019-08-15 12:19:20.032 WARN 1 --- [ main] o.f.c.internal.database.base.Database : Flyway upgrade recommended: H2 1.4.199 is newer than this version of Flyway and support has not been tested.
2019-08-15 12:19:20.091 INFO 1 --- [ main] o.f.core.internal.command.DbValidate : Successfully validated 1 migration (execution time 00:00.028s)
2019-08-15 12:19:20.101 INFO 1 --- [ main] o.f.c.i.s.JdbcTableSchemaHistory : Creating Schema History table: "PUBLIC"."flyway_schema_history_dataflow"
2019-08-15 12:19:20.150 INFO 1 --- [ main] o.f.core.internal.command.DbMigrate : Current version of schema "PUBLIC": << Empty Schema >>
2019-08-15 12:19:20.152 INFO 1 --- [ main] o.f.core.internal.command.DbMigrate : Migrating schema "PUBLIC" to version 1 - INITIAL SETUP
2019-08-15 12:19:20.186 INFO 1 --- [ main] o.f.core.internal.command.DbMigrate : Successfully applied 1 migration to schema "PUBLIC" (execution time 00:00.086s)
2019-08-15 12:19:20.332 INFO 1 --- [ main] o.hibernate.jpa.internal.util.LogHelper : HHH000204: Processing PersistenceUnitInfo [ name: default ...]
2019-08-15 12:19:20.410 INFO 1 --- [ main] org.hibernate.Version : HHH000412: Hibernate Core {5.3.10.Final}
2019-08-15 12:19:20.413 INFO 1 --- [ main] org.hibernate.cfg.Environment : HHH000206: hibernate.properties not found
2019-08-15 12:19:20.620 INFO 1 --- [ main] o.hibernate.annotations.common.Version : HCANN000001: Hibernate Commons Annotations {5.0.4.Final}
2019-08-15 12:19:21.025 INFO 1 --- [ main] org.hibernate.dialect.Dialect : HHH000400: Using dialect: org.hibernate.dialect.H2Dialect
2019-08-15 12:19:21.846 INFO 1 --- [ main] j.LocalContainerEntityManagerFactoryBean : Initialized JPA EntityManagerFactory for persistence unit 'default'
2019-08-15 12:19:22.621 INFO 1 --- [ main] o.s.s.concurrent.ThreadPoolTaskExecutor : Initializing ExecutorService 'applicationTaskExecutor'
2019-08-15 12:19:22.712 WARN 1 --- [ main] aWebConfiguration$JpaWebMvcConfiguration : spring.jpa.open-in-view is enabled by default. Therefore, database queries may be performed during view rendering. Explicitly configure spring.jpa.open-in-view to disable this warning
2019-08-15 12:19:23.429 INFO 1 --- [ main] o.s.b.c.r.s.JobRepositoryFactoryBean : No database type set, using meta data indicating: H2
2019-08-15 12:19:23.480 INFO 1 --- [ main] o.s.c.d.s.b.SimpleJobServiceFactoryBean : No database type set, using meta data indicating: H2
2019-08-15 12:19:23.516 WARN 1 --- [ main] o.s.c.d.s.c.f.SchedulerConfiguration : TaskPlatform Kubernetes is selected as primary but has no TaskLaunchers configured
2019-08-15 12:19:23.517 WARN 1 --- [ main] ConfigServletWebServerApplicationContext : Exception encountered during context initialization - cancelling refresh attempt: org.springframework.beans.factory.BeanCreationException: Error creating bean with name 'schedulerService' defined in class path resource [org/springframework/cloud/dataflow/server/config/features/SchedulerConfiguration.class]: Bean instantiation via factory method failed; nested exception is org.springframework.beans.BeanInstantiationException: Failed to instantiate [org.springframework.cloud.dataflow.server.service.SchedulerService]: Factory method 'schedulerService' threw exception; nested exception is java.lang.IllegalStateException: No valid primary TaskPlatform configured
2019-08-15 12:19:23.519 INFO 1 --- [ main] o.s.s.concurrent.ThreadPoolTaskExecutor : Shutting down ExecutorService 'applicationTaskExecutor'
2019-08-15 12:19:23.521 INFO 1 --- [ main] j.LocalContainerEntityManagerFactoryBean : Closing JPA EntityManagerFactory for persistence unit 'default'
2019-08-15 12:19:23.523 INFO 1 --- [ main] com.zaxxer.hikari.HikariDataSource : HikariPool-1 - Shutdown initiated...
2019-08-15 12:19:23.533 INFO 1 --- [ main] com.zaxxer.hikari.HikariDataSource : HikariPool-1 - Shutdown completed.
2019-08-15 12:19:23.535 INFO 1 --- [ main] o.apache.catalina.core.StandardService : Stopping service [Tomcat]
2019-08-15 12:19:23.537 WARN 1 --- [ main] o.a.c.loader.WebappClassLoaderBase : The web application [ROOT] appears to have started a thread named [H2 TCP Server (tcp://172.16.65.196:19092)] but has failed to stop it. This is very likely to create a memory leak. Stack trace of thread:
java.net.PlainSocketImpl.socketAccept(Native Method)
java.net.AbstractPlainSocketImpl.accept(AbstractPlainSocketImpl.java:409)
java.net.ServerSocket.implAccept(ServerSocket.java:545)
java.net.ServerSocket.accept(ServerSocket.java:513)
org.h2.server.TcpServer.listen(TcpServer.java:248)
org.h2.tools.Server.run(Server.java:610)
java.lang.Thread.run(Thread.java:748)
2019-08-15 12:19:23.563 ERROR 1 --- [ main] o.s.boot.SpringApplication : Application run failed

org.springframework.beans.factory.BeanCreationException: Error creating bean with name 'schedulerService' defined in class path resource [org/springframework/cloud/dataflow/server/config/features/SchedulerConfiguration.class]: Bean instantiation via factory method failed; nested exception is org.springframework.beans.BeanInstantiationException: Failed to instantiate [org.springframework.cloud.dataflow.server.service.SchedulerService]: Factory method 'schedulerService' threw exception; nested exception is java.lang.IllegalStateException: No valid primary TaskPlatform configured
    at org.springframework.beans.factory.support.ConstructorResolver.instantiate(ConstructorResolver.java:627)
    at org.springframework.beans.factory.support.ConstructorResolver.instantiateUsingFactoryMethod(ConstructorResolver.java:607)
    at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.instantiateUsingFactoryMethod(AbstractAutowireCapableBeanFactory.java:1321)
    at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.createBeanInstance(AbstractAutowireCapableBeanFactory.java:1160)
    at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.doCreateBean(AbstractAutowireCapableBeanFactory.java:555)
    at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.createBean(AbstractAutowireCapableBeanFactory.java:515)
    at org.springframework.beans.factory.support.AbstractBeanFactory.lambda$doGetBean$0(AbstractBeanFactory.java:320)
    at org.springframework.beans.factory.support.DefaultSingletonBeanRegistry.getSingleton(DefaultSingletonBeanRegistry.java:222)
    at org.springframework.beans.factory.support.AbstractBeanFactory.doGetBean(AbstractBeanFactory.java:318)
    at org.springframework.beans.factory.support.AbstractBeanFactory.getBean(AbstractBeanFactory.java:199)
    at org.springframework.beans.factory.support.DefaultListableBeanFactory.preInstantiateSingletons(DefaultListableBeanFactory.java:845)
    at org.springframework.context.support.AbstractApplicationContext.finishBeanFactoryInitialization(AbstractApplicationContext.java:877)
    at org.springframework.context.support.AbstractApplicationContext.refresh(AbstractApplicationContext.java:549)
    at org.springframework.boot.web.servlet.context.ServletWebServerApplicationContext.refresh(ServletWebServerApplicationContext.java:140)
    at org.springframework.boot.SpringApplication.refresh(SpringApplication.java:742)
    at org.springframework.boot.SpringApplication.refreshContext(SpringApplication.java:389)
    at org.springframework.boot.SpringApplication.run(SpringApplication.java:311)
    at org.springframework.boot.SpringApplication.run(SpringApplication.java:1213)
    at org.springframework.boot.SpringApplication.run(SpringApplication.java:1202)
    at org.springframework.cloud.dataflow.server.single.DataFlowServerApplication.main(DataFlowServerApplication.java:50)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.springframework.boot.loader.MainMethodRunner.run(MainMethodRunner.java:47)
    at org.springframework.boot.loader.Launcher.launch(Launcher.java:86)
    at org.springframework.boot.loader.Launcher.launch(Launcher.java:50)
    at org.springframework.boot.loader.JarLauncher.main(JarLauncher.java:51)
Caused by: org.springframework.beans.BeanInstantiationException: Failed to instantiate [org.springframework.cloud.dataflow.server.service.SchedulerService]: Factory method 'schedulerService' threw exception; nested exception is java.lang.IllegalStateException: No valid primary TaskPlatform configured
    at org.springframework.beans.factory.support.SimpleInstantiationStrategy.instantiate(SimpleInstantiationStrategy.java:185)
    at org.springframework.beans.factory.support.ConstructorResolver.instantiate(ConstructorResolver.java:622)
    ... 27 common frames omitted
Caused by: java.lang.IllegalStateException: No valid primary TaskPlatform configured
    at org.springframework.cloud.dataflow.server.config.features.SchedulerConfiguration.primaryTaskPlatform(SchedulerConfiguration.java:110)
    at org.springframework.cloud.dataflow.server.config.features.SchedulerConfiguration.schedulerService(SchedulerConfiguration.java:77)
    at org.springframework.cloud.dataflow.server.config.features.SchedulerConfiguration$$EnhancerBySpringCGLIB$$41d0afed.CGLIB$schedulerService$0()
    at org.springframework.cloud.dataflow.server.config.features.SchedulerConfiguration$$EnhancerBySpringCGLIB$$41d0afed$$FastClassBySpringCGLIB$$1fce2cd1.invoke()
    at org.springframework.cglib.proxy.MethodProxy.invokeSuper(MethodProxy.java:244)
    at org.springframework.context.annotation.ConfigurationClassEnhancer$BeanMethodInterceptor.intercept(ConfigurationClassEnhancer.java:363)
    at org.springframework.cloud.dataflow.server.config.features.SchedulerConfiguration$$EnhancerBySpringCGLIB$$41d0afed.schedulerService()
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.springframework.beans.factory.support.SimpleInstantiationStrategy.instantiate(SimpleInstantiationStrategy.java:154)
    ... 28 common frames omitted

From this error I cannot tell what's actually wrong.

I have already increased the CPU and RAM granted to the SCDF pod considerably.

sabbyanandan commented 5 years ago

Hi, @historus. Sorry that you're having trouble with it - no fun. Did you customize anything in the deployment YAML or are you using the stock v2.2.0 tagged files from GitHub?

The reason I ask is that the error "No valid primary TaskPlatform configured" tells me that the default platform configuration (see: server-config.yaml#L30-L36) that we create for SCDF is not correctly bootstrapped, and thus SCDF is failing to start.

It'd help if you can share the kubectl describe po/<SCDF-POD> output here. Of course, please remove any sensitive credentials from it before sharing. Also, where are you running this? (minikube, gke, etc.)

historus commented 5 years ago

Hi,

thanks for taking the time. I have only modified things which shouldn't be relevant, e.g. removed the loadbalancer of the services and created ingressroute resources instead as the kubernetes setup we are using doesn't use LBs. It is running on a custom k8s setup on aws ec2 instances. Could it be the ConfigMap cannot be accessed and the result is this failure? How can we troubleshoot the error?

Thanks

historus commented 5 years ago
Name:           scdf-server-7849bd7658-z4wfb
Namespace:      ns-team-cpx-pgc-stage-va6
Priority:       0
Node:           ip-10-100-218-229.ec2.internal/10.100.218.229
Start Time:     Thu, 15 Aug 2019 17:10:16 +0200
Labels:         app=scdf-server
                pod-template-hash=7849bd7658
Annotations:    kubernetes.io/limit-ranger:
                  LimitRanger plugin set: cpu, memory request for init container init-mysql-wait; cpu, memory limit for init container init-mysql-wait
                kubernetes.io/psp: restricted
                seccomp.security.alpha.kubernetes.io/pod: docker/default
Status:         Running
IP:             172.16.65.98
Controlled By:  ReplicaSet/scdf-server-7849bd7658
Init Containers:
  init-mysql-wait:
    Container ID:  docker://cf485632886b710c7bd8aac3c5c33923262470827c680d73655fcf94d33de3f8
    Image:         busybox
    Image ID:      docker-pullable://busybox@sha256:9f1003c480699be56815db0f8146ad2e22efea85129b5b5983d0e0fb52d9ab70
    Port:          
    Host Port:     
    Command:
      sh
      -c
      until nc -w3 -z mysql 3306; do echo waiting for mysql; sleep 3; done;
    State:          Terminated
      Reason:       Completed
      Exit Code:    0
      Started:      Thu, 15 Aug 2019 17:10:18 +0200
      Finished:     Thu, 15 Aug 2019 17:10:18 +0200
    Ready:          True
    Restart Count:  0
    Limits:
      cpu:     1
      memory:  512Mi
    Requests:
      cpu:        100m
      memory:     256Mi
    Environment:  
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from scdf-sa-token-nbpdf (ro)
Containers:
  scdf-server:
    Container ID:   docker://875dc77ed0e40f07dbc7a6b6ec550e965b3c5b8f3f2a2d202c273d8c10923b07
    Image:          springcloud/spring-cloud-dataflow-server:2.2.0.RELEASE
    Image ID:       docker-pullable://springcloud/spring-cloud-dataflow-server@sha256:3e47945e8dd65098396901409f08fa8f7e39a24a74ed0799c094f72a6871db25
    Port:           80/TCP
    Host Port:      0/TCP
    State:          Running
      Started:      Thu, 15 Aug 2019 17:10:18 +0200
    Ready:          True
    Restart Count:  0
    Limits:
      cpu:     32
      memory:  256Gi
    Requests:
      cpu:     4
      memory:  16344Mi
    Environment:
      KUBERNETES_NAMESPACE:                              ns-team-cpx-pgc-stage-va6 (v1:metadata.namespace)
      SERVER_PORT:                                       80
      SPRING_CLOUD_CONFIG_ENABLED:                       false
      SPRING_CLOUD_DATAFLOW_FEATURES_ANALYTICS_ENABLED:  true
      SPRING_CLOUD_DATAFLOW_FEATURES_SCHEDULES_ENABLED:  true
      SPRING_CLOUD_KUBERNETES_SECRETS_ENABLE_API:        true
      SPRING_CLOUD_KUBERNETES_SECRETS_NAME:              mysql
      SPRING_CLOUD_KUBERNETES_CONFIG_NAME:               scdf-server
      SPRING_CLOUD_DATAFLOW_SERVER_URI:                  http://${SCDF_SERVER_SERVICE_HOST}:${SCDF_SERVER_SERVICE_PORT}
      SPRING_CLOUD_SKIPPER_CLIENT_SERVER_URI:            https://skipper.domain.net/api
      SPRING_APPLICATION_JSON:                           { "maven": { "local-repository": null, "remote-repositories": { "repo1": { "url": "https://repo.spring.io/libs-snapshot"} } } }
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from scdf-sa-token-nbpdf (ro)
Conditions:
  Type              Status
  Initialized       True 
  Ready             True 
  ContainersReady   True 
  PodScheduled      True 
Volumes:
  scdf-sa-token-nbpdf:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  scdf-sa-token-nbpdf
    Optional:    false
QoS Class:       Burstable
Node-Selectors:  
Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type    Reason     Age   From                                     Message
  ----    ------     ----  ----                                     -------
  Normal  Scheduled  12m   default-scheduler                        Successfully assigned ns-team-cpx-pgc-stage-va6/scdf-server-7849bd7658-z4wfb to ip-10-100-218-229.ec2.internal
  Normal  Pulling    12m   kubelet, ip-10-100-218-229.ec2.internal  Pulling image "busybox"
  Normal  Pulled     12m   kubelet, ip-10-100-218-229.ec2.internal  Successfully pulled image "busybox"
  Normal  Created    12m   kubelet, ip-10-100-218-229.ec2.internal  Created container init-mysql-wait
  Normal  Started    12m   kubelet, ip-10-100-218-229.ec2.internal  Started container init-mysql-wait
  Normal  Pulling    12m   kubelet, ip-10-100-218-229.ec2.internal  Pulling image "springcloud/spring-cloud-dataflow-server:2.2.0.RELEASE"
  Normal  Pulled     12m   kubelet, ip-10-100-218-229.ec2.internal  Successfully pulled image "springcloud/spring-cloud-dataflow-server:2.2.0.RELEASE"
  Normal  Created    12m   kubelet, ip-10-100-218-229.ec2.internal  Created container scdf-server
  Normal  Started    12m   kubelet, ip-10-100-218-229.ec2.internal  Started container scdf-server
            
sabbyanandan commented 5 years ago

Thanks. Based on the events towards the end, it appears the container is now successfully started, though. Are you not seeing the errors or the restarts due to the liveness/readiness reporting it as non-healthy?

chrisjs commented 5 years ago

It is running on a custom k8s setup on aws ec2 instances. Could it be the ConfigMap cannot be accessed and the result is this failure? How can we troubleshoot the error?

src/kubernetes/server/server-deployment.yaml has a var: SPRING_CLOUD_KUBERNETES_CONFIG_NAME which references the expected config map. so with the defaults, verify the config map is present:

kubectl get cm scdf-server -o yaml

there should be a block such as:

          task:
            platform:
              kubernetes:
                accounts:
                  default:
                    limits:
                      memory: 1024Mi

does that exist and under the correct name? also verify the formatting is correct

kubectl get cm will list the configmaps you have added as reference
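
The check described above can also be scripted. The following is a minimal sketch, assuming the ConfigMap is exported with `kubectl get cm scdf-server -o json` and that the settings live under the `application.yaml` data key, as in the stock file. The function names are hypothetical, and the indentation walker is deliberately naive: it only understands simple `key:` lines nested by spaces and is not a general YAML parser.

```python
import json

# Key path that the task platform block is expected to sit under.
REQUIRED_PATH = (
    "spring", "cloud", "dataflow", "task", "platform", "kubernetes", "accounts",
)


def yaml_has_key_path(yaml_text, path):
    """Return True if block-style YAML text contains the nested key path.

    Only 'key:' lines nested by space indentation are understood.
    """
    stack = []  # (indent, key) pairs for the current chain of parent keys
    for raw in yaml_text.splitlines():
        line = raw.rstrip()
        stripped = line.lstrip(" ")
        if not stripped or stripped.startswith("#") or ":" not in stripped:
            continue
        indent = len(line) - len(stripped)
        key = stripped.split(":", 1)[0].strip()
        # Drop keys that are not ancestors of the current line.
        while stack and stack[-1][0] >= indent:
            stack.pop()
        stack.append((indent, key))
        if [k for _, k in stack][: len(path)] == list(path):
            return True
    return False


def configmap_defines_task_platform(configmap_json, data_key="application.yaml"):
    """Check the JSON printed by `kubectl get cm scdf-server -o json`."""
    data = json.loads(configmap_json).get("data") or {}
    return yaml_has_key_path(data.get(data_key, ""), REQUIRED_PATH)
```

A result of False would point at a missing, misnamed, or mis-indented block. A result of True only says the ConfigMap content looks right; it does not show that the server was able to read it.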

historus commented 5 years ago

Hi, sorry for the delay.

apiVersion: v1
data:
  application.yaml: |-
    spring:
      cloud:
        dataflow:
          task:
            platform:
              kubernetes:
                accounts:
                  default:
                    limits:
                      memory: 8192Mi
      datasource:
        url: jdbc:mysql://${MYSQL_SERVICE_HOST}:${MYSQL_SERVICE_PORT}/mysql
        username: root
        password: ${mysql-root-password}
        driverClassName: org.mariadb.jdbc.Driver
        testOnBorrow: true
        validationQuery: "SELECT 1"
kind: ConfigMap
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"v1","data":{"application.yaml":"spring:\n  cloud:\n    dataflow:\n      task:\n        platform:\n          kubernetes:\n            accounts:\n              default:\n                limits:\n                  memory: 8192Mi\n  datasource:\n    url: jdbc:mysql://${MYSQL_SERVICE_HOST}:${MYSQL_SERVICE_PORT}/mysql\n    username: root\n    password: ${mysql-root-password}\n    driverClassName: org.mariadb.jdbc.Driver\n    testOnBorrow: true\n    validationQuery: \"SELECT 1\""},"kind":"ConfigMap","metadata":{"annotations":{},"labels":{"app":"scdf-server"},"name":"scdf-server","namespace":"ns-team-cpx-pgc-stage-va6"}}
  creationTimestamp: "2019-08-15T15:08:32Z"
  labels:
    app: scdf-server
  name: scdf-server
  namespace: ns-team-cpx-pgc-stage-va6
  resourceVersion: "226056115"
  selfLink: /api/v1/namespaces/ns-team-cpx-pgc-stage-va6/configmaps/scdf-server
  uid: 8ba04093-bf6e-11e9-a319-0ea5aa27fe38

is what I get. From my perspective this looks good.

@sabbyanandan What do you mean exactly? From the logs I can see that Tomcat was stopped because of the initialisation error.

2019-08-15 12:19:23.535 INFO 1 --- [ main] o.apache.catalina.core.StandardService : Stopping service [Tomcat]
2019-08-15 12:19:23.537 WARN 1 --- [ main] o.a.c.loader.WebappClassLoaderBase : The web application [ROOT] appears to have started a thread named [H2 TCP Server (tcp://172.16.65.196:19092)] but has failed to stop it. This is very likely to create a memory leak. Stack trace of thread:
java.net.PlainSocketImpl.socketAccept(Native Method)
java.net.AbstractPlainSocketImpl.accept(AbstractPlainSocketImpl.java:409)
java.net.ServerSocket.implAccept(ServerSocket.java:545)
java.net.ServerSocket.accept(ServerSocket.java:513)
org.h2.server.TcpServer.listen(TcpServer.java:248)
org.h2.tools.Server.run(Server.java:610)
java.lang.Thread.run(Thread.java:748)
2019-08-15 12:19:23.563 ERROR 1 --- [ main] o.s.boot.SpringApplication : Application run failed

Also there is no service running on port 80 on the container.

chrisjs commented 5 years ago

using your cm sample above, i was able to start the server as expected. in your logs above i also see:

2019-08-15 12:19:06.707 INFO 1 --- [ main] b.c.PropertySourceBootstrapConfiguration : Located property source: CompositePropertySource {name='composite-configmap', propertySources=[ConfigMapPropertySource {name='configmap.scdf-server.ns-team-cpx-pgc-stage-va6'}]}

which would tell me it's at least finding the cm.

historus commented 5 years ago

The diff (including removal of monitoring part according to the doc):

diff --git a/local/docker-compose.yml b/local/docker-compose.yml
new file mode 100644
index 00000000..7c189e6e
--- /dev/null
diff --git a/src/kubernetes/server/server-config.yaml b/src/kubernetes/server/server-config.yaml
index d9beb7e3..f35eb4f4 100644
--- a/src/kubernetes/server/server-config.yaml
+++ b/src/kubernetes/server/server-config.yaml
@@ -9,24 +9,6 @@ data:
     spring:
       cloud:
         dataflow:
-          applicationProperties:
-            stream:
-              management:
-                metrics:
-                  export:
-                    prometheus:
-                      enabled: true
-                endpoints:
-                  web:
-                    exposure:
-                      include: 'prometheus,info,health'
-              spring:
-                cloud:
-                  streamapp:
-                    security:
-                      enabled: false
-          grafana-info:
-            url: 'https://grafana:3000'
           task:
             platform:
               kubernetes:
diff --git a/src/kubernetes/server/server-deployment.yaml b/src/kubernetes/server/server-deployment.yaml
index 00002d46..80c03167 100644
--- a/src/kubernetes/server/server-deployment.yaml
+++ b/src/kubernetes/server/server-deployment.yaml
@@ -16,17 +16,17 @@ spec:
     spec:
       containers:
       - name: scdf-server
-        image: springcloud/spring-cloud-dataflow-server:2.3.0.BUILD-SNAPSHOT
+        image: springcloud/spring-cloud-dataflow-server:2.2.0.RELEASE
         imagePullPolicy: Always
         ports:
         - containerPort: 80
         resources:
           limits:
-            cpu: 1.0
-            memory: 2048Mi
+            cpu: 32.0
+            memory: 262144Mi
           requests:
-            cpu: 0.5
-            memory: 1024Mi
+            cpu: 4.0
+            memory: 16344Mi
         env:
         - name: KUBERNETES_NAMESPACE
           valueFrom:
@@ -50,7 +50,7 @@ spec:
           value: 'http://${SCDF_SERVER_SERVICE_HOST}:${SCDF_SERVER_SERVICE_PORT}'
           # Provide the Skipper service location
         - name: SPRING_CLOUD_SKIPPER_CLIENT_SERVER_URI
-          value: 'http://${SKIPPER_SERVICE_HOST}:${SKIPPER_SERVICE_PORT}/api'
+          value: 'https://skipper.domain.net/api'
           # Add Maven repo for metadata artifact resolution for all stream apps
         - name: SPRING_APPLICATION_JSON
           value: "{ \"maven\": { \"local-repository\": null, \"remote-repositories\": { \"repo1\": { \"url\": \"https://repo.spring.io/libs-snapshot\"} } } }"
diff --git a/src/kubernetes/server/server-ingress.yaml b/src/kubernetes/server/server-ingress.yaml
new file mode 100644
index 00000000..3dc55132
--- /dev/null
+++ b/src/kubernetes/server/server-ingress.yaml
@@ -0,0 +1,20 @@
+apiVersion: contour.heptio.com/v1beta1
+kind: IngressRoute
+metadata:
+  name: scdf-server
+  namespace: ns-team-cpx-pgc-stage-va6
+  annotations:
+    kubernetes.io/ingress.class: "contour-public"
+  labels:
+    app: skipper
+spec:
+  virtualhost:
+    fqdn: scdf.domain.net
+    tls:
+      secretName: heptio-contour/cluster-ssl-public
+  routes:
+  - match: /
+    services:
+    - name: scdf-server
+      port: 80
+      permitInsecure: true
diff --git a/src/kubernetes/server/server-svc.yaml b/src/kubernetes/server/server-svc.yaml
index 5ea223ba..324ac227 100644
--- a/src/kubernetes/server/server-svc.yaml
+++ b/src/kubernetes/server/server-svc.yaml
@@ -5,8 +5,6 @@ metadata:
   labels:
     app: scdf-server
 spec:
-  # If you are running k8s on a local dev box or using minikube, you can use type NodePort instead
-  type: LoadBalancer
   ports:
     - port: 80
       name: scdf-server
diff --git a/src/kubernetes/skipper/skipper-ingress.yaml b/src/kubernetes/skipper/skipper-ingress.yaml
new file mode 100644
index 00000000..60d1d4ac
--- /dev/null
+++ b/src/kubernetes/skipper/skipper-ingress.yaml
@@ -0,0 +1,20 @@
+apiVersion: contour.heptio.com/v1beta1
+kind: IngressRoute
+metadata:
+  name: skipper
+  namespace: ns-team-cpx-pgc-stage-va6
+  annotations:
+    kubernetes.io/ingress.class: "contour-public"
+  labels:
+    app: skipper
+spec:
+  virtualhost:
+    fqdn: skipper.domain.net
+    tls:
+      secretName: heptio-contour/cluster-ssl-public
+  routes:
+  - match: /
+    services:
+    - name: skipper
+      port: 80
+      permitInsecure: true
diff --git a/src/kubernetes/skipper/skipper-svc.yaml b/src/kubernetes/skipper/skipper-svc.yaml
index 18720023..18287fe0 100644
--- a/src/kubernetes/skipper/skipper-svc.yaml
+++ b/src/kubernetes/skipper/skipper-svc.yaml
@@ -5,8 +5,6 @@ metadata:
   labels:
     app: skipper
 spec:
-  # If you are running k8s on a local dev box, using minikube, or Kubernetes on docker desktop you can use type NodePort instead
-  type: LoadBalancer
   ports:
   - port: 80
     targetPort: 7577

Unfortunately I cannot apply the template as-is because of

Error from server (Forbidden): error when creating "src/kubernetes/server/server-svc.yaml": services "scdf-server" is forbidden: exceeded quota: rq-team-cpx-pgc-stage-va6, requested: services.loadbalancers=1,services.nodeports=1, used: services.loadbalancers=0,services.nodeports=0, limited: services.loadbalancers=0,services.nodeports=0
chrisjs commented 5 years ago

thanks - i will give your patch a try

in regards to server-svc.yaml - that would be fine to leave out for testing purposes for now, the main thing is getting the server itself to come up without an exception

chrisjs commented 5 years ago

i've applied your patch and am still able to get the server to load as expected:

2019-08-20 13:49:39.233  INFO 1 --- [           main] o.s.c.d.s.s.DataFlowServerApplication    : Started DataFlowServerApplication in 38.486 seconds (JVM running for 41.216)
2019-08-20 13:49:39.313  INFO 1 --- [           main] .s.c.d.s.s.LauncherInitializationService : Added 'Kubernetes' platform account 'default' into Task Launcher repository.

only thing i changed was reverting the CPU and memory as I don't have that many resources on my laptop. i also deployed to minikube.

historus commented 5 years ago

Same for me. I tried docker for mac with built-in K8s today and it is starting the server. How can I make sure the ConfigMap can be read properly? I can access the ConfigMap through kubectl, but it seems the SchedulerConfiguration doesn't get the taskPlatform set through the ConfigMap properly.

chrisjs commented 5 years ago

so when applying the patch in https://github.com/spring-cloud/spring-cloud-dataflow/issues/3439#issuecomment-522667979 and making no other changes, deploying to docker for mac k8s, the server starts fine but with those same exact changes, deploying to your custom k8s cluster on AWS fails?

i have not been able to successfully reproduce the failure yet.

in regards to seeing if the configmap can be read properly or not, i did some digging on that and found that while a message like this:

2019-08-15 12:19:06.707 INFO 1 --- [ main] b.c.PropertySourceBootstrapConfiguration : Located property source: CompositePropertySource {name='composite-configmap', propertySources=[ConfigMapPropertySource {name='configmap.scdf-server.ns-team-cpx-pgc-stage-va6'}]}

may appear in the logs, that doesn't actually mean the config map itself was retrieved. as a test i changed the value of SPRING_CLOUD_KUBERNETES_CONFIG_NAME to scdf-server-invalid and got the same log message (but with my value), with no errors about it failing to load.

looking further, i debugged into:

https://github.com/spring-cloud/spring-cloud-kubernetes/blob/v1.0.2.RELEASE/spring-cloud-kubernetes-config/src/main/java/org/springframework/cloud/kubernetes/config/ConfigMapPropertySource.java#L93

and as you can see the code attempts to look up a CM in two places and add it to a results map, but trying to fetch a CM that doesn't exist does not result in an exception, just a null check, and nothing is logged to indicate that the CM was not found. spring-cloud-kubernetes loads the CM values into boot properties, so to answer your question about how you can make sure it's read properly: debugging the above was the only way i could see. inspecting the KubernetesClient object while debugging may yield more info as to what it can see, etc.
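Since the "Located property source" message is printed whether or not the CM was found, one way to cross-check is to take the name and namespace out of that log line and ask the cluster directly. A minimal sketch follows; it assumes the source name has the form configmap.<name>.<namespace>, which is inferred from the log line above and not confirmed against the spring-cloud-kubernetes code. The function names are made up for illustration.

```python
import re
from typing import List, Tuple

# Matches the source name as it appears in the startup log, e.g.
# ConfigMapPropertySource {name='configmap.scdf-server.my-namespace'}
# (format assumed from the log line quoted in this thread).
_SOURCE_NAME = re.compile(r"ConfigMapPropertySource \{name='configmap\.([^']+)'\}")


def configmaps_in_log_line(line: str) -> List[Tuple[str, str]]:
    """Return (configmap_name, namespace) pairs mentioned in a log line."""
    pairs = []
    for qualified in _SOURCE_NAME.findall(line):
        # Namespaces are DNS labels and cannot contain dots, while ConfigMap
        # names can, so split on the last dot only.
        name, _, namespace = qualified.rpartition(".")
        if name and namespace:
            pairs.append((name, namespace))
    return pairs


def kubectl_check_command(name: str, namespace: str) -> str:
    """Build a kubectl command that fails if the ConfigMap does not exist."""
    return f"kubectl get configmap {name} --namespace {namespace} -o yaml"
```

Running the resulting command by hand only shows that the CM exists for your own user. It does not prove that the server's service account is allowed to read it, so a debugger session may still be needed.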

so.. in an effort to reproduce this, i have, against minikube:

is there anything else i may need to do to reproduce an environment where you are seeing the failure? is there anything special about the way it's set up, etc?

chrisjs commented 5 years ago

closing for now as i cannot reproduce. if there is still an issue, please re-open with steps to reproduce.

chrisjs commented 5 years ago

@historus i think i may have run into a similar situation where i can finally reproduce this and have a possible solution (worked for me at least). can you add the following to your scdf and skipper deployment env blocks:

        - name: KUBERNETES_TRUST_CERTIFICATES
          value: 'true'

please let me know the results

historus commented 5 years ago

Unfortunately this doesn't help. I have seen the same failure on a local installation because there was a whitespace character after the environmentVariables key. After removing it the server started successfully. But that isn't the case on the k8s cluster, where I am still facing this issue. I was wondering whether the YAML parser is very strict.
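Stray whitespace like this is hard to spot by eye, so a quick scan of the ConfigMap file can help rule it out. The following is a rough sketch under assumptions: it only flags trailing whitespace, whitespace between a key and its colon, and tab characters. It is not a YAML parser, and the function name is hypothetical.

```python
import re
from typing import List, Tuple

# A mapping key, optionally indented or prefixed with "- ", followed by
# whitespace before the colon (e.g. "environmentVariables :").
_SPACE_BEFORE_COLON = re.compile(r"^\s*(?:-\s+)?[\w.\-]+[ \t]+:")


def suspicious_whitespace(yaml_text: str) -> List[Tuple[int, str]]:
    """Return (line_number, reason) for lines with easy-to-miss whitespace."""
    findings = []
    for number, line in enumerate(yaml_text.splitlines(), start=1):
        if not line.strip():
            continue  # blank lines are harmless
        if line != line.rstrip():
            findings.append((number, "trailing whitespace"))
        if _SPACE_BEFORE_COLON.match(line):
            findings.append((number, "whitespace between key and colon"))
        if "\t" in line:
            findings.append((number, "tab character"))
    return findings
```

Passing the contents of the ConfigMap YAML to suspicious_whitespace returns an empty list when none of these patterns occur. A non-empty result is only a hint about where to look, not proof of a parsing problem.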

chrisjs commented 5 years ago

doh.. that was the first time i was able to reproduce it, so i was hoping we had a somewhat similar env with that requirement.

i suspect the next place to start debugging would be, for example:

see if the CM is getting loaded:

https://github.com/spring-cloud/spring-cloud-kubernetes/blob/v1.0.2.RELEASE/spring-cloud-kubernetes-config/src/main/java/org/springframework/cloud/kubernetes/config/ConfigMapPropertySource.java#L93

https://github.com/spring-cloud/spring-cloud-kubernetes/blob/v1.0.2.RELEASE/spring-cloud-kubernetes-config/src/main/java/org/springframework/cloud/kubernetes/config/ConfigMapPropertySource.java#L106

if it does get loaded, see what's going on with the processing:

https://github.com/spring-cloud/spring-cloud-kubernetes/blob/v1.0.2.RELEASE/spring-cloud-kubernetes-config/src/main/java/org/springframework/cloud/kubernetes/config/ConfigMapPropertySource.java#L133

historus commented 4 years ago

One thing I just realized is that it does not use the configured (and deployed) MySQL but starts H2 instead. Something seems to be broken with applying or using the configuration. Everything is deployed in a custom namespace, not the default one. Could that be a reason?

chrisjs commented 4 years ago

if you are deploying things across namespaces, you would have to change the configs to reflect that. the stock configs assume everything is in the same namespace. https://kubernetes.io/docs/concepts/services-networking/dns-pod-service/ is a good reference to start with to tailor to your specific env
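As an illustration of what the DNS reference above describes, a service in another namespace is reached through its namespace-qualified name rather than the bare service name. A small sketch, assuming the default cluster domain cluster.local (clusters can be configured with a different one); the helper name is made up.

```python
def service_dns_name(service: str, namespace: str,
                     cluster_domain: str = "cluster.local") -> str:
    """Build the fully qualified DNS name of a Kubernetes service.

    The cluster domain defaults to "cluster.local", which is an assumption;
    check your cluster's configuration if lookups fail.
    """
    return f"{service}.{namespace}.svc.{cluster_domain}"
```

Within a single namespace the bare service name resolves on its own, so the qualified form only matters when the server and the database live in different namespaces.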

historus commented 4 years ago

It is all deployed in the same namespace, but it is not the default one.