lightbend / cloudflow

Cloudflow enables users to quickly develop, orchestrate, and operate distributed streaming applications on Kubernetes.
https://cloudflow.io
Apache License 2.0

Streamlet configuration is wiped out periodically #1045

Open vkorenev opened 3 years ago

vkorenev commented 3 years ago

Describe the bug

Sometimes a streamlet that was working fine fails after its pod restarts. Further investigation showed that all the configuration properties had disappeared from the secret that contains the config.

Example: A streamlet was working fine for some time. Then, after a pod was restarted by Kubernetes, it failed to start with the following exception:

Exception in thread "main" com.typesafe.config.ConfigException$Missing: /etc/cloudflow-runner-secret/secret.conf: 1: No configuration setting found for key 'store-info-config-path'
    at com.typesafe.config.impl.SimpleConfig.findKeyOrNull(SimpleConfig.java:156)
    at com.typesafe.config.impl.SimpleConfig.findOrNull(SimpleConfig.java:174)
    at com.typesafe.config.impl.SimpleConfig.find(SimpleConfig.java:188)
    at com.typesafe.config.impl.SimpleConfig.find(SimpleConfig.java:193)
    at com.typesafe.config.impl.SimpleConfig.getString(SimpleConfig.java:250)
    at cloudflow.streamlets.StringConfigParameter.value(ConfigParameters.scala:156)
    at com.hbc.streams.stores.StoresStreamlet$$anon$1.runnableGraph(StoresStreamlet.scala:45)
    at cloudflow.akkastream.scaladsl.RunnableGraphStreamletLogic.run(RunnableGraphStreamletLogic.scala:34)
    at cloudflow.akkastream.AkkaStreamlet.run(AkkaStreamlet.scala:96)
    at cloudflow.akkastream.AkkaStreamlet.run(AkkaStreamlet.scala:37)
    at cloudflow.streamlets.Streamlet.run(Streamlet.scala:106)
    at cloudflow.runner.Runner$.run(Runner.scala:68)
    at cloudflow.runner.Runner$.main(Runner.scala:46)
    at cloudflow.runner.Runner.main(Runner.scala)

Listing the configuration for this application shows that this property is present:

❯ kubectl cloudflow configuration mw-bay-store-info-app-qa
+-----------------------------------------------------------------------------------------------------------------+------------------------------------------------------+
| KEY                                                                                                             | VALUE                                                |
+-----------------------------------------------------------------------------------------------------------------+------------------------------------------------------+
| cloudflow.runtimes.akka.config.akka.loglevel                                                                    | DEBUG                                                |
| cloudflow.runtimes.akka.kubernetes.pods.pod.containers.container.env                                            |                                                      |
| cloudflow.runtimes.akka.kubernetes.pods.pod.containers.container.ports                                          |                                                      |
| cloudflow.runtimes.akka.kubernetes.pods.pod.containers.container.volume-mounts.logging--962335799.mount-path    | /opt/logging                                         |
| cloudflow.runtimes.akka.kubernetes.pods.pod.containers.container.volume-mounts.logging--962335799.read-only     | true                                                 |
| cloudflow.runtimes.akka.kubernetes.pods.pod.containers.container.volume-mounts.logging--962335799.subPath       |                                                      |
| cloudflow.runtimes.akka.kubernetes.pods.pod.volumes.logging--962335799.secret.name                              | logging                                              |
| cloudflow.streamlets.stores.config-parameters.kafka-bootstrap-servers                                           | kafka-cluster-kafka-bootstrap.kafka:9092             |
| cloudflow.streamlets.stores.config-parameters.kafka-schema-registry-url                                         | http://schema-registry-cp-schema-registry.kafka:8081 |
| cloudflow.streamlets.stores.config-parameters.kafka-topic                                                       | bay.stores.v1                                        |
| cloudflow.streamlets.stores.config-parameters.poll-interval                                                     | 1h                                                   |
| cloudflow.streamlets.stores.config-parameters.store-info-config-path                                            | /mnt/store-info-config                               |
| cloudflow.streamlets.stores.kubernetes.pods.pod.containers.container.resources.requests.cpu                     | 100m                                                 |
| cloudflow.streamlets.stores.kubernetes.pods.pod.containers.container.resources.requests.memory                  | 128Mi                                                |
| cloudflow.streamlets.stores.kubernetes.pods.pod.containers.container.volume-mounts.store-info-config.mount-path | /mnt/store-info-config                               |
| cloudflow.streamlets.stores.kubernetes.pods.pod.containers.container.volume-mounts.store-info-config.read-only  | true                                                 |
| cloudflow.streamlets.stores.kubernetes.pods.pod.volumes.store-info-config.secret.name                           | store-info-config                                    |
+-----------------------------------------------------------------------------------------------------------------+------------------------------------------------------+

However, all the configuration properties have disappeared from the secret that is mounted at /etc/cloudflow-runner-secret:

❯ kubectl -n mw-bay-store-info-app-qa get secret stores -o yaml
apiVersion: v1
data:
  pods-config.conf: e30=
  runtime-config.conf: e30=
  secret.conf: eyJjbG91ZGZsb3ciOnsia2Fma2EiOnsiYm9vdHN0cmFwLXNlcnZlcnMiOiJrYWZrYS1jbHVzdGVyLWthZmthLWJvb3RzdHJhcC5rYWZrYS5zdmMuY2x1c3Rlci5sb2NhbDo5MDkyIn0sInN0cmVhbWxldHMiOnsic3RvcmVzIjp7fX19fQ==
kind: Secret
metadata:
  creationTimestamp: "2021-05-04T00:36:34Z"
  labels:
    app.kubernetes.io/managed-by: cloudflow
    app.kubernetes.io/part-of: mw-bay-store-info-app-qa
    app.kubernetes.io/version: qa-0.1.1-6-a7ce2b66
    com.lightbend.cloudflow/app-id: mw-bay-store-info-app-qa
    com.lightbend.cloudflow/config-format: config
    com.lightbend.cloudflow/streamlet-name: stores
  name: stores
  namespace: mw-bay-store-info-app-qa
  ownerReferences:
  - apiVersion: cloudflow.lightbend.com/v1alpha1
    blockOwnerDeletion: true
    controller: true
    kind: CloudflowApplication
    name: mw-bay-store-info-app-qa
    uid: 633f9eb7-e884-4268-878b-edf3ca87f264
  resourceVersion: "97546028"
  selfLink: /api/v1/namespaces/mw-bay-store-info-app-qa/secrets/stores
  uid: 0515f6a8-cef6-4825-86dc-323ca1d173ec
type: Opaque

When decoded, the properties are these:

pods-config.conf: {}
runtime-config.conf: {}
secret.conf: {"cloudflow":{"kafka":{"bootstrap-servers":"kafka-cluster-kafka-bootstrap.kafka.svc.cluster.local:9092"},"streamlets":{"stores":{}}}}
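
The decoding step above can be reproduced with a short script. This is a minimal sketch, assuming the secret's `data` map has already been extracted from the `kubectl get secret ... -o yaml` output and that each value is plain JSON, as it is in this report (general HOCON would need a HOCON parser). The function name `decode_secret_data` is hypothetical and not part of Cloudflow.

```python
import base64
import json


def decode_secret_data(data):
    """Decode each base64 value of a Kubernetes Secret `data` map and parse it as JSON."""
    decoded = {}
    for name, value in data.items():
        text = base64.b64decode(value).decode("utf-8")
        decoded[name] = json.loads(text)
    return decoded


# The two empty entries from the wiped secret shown above.
wiped = decode_secret_data({
    "pods-config.conf": "e30=",
    "runtime-config.conf": "e30=",
})
for name, config in wiped.items():
    print(f"{name}: {json.dumps(config)}")
```

The same function can be applied to the `secret.conf` entry to inspect which streamlet parameters are still present.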

When the application is freshly deployed, however, the values are present:

pods-config.conf: {"kubernetes":{"pods":{"pod":{"annotations":{},"containers":{"container":{"env":[],"ports":[],"resources":{"limits":{},"requests":{"cpu":"100m","memory":"128Mi"}},"volume-mounts":{"logging--962335799":{"mount-path":"/opt/logging","read-only":true,"subPath":""},"store-info-config":{"mount-path":"/mnt/store-info-config","read-only":true}}}},"labels":{},"volumes":{"logging--962335799":{"secret":{"name":"logging"}},"store-info-config":{"secret":{"name":"store-info-config"}}}}}}}
runtime-config.conf: {"akka":{"loglevel":"DEBUG"}}
secret.conf: {"akka":{"loglevel":"DEBUG"},"cloudflow":{"kafka":{"bootstrap-servers":"kafka-cluster-kafka-bootstrap.kafka.svc.cluster.local:9092"},"streamlets":{"stores":{"kafka-bootstrap-servers":"kafka-cluster-kafka-bootstrap.kafka:9092","kafka-schema-registry-url":"http://schema-registry-cp-schema-registry.kafka:8081","kafka-topic":"bay.stores.v1","poll-interval":"1h","store-info-config-path":"/mnt/store-info-config"}}},"kubernetes":{"pods":{"pod":{"annotations":{},"containers":{"container":{"env":[],"ports":[],"resources":{"limits":{},"requests":{"cpu":"100m","memory":"128Mi"}},"volume-mounts":{"logging--962335799":{"mount-path":"/opt/logging","read-only":true,"subPath":""},"store-info-config":{"mount-path":"/mnt/store-info-config","read-only":true}}}},"labels":{},"volumes":{"logging--962335799":{"secret":{"name":"logging"}},"store-info-config":{"secret":{"name":"store-info-config"}}}}}}}

So at some point all the config values except cloudflow.kafka.bootstrap-servers are wiped out.
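
To see exactly which settings were lost, the decoded fresh and wiped configurations can be compared key by key. The following is a rough sketch under stated assumptions: `flatten` and `missing_keys` are hypothetical helpers, and the `fresh` sample is abbreviated from the `secret.conf` content quoted in this report rather than the full configuration.

```python
def flatten(config, prefix=""):
    """Flatten a nested dict into dotted key paths mapped to leaf values.

    An empty dict is treated as a leaf, so an emptied section still shows up.
    """
    flat = {}
    for key, value in config.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict) and value:
            flat.update(flatten(value, path))
        else:
            flat[path] = value
    return flat


def missing_keys(expected, actual):
    """Return the sorted key paths present in `expected` but absent from `actual`."""
    return sorted(set(flatten(expected)) - set(flatten(actual)))


bootstrap = "kafka-cluster-kafka-bootstrap.kafka.svc.cluster.local:9092"

# Abbreviated from the freshly deployed secret.conf in this report.
fresh = {
    "akka": {"loglevel": "DEBUG"},
    "cloudflow": {
        "kafka": {"bootstrap-servers": bootstrap},
        "streamlets": {"stores": {
            "kafka-topic": "bay.stores.v1",
            "poll-interval": "1h",
            "store-info-config-path": "/mnt/store-info-config",
        }},
    },
}

# The wiped secret.conf as decoded in this report.
wiped = {
    "cloudflow": {
        "kafka": {"bootstrap-servers": bootstrap},
        "streamlets": {"stores": {}},
    },
}

for key in missing_keys(fresh, wiped):
    print(key)
```

Only `cloudflow.kafka.bootstrap-servers` is common to both samples, which matches the observation above.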

To Reproduce

I have not found exact steps to reproduce this yet, but it happens roughly every 1-2 weeks.

Expected behavior

Streamlets should continue to run indefinitely with the last applied configuration.

Additional context

The deployed Cloudflow version is 2.0.21.

andreaTP commented 3 years ago

Hi @vkorenev, reading your report puzzles me a little. Can you please check and confirm the Cloudflow versions of the components you are using:

vkorenev commented 3 years ago

Hi @andreaTP, Cloudflow operator and sbt plugin are 2.0.21. However, the CLI version is 2.0.23. Should I downgrade the CLI?

andreaTP commented 3 years ago

I would suggest aligning the versions and upgrading to the latest (2.0.25). Let us know if the problem persists.
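
As an illustration of the alignment check, here is a small sketch that groups components by version and flags a mismatch. The versions are the ones reported earlier in this thread; how they are collected is left out, and `group_by_version` and `versions_aligned` are hypothetical helpers, not Cloudflow tooling.

```python
def group_by_version(versions):
    """Group component names by the version each one reports."""
    groups = {}
    for component, version in versions.items():
        groups.setdefault(version, []).append(component)
    return groups


def versions_aligned(versions):
    """Return True when every component reports the same version."""
    return len(set(versions.values())) <= 1


# Versions as reported in this thread.
reported = {"operator": "2.0.21", "sbt-plugin": "2.0.21", "cli": "2.0.23"}

if not versions_aligned(reported):
    for version, components in sorted(group_by_version(reported).items()):
        print(f"{version}: {', '.join(components)}")
```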