Open thodson-usgs opened 7 months ago
Numbers are the tricky bit, but usually yamlencode just passes these through - so "12345"
gets encoded to "12345"
while 12345
gets encoded to 12345
. Do you have an example config to show?
The other option is to switch to jsonencode, because json is a subset of YAML and anything that accepts yaml should accept JSON (with the exception of the presence of hard tabs). This is actually generally what I do, because JSON isn't space significant while YAML is.
I believe I've verified this while testing restart strategies.
Viewing the job manager config in Flink's dashboard:
I pass the config through terraform and I get
"restart-strategy.type": "exponential-delay"
"restart-strategy.exponential-delay.max-backoff": "20 min"
and the restart fails.
Whereas if I pass the config through the recipe's config.py I get
restart-strategy.type: exponential-delay
restart-strategy.exponential-delay.max-backoff : 20 min
and the restart works.
If I pass both, both appear in the config, which may also indicate they are not equivalent.
I'll report more examples as I come across them, but I'm feeling confident that our terraform-flink-config is broken!
Also checking the job manager logs, all the quoted parameters appear red. Unclear what red indicates, but I would guess an error (color not shown).
INFO [] - Loading configuration property: blob.server.port, 6124
INFO [] - Loading configuration property: kubernetes.jobmanager.annotations, flinkdeployment.flink.apache.org/generation:2
INFO [] - Loading configuration property: kubernetes.jobmanager.replicas, 1
INFO [] - Loading configuration property: "kubernetes.operator.metrics.reporter.prom.port", "9999"
INFO [] - Loading configuration property: taskmanager.memory.task.off-heap.size, 256m
INFO [] - Loading configuration property: jobmanager.rpc.address, gh-2dhytest-2dfeedstocks-2dgpcp-2dfrom-408fbc.default
INFO [] - Loading configuration property: kubernetes.taskmanager.cpu, 1.0
INFO [] - Loading configuration property: "prometheus.io/port", "9999"
INFO [] - Loading configuration property: kubernetes.service-account, flink
INFO [] - Loading configuration property: kubernetes.cluster-id, gh-2dhytest-2dfeedstocks-2dgpcp-2dfrom-408fbc
INFO [] - Loading configuration property: "restart-strategy.exponential-delay.initial-backoff", "10 s"
INFO [] - Loading configuration property: kubernetes.internal.taskmanager.replicas, 5
INFO [] - Loading configuration property: taskmanager.memory.flink.size, 1536m
INFO [] - Loading configuration property: kubernetes.container.image, flink:1.16
INFO [] - Loading configuration property: parallelism.default, 1
INFO [] - Loading configuration property: kubernetes.namespace, default
INFO [] - Loading configuration property: taskmanager.numberOfTaskSlots, 1
INFO [] - Loading configuration property: taskmanager.memory.jvm-overhead.max, 1024m
INFO [] - Loading configuration property: kubernetes.rest-service.exposed.type, ClusterIP
I'm starting to suspect that the output of
yamlencode()
toflink-conf.yaml
is not being parsed correctly. When I look at the logs, values set myyamlencode()
appear as:"number": "12345"
whereas other values appear asnumber: 12345
In fact, both values can occur in the same deployment, which make me think they aren't equivalent.