streamshub / flink-sql-examples

Apache License 2.0

Flink SQL Examples incompatible with Flink Operator 1.10 - Invalid value: "/opt/flink/artifacts": must be unique #21

Open k-wall opened 2 days ago

k-wall commented 2 days ago

The examples currently use Flink Operator 1.9.

If you try to upgrade to Flink Operator 1.10, the FlinkDeployment fails to become ready. The error message in the operator logs is this:

```
Caused by: org.apache.flink.kubernetes.shaded.io.fabric8.kubernetes.client.KubernetesClientException: Failure executing: POST at: https://172.30.0.1:443/apis/apps/v1/namespaces/flink-filter/deployments. Message: Deployment.apps "flink-filter" is invalid: spec.template.spec.containers[0].volumeMounts[2].mountPath: Invalid value: "/opt/flink/artifacts": must be unique. Received status: Status(apiVersion=v1, code=422, details=StatusDetails(causes=[StatusCause(field=spec.template.spec.containers[0].volumeMounts[2].mountPath, message=Invalid value: "/opt/flink/artifacts": must be unique, reason=FieldValueInvalid, additionalProperties={})], group=apps, kind=Deployment, name=flink-filter, retryAfterSeconds=null, uid=null, additionalProperties={}), kind=Status, message=Deployment.apps "flink-filter" is invalid: spec.template.spec.containers[0].volumeMounts[2].mountPath: Invalid value: "/opt/flink/artifacts": must be unique, metadata=ListMeta(_continue=null, remainingItemCount=null, resourceVersion=null, selfLink=null, additionalProperties={}), reason=Invalid, status=Failure, additionalProperties={}).
```

The issue is that the example FlinkDeployments provide a volume mounted at /opt/flink/artifacts.

https://github.com/streamshub/flink-sql-examples/blob/98bff2cd37c07d5cbb5e1a664e75238cca3515ee/recommendation-app/flink-deployment.yaml#L30
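For context, the workaround in the example looks roughly like this (a sketch only; the volume name and emptyDir backing are illustrative assumptions, not copied from the repo, see the linked flink-deployment.yaml for the real spec):

```yaml
spec:
  podTemplate:
    spec:
      containers:
        - name: flink-main-container
          volumeMounts:
            # Workaround mount: since Flink 1.19 the operator adds its own
            # mount at this exact path, so this entry becomes a duplicate
            # and the Deployment is rejected with "must be unique".
            - name: flink-artifacts
              mountPath: /opt/flink/artifacts
      volumes:
        - name: flink-artifacts
          emptyDir: {}
```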

This duplicates the volume provided by Flink itself. The change was introduced in Flink 1.19 by https://issues.apache.org/jira/browse/FLINK-28915.

https://github.com/apache/flink/blob/e63aa12252843d0098a56f3091b28d48aff5b5af/flink-kubernetes/src/main/java/org/apache/flink/kubernetes/kubeclient/decorators/InitJobManagerDecorator.java#L62

Flink Operator 1.9 used Flink 1.18, hence the need for the workaround in the FlinkDeployment.

Once the example is upgraded to Flink Operator 1.10, the workaround should be removed.

I don't think there is a need to preserve backward compatibility with older versions of the Flink Operator.

k-wall commented 2 days ago

I also question whether we need the /opt/flink/log mount at all. Log4j is already configured to write to the console, which is consistent with the requirements of a twelve-factor app. Logging to a file within the pod is pointless: no one will expect it to be there, and it'll just eat memory.
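A console-only Log4j 2 setup, for reference, looks roughly like this (a sketch in Log4j 2's properties format; Flink ships a similar log4j-console.properties, so the exact appender names here are assumptions):

```properties
# Route everything to stdout; no file appender, so no /opt/flink/log mount needed.
rootLogger.level = INFO
rootLogger.appenderRef.console.ref = ConsoleAppender

appender.console.name = ConsoleAppender
appender.console.type = CONSOLE
appender.console.layout.type = PatternLayout
appender.console.layout.pattern = %d{yyyy-MM-dd HH:mm:ss,SSS} %-5p %-60c %x - %m%n
```

With logs on stdout, `kubectl logs` and any cluster log aggregator pick them up without a volume.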