DarthKrab opened this issue 4 years ago
Hi @DarthKrab, Cloudflow 2.0 has significantly better support for what you are trying to do: https://cloudflow.io/docs/current/develop/cloudflow-configuration.html
You can now pass resource requirements in a configuration file, using the --conf flag with kubectl cloudflow deploy and kubectl cloudflow configure. The configuration can apply to one specific streamlet or to all streamlets of a runtime (Flink, for example). Using the new configuration model you can configure both runtime-scoped and streamlet-scoped settings; a sketch follows below.
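For example, a configuration file along these lines sets memory requests and limits for all Flink streamlets at the runtime scope. This is a sketch based on the documentation linked above; the exact configuration paths and the file name pipeline.conf are illustrative, so double-check them against the docs for the Cloudflow version you run:

```hocon
// pipeline.conf — passed with --conf to kubectl cloudflow deploy or configure.
// Runtime scope: applies to every streamlet executed by the Flink runtime.
cloudflow.runtimes.flink.kubernetes.pods.pod.containers.container {
  resources {
    requests {
      memory = "2048M"
      cpu    = "0.5"
    }
    limits {
      memory = "2048M"
    }
  }
}
```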
Is it possible for you to try v2.0.5? There is a migration guide here: https://cloudflow.io/docs/current/project-info/migration-1_3-2_0.html
Hi @RayRoestenburg, thanks for your reply.
We have to make a critical decision about migrating to 2.0.5, including possible preliminary experiments and hypothesis checks on it, so it is extremely important for us to be sure that the new configuration mechanism works as designed, especially for Flink-related resources. Are you aware of anyone who has successfully tuned Flink's memory parameters using Cloudflow 2.0.5, or would it be better to check this thoroughly before the migration?
We have tested that changing the memory requirements works for Flink streamlets.
Hi @RayRoestenburg
Many thanks for the assistance!
You’re welcome!
Hi! Unfortunately, this problem still exists in the new version of Cloudflow (2.0.5). Pods of the new version of the pipeline are created, live for 5 minutes, and are then deleted. Log of the Flink operator: not_update.log (https://github.com/lightbend/cloudflow/files/4947705/not_update.log)
There is no information about these events in the cloudflow-operator log.
Immediately after installing the new version of the platform, pipeline updates work correctly. But as the number of pipelines in the cluster grows, the problem reappears.
There are no resource problems (CPU, RAM) in the cluster. I undeployed all the pipelines (using kubectl-cloudflow undeploy) and installed them again; the problem remains.
One more thing, though I'm not sure it's relevant: during installation of the pipeline we made a mistake in the name of a Kafka topic. After deploying the pipeline, the cloudflow-operator failed with this error: err_1.log (https://github.com/lightbend/cloudflow/files/4948017/err_1.log)
I successfully re-created the cloudflow-operator pod. After that we noticed the problem for the first time.
Sorry, maybe I should have been clearer in my response: did you use the new configuration feature in 2.0? Please see https://cloudflow.io/docs/current/develop/cloudflow-configuration.html#_configuring_streamlets_using_the_streamlet_scope
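With the streamlet scope, the same kind of settings can be targeted at a single streamlet. A minimal sketch, again based on the linked docs; the streamlet name my-flink-streamlet is purely illustrative, use the name from your blueprint:

```hocon
// Streamlet scope: applies only to the named streamlet.
cloudflow.streamlets.my-flink-streamlet {
  kubernetes.pods.pod.containers.container.resources {
    requests { memory = "1024M" }
    limits   { memory = "1024M" }
  }
}
```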
Thanks for the answer! As far as I understand, kubectl-cloudflow configure is used to update the pipeline configuration. But if I don't need to change the pipeline configuration, how do I correctly set up the process for updating the pipeline code?
I have a running pipeline (kubectl-cloudflow deploy was used for the installation). Let's say a developer has updated the streamlet code and my task is to update the running pipeline. I build the project and get the message:
[success] Use the following command to deploy the Cloudflow application:
[success] kubectl cloudflow deploy /opt/home/build-dir/JOB/target/pipeline.json
What is the right thing to do next?
You can use --conf with deploy as well.
Great. I'm executing the command: kubectl cloudflow deploy /opt/home/build-dir/JOB/target/pipeline.json --conf pipeline.conf
After that, new streamlets are created in the pipeline project's namespace. In the cloudflow-flink-operator log I see: "msg":"Application resource has changed. Moving to Updating"
After 5 minutes the new pods go to Terminating status and the cluster deletes them; the pods of the old pipeline version remain. In the cloudflow-flink-operator log I see: "msg":"Logged Warning event: ClusterCreationFailed: Flink cluster failed to become available: failed to make progress after 5m0s" "msg":"Logged Warning event: RolledBackDeploy: Successfully rolled back deploy f396fdc7"
This is the problem. Full log:
FYI, we have also seen this occur: for some reason the Flink operator cannot always make progress, and it then falls back to the previous Flink cluster. The 5 minutes can be explained by this setting: https://github.com/lyft/flinkk8soperator/blob/master/pkg/controller/config/config.go#L26
I have a problem changing the Flink application in my pipeline. The problem is this: for example, I want to change the amount of memory for the task manager in one of the job clusters, so I change the default value in the Flink application via spec.taskManagerConfig.resources.requests.memory. After that, new deployments are created in the namespace (screenshot: http://joxi.ru/GrqkRYvck3xEQ2) that duplicate the existing ones. Flink-operator logs: https://paste.ubuntu.com/p/Dfys67smSN/ As a result, the new streamlets are deleted and my changes are not applied.
However, for some job clusters no new deployment is created when the Flink application is updated; a new version of the current deployment is created instead (screenshot: http://joxi.ru/n2YowdZTZ3y88r), and in that case everything is updated correctly.
I use Cloudflow 1.3.3 on OpenShift 3.11.
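For reference, the change described above edits the flinkk8soperator FlinkApplication custom resource roughly as follows. This is a sketch: the metadata name is illustrative, most fields are elided, and the apiVersion may differ per operator version:

```yaml
# Fragment of a FlinkApplication custom resource (lyft/flinkk8soperator).
# Only the task manager memory request is shown.
apiVersion: flink.k8s.io/v1beta1
kind: FlinkApplication
metadata:
  name: my-pipeline-streamlet   # illustrative
spec:
  taskManagerConfig:
    resources:
      requests:
        memory: "2048Mi"
```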