spring-cloud / spring-cloud-dataflow

A microservices-based Streaming and Batch data processing in Cloud Foundry and Kubernetes
https://dataflow.spring.io
Apache License 2.0
1.1k stars 578 forks

[task-relaunch] Add ability to define default deployment properties for an App #3962

Open philippn opened 4 years ago

philippn commented 4 years ago

Hi there,

I have been playing around with SCDF quite a bit lately and like it very much, so thanks for all your hard work!

There is one feature that I'm missing though: I'm using SCDF with Dockerized task apps. Some of those need persistent volume claims (PVCs) and other complicated properties like that. As far as I know, the only way to specify this is when launching the task app.

Having to pass these every time the app is launched is error prone. Also, since these properties are pretty low level, I think it would make sense to keep them out of the users'/clients' way as much as possible. My suggestion/wish would be to add a way to define such deployment properties when the app is registered.

Thanks for your time and kind regards, Philipp

ilayaperumalg commented 4 years ago

Hi @philippn , Thanks for trying out SCDF and your input.

The feature you mention above is something we'd like to showcase as a recipe in our SCDF site: https://github.com/spring-io/dataflow.spring.io/issues/247.

We'll keep you posted on our attempt to set this up.

sabbyanandan commented 4 years ago

@philippn: It is indeed great to see this feedback; thank you for the support! Apart from what Ilaya pointed out, you could also define platform accounts as part of the SCDF configuration. In the desired platform account, you could plug the volume claims into a global property that is applied at every launch.

So, when launching the tasks, you'd select the platform (with that extra configuration) via the optional --platformName property.
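A sketch of that suggestion, assuming a Kubernetes platform: a dedicated platform account whose deployer settings carry the volume claim, so every task launched on it inherits the claim. The account name (pvc-tasks), claim name, and mount path below are made up, and the exact shape of the volume properties depends on the Kubernetes deployer version:

```yaml
# Hypothetical platform account in the Data Flow server configuration.
# Tasks launched with --platformName pvc-tasks inherit these deployer settings.
spring:
  cloud:
    dataflow:
      task:
        platform:
          kubernetes:
            accounts:
              pvc-tasks:
                namespace: default
                volumes: "[{name: 'storage', persistentVolumeClaim: {claimName: 'task-pvc'}}]"
                volumeMounts: "[{name: 'storage', mountPath: '/data'}]"
```

A task launched with --platformName pvc-tasks would then get the PVC without any per-launch deployer properties.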

philippn commented 4 years ago

@ilayaperumalg I'm looking forward to your showcase, thanks in advance!

@sabbyanandan That is very interesting, thank you! It definitely comes in handy when deploying all of it on different cloud providers.

My specific use case is really geared towards specific apps though. For example, I have one task app that needs a certain PVC, while others do not, and so on. This is not limited to deployer properties: as far as I understand, for the plain application properties you can already use the metadata JAR approach and then configure them per app/task. Something similar for deployer properties would be quite useful.

For my particular use case, it's not a big issue though, because the tasks are launched via the REST API from an external system. Basically, I just need to make sure that this system manages these properties for now.

So I just thought this might be useful for maybe more people. Thanks to both of you for your time and keep up the good work!
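For context, launching with per-launch deployer properties over the REST API, as described above, looks roughly like the following; the task name, property value, and server address are placeholders, and a running SCDF server is assumed:

```shell
# POST a task launch to a running SCDF server, repeating the deployer
# property on every call; 'my-task' and the CPU limit are placeholders.
curl -X POST "http://localhost:9393/tasks/executions" \
  -d "name=my-task" \
  -d "properties=deployer.my-task.kubernetes.limits.cpu=500m"
```

It is exactly this per-launch repetition that the external system has to manage today.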

guoyiang commented 3 years ago

I recently ran into this issue as well. I deploy the Data Flow server into a k8s cluster using the Helm chart. There are some properties in the task that need to be overridden (e.g. an internal storage server address).

From what I understand, there are three ways to pass in configuration for a task:

  1. In the platform's default settings inside the Data Flow server's application.yml, which apply to all tasks.
  2. As deployer properties when launching a task. These apply to that task execution, but the parameters have to be provided each time the task is run.
  3. Create one platform per task.

But there are limitations in each of these approaches.
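To make the first two options concrete (the property names follow the SCDF Kubernetes platform configuration style; the values are examples):

```yaml
# Option 1: a default in the Data Flow server's application.yml,
# applied to every task launched on this platform account.
spring:
  cloud:
    dataflow:
      task:
        platform:
          kubernetes:
            accounts:
              default:
                limits:
                  memory: 1024Mi
```

Option 2 instead passes the same setting on every launch, e.g. deployer.my-task.kubernetes.limits.memory=1024Mi, which is the repetition being discussed here.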

Wondering if it's possible to have a more flexible way to configure tasks. Here are some thoughts:

  1. Provide default properties of an application when registering it with the Spring Cloud Data Flow server. This way, the configuration is done at registration/update time, and it separates Data Flow server configuration from task configuration.
  2. Be able to provide per-app configuration in the Data Flow server's configuration. It is less ideal compared to 1, but maybe it's easier to implement in the current framework.

The main idea is to separate the concerns of the task, the Data Flow server, and the downstream services that trigger a task. Each task takes care of its own configuration and registration, the Data Flow server takes care of platform settings, and the downstream caller only cares about the data passed into the task.

Thanks!

cppwfs commented 3 years ago

Hello @guoyiang, Thank you for your feedback.
Let's explore this topic a bit more. When you say task, do you mean the Task App or the Task Definition?

For your first point, "From what I understand, there are three ways to pass in configuration for a task:"

  • You can set application properties for a task at task definition creation time, e.g. timestamp --timestamp.format=YYYY
  • For deployer properties, I'll raise an issue during our next standup to see if we can add deployer properties (like app properties) to a task definition or at app registration.

For your second point, "As deployer properties when launching a task. These apply to that task execution, but the parameters have to be provided each time the task is run."

guoyiang commented 3 years ago

@cppwfs Thanks a lot for the info. Great hints, and I understand things better now.

I mixed up Task App and Task Definition, because we have a simple task definition with only one task app at the moment. But the end goal is to pass some configuration when running a task app, which is triggered through a definition.

I played a bit with the approaches you mentioned.

It would be great if there were a formal way to configure the deployer properties of a task app or task definition, instead of implicit inheritance from the last run, so we could set these properties without needing to run the task or check its last run. I think this would be covered by what you already mentioned: allowing deployer properties to be set during task definition or app registration. If this feature can be added, ideally along with the ability to update a task definition, I think it will give a lot of flexibility in configuring task apps/definitions. Looking forward to the news!

Spring Cloud Config Server is another approach we'll evaluate, but probably over the longer term because of the complexity of deploying one more component.

Thanks again!

guoyiang commented 3 years ago

Noticed there's a similar issue #2194

cppwfs commented 3 years ago

@guoyiang I had a discussion with the team on this topic. So we created the following issue to handle what you have discussed. https://github.com/spring-cloud/spring-cloud-dataflow/issues/4423 This should address your request. Thank you for providing excellent feedback.

taxone commented 3 years ago

Hello @guoyiang, Thank you for your feedback. Let's explore this topic a bit more. When you say task, do you mean the Task App or the Task Definition?

For your first point, "From what I understand, there are three ways to pass in configuration for a task:"

  • You can set application properties for a task at task definition creation time, e.g. timestamp --timestamp.format=YYYY
  • For deployer properties, I'll raise an issue during our next standup to see if we can add deployer properties (like app properties) to a task definition or at app registration.

For your second point, "As deployer properties when launching a task. These apply to that task execution, but the parameters have to be provided each time the task is run."

So, do you mean that I can use configMaps for deployer properties as well, not only for application properties?

How can I do that?

It's possible to pass application properties to a task by launching it with the deployment property below:

deployer.mytask.kubernetes.config-map-refs=myconfigmap

But specifying the use of a ConfigMap can (as far as I know) only be done through a deployer property passed to the task launcher... so how can I define the ConfigMap that contains the deployer properties for a specific task?

Thanks in advance
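For reference, my understanding is that config-map-refs injects an existing ConfigMap's entries into the task container as environment variables, so it can carry application properties but not deployer properties: the Data Flow server consumes deployer properties before the pod is created. A made-up example of such a ConfigMap:

```yaml
# Hypothetical ConfigMap named 'myconfigmap'. With
# deployer.mytask.kubernetes.config-map-refs=myconfigmap, its entries become
# environment variables in the task container, i.e. application properties.
apiVersion: v1
kind: ConfigMap
metadata:
  name: myconfigmap
data:
  SPRING_MAIN_BANNER_MODE: "log"
```

This is why a ConfigMap alone cannot answer the question above: deployer properties would have to reach the server at launch time, not the container.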

aritzbastida commented 2 years ago

Hello! We are trying out SCDF in our project and were also surprised by these configuration nuances. I think this sentence by @guoyiang synthesizes the problem:

The main idea about this is to separate the concern between task, dataflow server, and downstream services who triggers a task.

The current design of SCDF forces you to specify three different kinds of information at launch time:

  1. Deployer properties (technical details such as the Kubernetes entrypoint style or resource limits).
  2. Application properties of the task app.
  3. Command-line arguments (the business parameters of this particular execution).

The first two should be abstracted away from the team responsible for launching the tasks. In other words, this team should only care about the what (the task to launch, plus its business parameters), not the how (technical details such as the entrypoint style used to start the container in Kubernetes).

Moreover, successive task executions should not depend on each other: having to guess the "current" deployment configuration is quite cumbersome, and launches should be deterministic. I guess this behavior was implemented to avoid passing deployment parameters over and over (such as the entrypoint style in the example above).

Please let me give my two cents and share how we just solved it in our local SCDF instance. We patched DefaultTaskExecutionService as follows:

public long executeTask(String taskName, Map<String, String> taskDeploymentProperties, List<String> commandLineArgs) {
  ...
  // Merge properties: task-specific defaults first, launch-time properties override them
  Map<String, String> launchProperties = new HashMap<>();
  launchProperties.putAll(taskConfigurationProperties.getProperties());
  launchProperties.putAll(taskDeploymentProperties);

  taskExecutionInformation.setTaskDeploymentProperties(launchProperties);

  // Finally create the app deployment request
  AppDeploymentRequest request = this.taskAppDeploymentRequestCreator.createRequest(taskExecution,
        taskExecutionInformation, commandLineArgs, platformName, launcher.getType());

  TaskManifest taskManifest = createTaskManifest(platformName, request, launchProperties);
  ...
}
We added a new "properties" attribute to the TaskConfigurationProperties class, which lets us specify task-specific properties (both "app" and "deployer"). If additional properties are provided at launch time (e.g. the "properties" param in /tasks/executions), those take precedence. The manifest is persisted with the effective configuration used to launch the task, but it is not used to compute the configuration of successive launches.
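The precedence in the patch comes purely from the order of the two putAll calls: per-task defaults go in first and launch-time properties overwrite them. A minimal standalone sketch of that behavior (the class name and property keys are made up for illustration):

```java
import java.util.HashMap;
import java.util.Map;

// Standalone sketch of the merge order used in the patch above:
// defaults are applied first, launch-time properties override them.
public class PropertyMerge {

    public static Map<String, String> merge(Map<String, String> taskDefaults,
                                            Map<String, String> launchTimeProps) {
        Map<String, String> launchProperties = new HashMap<>();
        launchProperties.putAll(taskDefaults);     // configured once per task
        launchProperties.putAll(launchTimeProps);  // whatever the caller passed wins
        return launchProperties;
    }

    public static void main(String[] args) {
        Map<String, String> defaults = new HashMap<>();
        defaults.put("deployer.my-task.kubernetes.limits.cpu", "270m");
        defaults.put("app.my-task.spring.main.banner-mode", "off");

        Map<String, String> launch = new HashMap<>();
        launch.put("deployer.my-task.kubernetes.limits.cpu", "500m");

        Map<String, String> merged = merge(defaults, launch);
        // Launch-time value overrides the default; untouched defaults survive.
        System.out.println(merged.get("deployer.my-task.kubernetes.limits.cpu")); // 500m
        System.out.println(merged.get("app.my-task.spring.main.banner-mode"));    // off
    }
}
```

Because the merge happens per execution and the result is only persisted in the manifest, each launch stays deterministic: nothing is inherited from the previous run.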

Global properties and task-specific properties can be configured in any property source of the Spring environment, such as a ConfigMap, Spring Cloud Config, or Zookeeper.

# GLOBAL APPLICATION PROPERTIES
spring.cloud.dataflow.applicationProperties.task:
   spring.config.import: configserver:localhost:9393
   spring.main.banner-mode: log

# GLOBAL DEPLOYER PROPERTIES
spring.cloud.dataflow.task.platform.kubernetes.accounts.default:
   createJob: false
   entryPointStyle: shell

# TASK-SPECIFIC APPLICATION/DEPLOYER PROPERTIES
spring.cloud.dataflow.task.properties:
   deployer.my-task.kubernetes.limits.cpu: 270m
   app.my-task.spring.main.banner-mode: "off"

All in all, task configuration and launch are now separated, and, as a result, can be addressed by different teams.

tcgtam commented 1 year ago

I need to use a single Docker image, containing a Java Spring Boot application, to launch different business-level tasks by feeding in different Java properties at runtime. Say I would like to define 40 applications in SCDF, all of them using the same Docker image. These 40 applications would also be the artifacts involved in composed task definitions.

As a result, I need to define application properties at the time I create the applications. Will this requirement be addressed soon?
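For what it's worth, the registration half of this is already possible today: several task apps can point at the same Docker image, and application properties can be baked into each task definition; it is the deployer properties that still lack a per-app home. A sketch in SCDF shell syntax, with made-up app and image names:

```
app register --name billing-task --type task --uri docker:myrepo/batch-image:1.0
app register --name audit-task --type task --uri docker:myrepo/batch-image:1.0
task create billing --definition "billing-task --job.name=billing"
task create audit --definition "audit-task --job.name=audit"
```

Each definition then launches the same image with its own application properties.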