treasure-data / digdag

Workload Automation System
https://www.digdag.io/
Apache License 2.0

[feature request] digdag push without registering schedules (or with draft revisions) #1062

Open · sonots opened this issue 5 years ago

sonots commented 5 years ago

From https://github.com/treasure-data/digdag/pull/1048#issuecomment-476484645

With #1048, I introduced digdag server --disable-scheduler, which does not start any registered schedules. However, it is somewhat misleading because it still allows registering schedules from dig files containing

schedule:
  xxxx

and digdag schedules shows the registered schedules (although they will not be started).

It is probably better not to register schedules when a project is pushed with digdag push to a server running with --disable-scheduler.

muga commented 5 years ago

Hi @sonots, can we keep the current implementation and meaning of --disable-scheduler? The option itself is good for controlling and managing server resources, for example not running ScheduleExecutor on digdag-server A but running it on digdag-server B. Instead you may want to ignore and not register schedule definitions on server side. I feel that it's different from the original --disable-scheduler option. What do you think?

sonots commented 5 years ago

Instead you may want to ignore and not register schedule definitions on server side. I feel that it's different from the original --disable-scheduler option. What do you think?

The internal behavior is different, but I can not imagine any scenarios that users want to differentiate them.

But, I can add another option like digdag server --ignore-schedules. How about this, then?
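To make the difference between the two options concrete, here is a toy Python model. It is not digdag's code: the class, its methods, and the ignore_schedules flag (standing in for the proposed --ignore-schedules) are hypothetical. With disable_scheduler, schedules from pushed dig files are still registered, so a listing shows them, but nothing fires; with ignore_schedules, they are never registered in the first place.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ToyServer:
    """Toy model of a digdag server; NOT digdag's actual implementation."""

    disable_scheduler: bool = False  # models `digdag server --disable-scheduler` (#1048)
    ignore_schedules: bool = False   # models the proposed `--ignore-schedules` (hypothetical)
    schedules: dict[str, str] = field(default_factory=dict)  # workflow -> schedule spec
    started: list[str] = field(default_factory=list)         # workflows started by the scheduler

    def push(self, workflows: dict[str, str | None]) -> None:
        """Schedule-registration part of `digdag push` (toy version)."""
        if self.ignore_schedules:
            return  # proposed behavior: schedule definitions are dropped at push time
        for name, schedule in workflows.items():
            if schedule is not None:
                self.schedules[name] = schedule

    def list_schedules(self) -> list[str]:
        """Roughly what `digdag schedules` would list."""
        return sorted(self.schedules)

    def tick(self) -> None:
        """One pass of a ScheduleExecutor-like loop (toy: every registered schedule is due)."""
        if self.disable_scheduler:
            return  # no executor runs on this server, but registrations remain
        self.started.extend(sorted(self.schedules))


# Hypothetical project: one scheduled workflow, one manual-only workflow.
dig_files = {"daily_etl": "daily>: 07:00:00", "adhoc_report": None}

a = ToyServer(disable_scheduler=True)  # registered but never started
a.push(dig_files)
a.tick()

b = ToyServer(ignore_schedules=True)   # never registered at all
b.push(dig_files)
b.tick()
```

After running it, a.list_schedules() still contains "daily_etl" while a.started is empty, which is exactly the "listed but never started" situation described in the issue; b has neither.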

muga commented 5 years ago

@sonots Thank you for suggesting the option.

The internal behavior is different, but I can not imagine any scenarios that users want to differentiate them.

Ah, this is not with Digdag, but our company has split job workers and the schedulers that trigger them into separate clusters. Job workers consume CPU, disk space, and a lot of memory, and sometimes make the cluster unstable. Schedulers, on the other hand, don't consume many resources, but they must not be delayed in triggering workers. In this case, separating the resources of workers and schedulers is a good approach. So I think your original option is good for us as well, at least because it lets the resources of MultiThreadAgent and ScheduleExecutor be split.

But, I can add another option like digdag server --ignore-schedules. How about this, then?

The ignore option (#1071 as well) will mitigate your issue, but, hm... I'm not quite sure the option will be useful for other users. Rather will it make users confusing, it won't?

What do you think that we will add new add-schedule option, enables associating a schedule to existing and stored workflows (still designing)? The essential issue may be that the schedule setting is included in the workflow YAML file. We sometimes want to update only the schedule setting of a workflow, but to do that we need to edit the workflow file and push the whole project. That's annoying for us. If this option fits your case, we will add it to the roadmap we're now making.

sonots commented 5 years ago

Thank you. I understand that --disable-scheduler is still useful.

Rather will it make users confusing, it won't?

I also agree that having both --ignore-schedules and --disable-scheduler would confuse users. That was one of my concerns.

What do you think that we will add new add-schedule option, enables associating a schedule to existing and stored workflows (still designing)?

We like writing schedules in workflow dig files because we can review changes to schedules on GitHub. Therefore, “digdag add-schedule” is not a perfect solution for my issue.

However, if we could separate workflow dig files into task files and something like a schedule file that describes the schedules of workflows, and “digdag add-schedule” accepted that schedule file, both of our requirements could be met: (1) schedule changes can be code-reviewed, and (2) no schedules run on staging.
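A minimal Python sketch of how the separate-schedule-file idea would meet both requirements, under loud assumptions: the schedule file format, the add_schedules function, and "digdag add-schedule" itself are hypothetical (none of them exist in digdag). Workflows are pushed without schedules everywhere; only production applies the reviewed schedule file, so staging stays unscheduled.

```python
from __future__ import annotations

# Hypothetical schedule file, kept in Git next to the schedule-free dig files so
# schedule changes still go through code review. Neither this format nor a
# `digdag add-schedule` command exists in digdag; both are assumptions.
SCHEDULE_FILE: dict[str, dict[str, str]] = {
    "daily_etl": {"daily>": "07:00:00"},
    "nightly_cleanup": {"daily>": "23:30:00"},
}


def add_schedules(
    pushed: set[str], schedule_file: dict[str, dict[str, str]]
) -> dict[str, dict[str, str]]:
    """Model of a hypothetical `digdag add-schedule <file>` run against already-pushed workflows."""
    unknown = sorted(set(schedule_file) - pushed)
    if unknown:
        # Refuse to attach schedules to workflows the server does not know about.
        raise ValueError(f"schedule file names unknown workflows: {unknown}")
    return {name: dict(spec) for name, spec in schedule_file.items()}


# The same schedule-free workflows are pushed to both environments.
pushed = {"daily_etl", "nightly_cleanup", "adhoc_report"}

production = add_schedules(pushed, SCHEDULE_FILE)  # production applies the schedule file
staging: dict[str, dict[str, str]] = {}            # staging never runs add-schedule
```

Validating the schedule file against the pushed workflows is a design choice of this sketch, meant to catch typos in workflow names at review or deploy time rather than silently scheduling nothing.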

muga commented 5 years ago

Thank you for sharing more about your situation and proposing the separate schedule file approach. I agree that it's good for code review.

I'd like to share another idea with you, "draft revision", in which workflows are not executed by the scheduler. If users want to run draft workflows on the server, they push the revision with a --draft option. The server receives the project archive, stores a new revision record with a draft flag in the revisions table, and does not store any new schedule records. Once users decide the workflows are ready to deploy, the server drops the flag on the revision (via the REST API). The idea came from internal discussions at our company. What do you think? @sonots @yoyama
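The proposed draft-revision flow can be modeled as a small state machine. This is a toy Python sketch, not digdag's schema or API: the Revision and ToyRevisionStore classes, the undraft method (standing in for the REST call that drops the flag), and the choice to leave earlier schedule records untouched when a draft is pushed are all assumptions made for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Revision:
    project: str
    name: str
    schedules: dict[str, str]  # workflow -> schedule spec declared in its dig file
    draft: bool = False


@dataclass
class ToyRevisionStore:
    """Toy model of the proposed draft-revision flow; names and storage are hypothetical."""

    revisions: list[Revision] = field(default_factory=list)
    schedule_records: dict[tuple[str, str], str] = field(default_factory=dict)

    def push(self, rev: Revision, draft: bool = False) -> None:
        """Like the proposed `digdag push --draft`: always store the revision,
        but write schedule records only when it is not a draft."""
        rev.draft = draft
        self.revisions.append(rev)
        if not draft:
            self._write_schedules(rev)

    def undraft(self, project: str, name: str) -> None:
        """Models the REST call that drops the draft flag; schedules are written only now."""
        for rev in self.revisions:
            if rev.project == project and rev.name == name and rev.draft:
                rev.draft = False
                self._write_schedules(rev)
                return
        raise KeyError(f"no draft revision {name!r} in project {project!r}")

    def _write_schedules(self, rev: Revision) -> None:
        # Add or overwrite one schedule record per scheduled workflow.
        for workflow, spec in rev.schedules.items():
            self.schedule_records[(rev.project, workflow)] = spec


store = ToyRevisionStore()
store.push(Revision("etl", "rev-2", {"daily_etl": "daily>: 07:00:00"}), draft=True)
# While drafted: the revision is stored, but schedule_records is still empty.
store.undraft("etl", "rev-2")
# After undraft: ("etl", "daily_etl") is now scheduled.
```

The key property is that storing a revision and registering its schedules become two separate steps, so a staging server can hold draft revisions indefinitely without anything being scheduled.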

sonots commented 5 years ago

Is it possible to run a workflow from the digdag web UI if the workflow is pushed with the "--draft" option?

If so, the option fits my purpose of building a staging environment, where I want to run workflows manually.

muga commented 5 years ago

Yes, it's possible, or rather, we can make it possible. We've been working on a Digdag roadmap for the future. I'll talk to @yoyama and consider including the idea in the roadmap.

sonots commented 5 years ago

Sounds good. I will close the --ignore-schedules PR.

muga commented 5 years ago

Got it. I'll rename the ticket to include "draft revision". As discussed with @yoyama and @szyn in person, we will publish the roadmap in the repository as far as we can.