spotify / luigi

Luigi is a Python module that helps you build complex pipelines of batch jobs. It handles dependency resolution, workflow management, visualization etc. It also comes with Hadoop support built in.
Apache License 2.0

Is luigi expected to honor task priorities for multiple independent workflow executions? #2153

Closed asolkov closed 7 years ago

asolkov commented 7 years ago

I built a luigi workflow that a user can trigger at any time via a Rundeck job. The user may run this workflow concurrently with different parameters. Some tasks in the workflow must be executed strictly sequentially, so requires() + resources work fine as long as only a single workflow is running. I ended up with a single central luigi scheduler and tasks whose priorities are linked to the workflow start time.

Unfortunately, I'm not able to achieve this when the workflow is triggered several times. Despite their lower priorities, the tasks from the second run of the luigi binary are still executed. Should I expect luigi to keep track of priorities not only within a single workflow execution, but also across executions?

Thanks for any hints.

Tarrasch commented 7 years ago

Your observation is correct: priorities are not honored across workflows. One workaround could be to let only assistants do the work, and have the other executions only schedule tasks (the --workers 0 option).

asolkov commented 7 years ago

@Tarrasch thanks a lot for the feedback.

What is the correct way to create assistant workers and have them talk to a particular scheduler? Will assistant workers stay alive even after they have finished their current job? As I understand it, I should start them independently of the workflow and then trigger workflows as usual, but with the --workers=0 option added.

asolkov commented 7 years ago

@Tarrasch, could you please give me any hints on how to create assistant workers?

Tarrasch commented 7 years ago

Hi, I suggest looking at these two email threads:

Then you can probably start playing around with it. I would consider it a pretty stable feature now (there are many tests and I've used it in production), although as you can see it's not yet properly documented. Good luck.
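As a starting point for experimenting, the split suggested above (assistants do the work, other runs only schedule) could be sketched as two command lines, built here in Python. The scheduler host, the `my_pipeline` module, and the task names are hypothetical. `--workers`, `--assistant`, and `--scheduler-host` are luigi command-line options; `--worker-keep-alive` is assumed to map to the `keep_alive` setting of the `[worker]` config section and should be checked against your luigi version.

```python
import shlex

# Hypothetical host name of the central luigi scheduler.
SCHEDULER_HOST = "luigi-scheduler.example.com"


def schedule_only_command(module, task, task_args=()):
    """Command that registers a workflow with the scheduler but runs nothing."""
    return [
        "luigi", "--module", module, task, *task_args,
        "--scheduler-host", SCHEDULER_HOST,
        "--workers", "0",  # schedule only, leave execution to assistants
    ]


def assistant_command(module, task, workers=1):
    """Command for a long-lived assistant that runs tasks from the scheduler."""
    return [
        "luigi", "--module", module, task,
        "--scheduler-host", SCHEDULER_HOST,
        "--assistant",            # also pick up tasks scheduled by other runs
        "--worker-keep-alive",    # assumed flag: do not exit when idle
        "--workers", str(workers),
    ]


if __name__ == "__main__":
    # The luigi command line expects a task name, so the assistant is given a
    # hypothetical placeholder task called Noop.
    print(shlex.join(assistant_command("my_pipeline", "Noop")))
    print(shlex.join(schedule_only_command(
        "my_pipeline", "MyWorkflow", ["--run-id", "42"])))
```

With a single assistant process running one worker, every task from every triggered workflow passes through the same worker, which is what gives the scheduler a chance to order them by priority.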

dlstadther commented 7 years ago

Closing this issue.

Every open issue adds some clutter, and we try to keep the number of open issues low so that new collaborators can more easily find relevant ones. Currently we try to close any issue that meets the first checkbox + one other.

Feel free to reopen this issue at any point if you intend to continue working on this. :)