m3dev / gokart

Gokart solves reproducibility, task dependencies, constraints of good code, and ease of use for Machine Learning Pipeline.
https://gokart.readthedocs.io/en/latest/
MIT License

How to run tasks in parallel #251

Closed by jHaselberger 2 years ago

jHaselberger commented 2 years ago

Let's assume the following example:

import time

import gokart
import luigi


class DemoTask(gokart.TaskOnKart):
    name = luigi.Parameter()

    def run(self):
        print(f"demo running for {self.name}")
        time.sleep(1)  # simulate one second of work per task
        self.dump(f"Demo -> {self.name}")


class Prepare(gokart.TaskOnKart):
    names = list(range(20))

    def requires(self):
        # 20 independent DemoTasks, none of which depends on another
        return {f"task_{n}": DemoTask(name=str(n)) for n in self.names}

    def run(self):
        self.dump('DONE')


prep = Prepare()
output = gokart.build(task=prep)
print(output)

The DemoTasks run in arbitrary order, but strictly one after another. Is there a way to execute them in parallel?

Hi-king commented 2 years ago

@gismo07 Thanks for raising this, it's a good point for discussion.

[Possible Solution]

You can write

luigi.build([prep], local_scheduler=True, detailed_summary=True, workers=20)
output = prep.output().load()

instead of

output = gokart.build(task=prep)

[Background] This happens because gokart.build hides the workers argument of luigi.build: https://github.com/m3dev/gokart/blob/06ede5cefaf002faf273a79b761309aa27707811/gokart/build.py#L55 This is arguably a missing feature, so yes, PRs fixing this are very welcome :)
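
To see why the worker count matters for this particular example, here is a rough standard-library sketch. It is an analogy only, not how luigi or gokart schedule tasks: demo_task and run_all are hypothetical stand-ins for DemoTask.run and the build call, a thread pool stands in for luigi's workers, and the delay is shortened to 0.05 seconds so the script finishes quickly.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def demo_task(name: str, delay: float = 0.05) -> str:
    """Hypothetical stand-in for DemoTask.run: wait, then return the dumped value."""
    time.sleep(delay)
    return f"Demo -> {name}"


def run_all(names, workers):
    """Run demo_task for every name on a pool of `workers` threads.

    Returns the results in input order and the elapsed wall-clock seconds.
    """
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(demo_task, names))
    return results, time.perf_counter() - start


names = [str(n) for n in range(20)]

# One worker: the 20 independent tasks queue up behind each other.
sequential_results, sequential_time = run_all(names, workers=1)

# Twenty workers: every task can wait at the same time.
parallel_results, parallel_time = run_all(names, workers=20)

print(f"1 worker:   {sequential_time:.2f}s")
print(f"20 workers: {parallel_time:.2f}s")
```

Because the twenty tasks do not depend on each other, a single worker needs roughly twenty times the per-task delay, while twenty workers need roughly one. The same reasoning applies to passing workers=20 to luigi.build in the snippet above.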

jHaselberger commented 2 years ago

Thank you, it's working as expected.