spotify / luigi

Luigi is a Python module that helps you build complex pipelines of batch jobs. It handles dependency resolution, workflow management, visualization, etc. It also comes with Hadoop support built in.
Apache License 2.0

Command line parameters not being passed to workers in macOS #3236

Closed: bioinsilico closed this issue 1 year ago

bioinsilico commented 1 year ago

Hi, luigi community!

I am having problems passing global parameters to multiple workers. When I run a workflow with more than one worker, the command-line parameter values are not passed to the workers. I wonder whether this is similar to the problem observed when luigi runs on Windows (https://github.com/spotify/luigi/issues/2247), but in my case on macOS. A second issue is a difference in log formatting: it seems that logging.cfg is not applied in the workers.

The following toy example summarizes my problem:


import luigi
import logging

logger = logging.getLogger('luigi-interface')

class HelloConfig(luigi.Config):
    reference = luigi.Parameter(default="World")

class HelloTask(luigi.Task):
    def run(self):
        logger.info("Hello %s!", HelloConfig().reference)

    def requires(self):
        return []

I run the task by calling

luigi --module weekly_update.etl.load_ex HelloTask \
      --workers ? \
      --HelloConfig-reference "Mars"

With --workers 1, the log shows

2023-04-21 10:59:09,386 - luigi-interface - INFO - [MainThread] - Hello Mars!

but with --workers 2 it shows

2023-04-21 10:59:53,030 [INFO]-load_ex.run: Hello World!

hypostulate commented 1 year ago

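As of Python 3.8, multiprocessing on macOS defaults to the spawn start method instead of fork, so it now has the same issues we previously saw only on Windows. With spawn, each worker is a fresh Python process that does not inherit the parent's in-memory state, such as the parsed command-line parameters or the logging configuration. You can change the start method using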

import multiprocessing

# Call once, before luigi starts its worker processes (e.g. at the top of the task module).
multiprocessing.set_start_method('fork')
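A slightly more defensive variant of the snippet above, as a sketch: the helper name ensure_fork_start_method is hypothetical (not part of luigi or the standard library), and calling it at import time in the module passed to --module is an assumption about where it needs to live. Keep in mind why the default changed: the Python documentation warns that fork should be considered unsafe on macOS, because macOS system libraries may start threads, which can crash the child process.

```python
import multiprocessing


def ensure_fork_start_method():
    """Hypothetical helper (not part of luigi): switch multiprocessing to 'fork'
    when the platform supports it, and return the start method in effect."""
    if "fork" not in multiprocessing.get_all_start_methods():
        # Windows has no fork, so this workaround cannot apply there.
        return multiprocessing.get_start_method()
    if multiprocessing.get_start_method(allow_none=True) != "fork":
        # force=True avoids "RuntimeError: context has already been set"
        # if something else already chose a start method.
        multiprocessing.set_start_method("fork", force=True)
    return multiprocessing.get_start_method()


# Assumption: this runs at import time in the module passed to --module,
# i.e. before luigi creates its worker processes.
ensure_fork_start_method()
```

Unlike a bare set_start_method('fork'), the helper is safe to call more than once and leaves Windows untouched.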

I'm not sure there is a better solution; we're also struggling with command-line parameters across multiple workers on platforms other than Linux.
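To see why the parameters go missing, here is a minimal stdlib-only sketch that does not involve luigi. The module-level dict CONFIG is a hypothetical stand-in for the parameter values luigi parses from the command line (it is not how luigi actually stores them): the parent overwrites the default at runtime, then a child process reads it back under each start method.

```python
import multiprocessing

# Hypothetical stand-in for state the parent sets up at runtime, such as
# parsed command-line parameters. This is NOT luigi's actual storage.
CONFIG = {"reference": "World"}


def read_reference(queue):
    # Runs in the child process and reports the value it sees.
    queue.put(CONFIG["reference"])


def run_child(start_method):
    """Change CONFIG in the parent, then read it back from a child process."""
    CONFIG["reference"] = "Mars"  # like passing --HelloConfig-reference "Mars"
    ctx = multiprocessing.get_context(start_method)
    queue = ctx.Queue()
    proc = ctx.Process(target=read_reference, args=(queue,))
    proc.start()
    value = queue.get()  # read before join() so the child can exit cleanly
    proc.join()
    return value


if __name__ == "__main__":  # required: spawn re-imports this module in the child
    for method in ("fork", "spawn"):
        print(method, "->", run_child(method))
```

Run as a script on Linux or macOS, the fork child reports Mars because it inherits a copy of the parent's memory, while the spawn child re-imports the module and only sees the default World, mirroring the Hello Mars / Hello World logs above. The same applies to logging configured at runtime in the parent.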

bioinsilico commented 1 year ago

Thanks so much, @hypostulate! Your solution worked like a charm. I had lost all hope on this issue. In case you have time to reply, I asked the same question on Stack Overflow (https://stackoverflow.com/questions/76081135/luigi-workflow-engine-command-line-parameters-not-being-passed-to-workers-in-mac).