mara / mara-pipelines

A lightweight opinionated ETL framework, halfway between plain scripts and Apache Airflow
MIT License
2.07k stars, 100 forks

Fix for RuntimeError on second local pipeline run on py3.7 #23

Closed: jankatins closed this 4 years ago

jankatins commented 4 years ago

On my Mac with Python 3.7, I get RuntimeError('context has already been set') when multiprocessing.set_start_method('fork') is called during the second pipeline run (the first run is fine).

Debugging middleware caught exception in streamed response at a point where response headers were already sent.
Traceback (most recent call last):
  File ".venv/lib/python3.7/site-packages/werkzeug/wsgi.py", line 507, in __next__
    return self._next()
  File ".venv/lib/python3.7/site-packages/werkzeug/wrappers/base_response.py", line 45, in _iter_encoded
    for item in iterable:
  File "packages/data-integration/data_integration/ui/run_page.py", line 108, in process_events
    for event in execution.run_pipeline(pipeline, nodes, with_upstreams):
  File "packages/data-integration/data_integration/execution.py", line 45, in run_pipeline
    multiprocessing.set_start_method('fork')
  File "/usr/local/Cellar/python/3.7.6_1/Frameworks/Python.framework/Versions/3.7/lib/python3.7/multiprocessing/context.py", line 242, in set_start_method
    raise RuntimeError('context has already been set')

The fix is to set the start method only if it is not already set to that value.
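A minimal sketch of that guard, not the actual diff of this PR: `ensure_start_method` is a hypothetical helper name, and passing `force=True` is an addition of this sketch so that the call also succeeds when a different start method was fixed earlier. Only `multiprocessing.get_start_method` and `multiprocessing.set_start_method` are real standard-library calls.

```python
import multiprocessing


def ensure_start_method(method: str = 'fork') -> str:
    """Set the multiprocessing start method only if it differs from `method`.

    Hypothetical helper for illustration; the name is not taken from the PR.
    """
    # allow_none=True returns None when no method has been fixed yet,
    # instead of silently fixing the platform default.
    current = multiprocessing.get_start_method(allow_none=True)
    if current != method:
        # Assumption of this sketch: force=True lets us switch even if another
        # method was fixed before. Without it, set_start_method raises
        # RuntimeError('context has already been set') in that case.
        multiprocessing.set_start_method(method, force=True)
    return multiprocessing.get_start_method()
```

Calling `ensure_start_method('fork')` at the start of every pipeline run is then a no-op from the second run onward, which is the situation that raised the error above. Note that 'fork' is only available on Unix-like platforms.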

jankatins commented 4 years ago

I only tested this on Mac with Python 3.7. It should probably also be tested on Mac with 3.8 and on Linux.

jankatins commented 4 years ago

will do