natcap / taskgraph

Support passing lambdas into tasks #99

Open emlys opened 11 months ago

emlys commented 11 months ago

Callables passed into TaskGraph.add_task currently have to be named functions defined in the global scope. There are many cases where it would be convenient to pass in a lambda function defined in-line. I'm specifically thinking about raster_map, e.g.

taskgraph.add_task(
    func=pygeoprocessing.raster_map,
    kwargs=dict(
        op=<...>,
        rasters=rasters,
        target_path=target_path),
    ...

It would be convenient to write op=lambda x: ..., but that breaks when n_workers > 0 because the args must be pickled to reach the worker processes, and pickle cannot serialize lambdas or local objects.
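
To make the pickling limitation concrete, here is a minimal sketch using only the standard library. It does not involve taskgraph at all; the names double, make_local, and can_pickle are made up for illustration.

```python
import pickle


def double(x):
    """Module-level function: pickle stores a reference to its name."""
    return x * 2


def make_local():
    """Return a function defined in a local scope."""
    def local_double(x):
        return x * 2
    return local_double


def can_pickle(obj):
    """Return True if pickle.dumps succeeds on obj, False otherwise."""
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, AttributeError, TypeError):
        return False


named_ok = can_pickle(double)            # named function in the global scope
lambda_ok = can_pickle(lambda x: x * 2)  # inline lambda
local_ok = can_pickle(make_local())      # function defined inside another
print(named_ok, lambda_ok, local_ok)
```

Run as a script, only the named global function pickles; the lambda and the local function both fail, because pickle stores functions by qualified name and cannot look those two up at module level.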

Taskgraph could support lambdas and local callables by using multiprocess, a fork of Python's multiprocessing that serializes with dill instead of pickle and so supports more types. I briefly tried replacing multiprocessing with multiprocess in Task.py and it worked (the test suite passed and I was able to pass lambdas into taskgraph). But there could be other implications of using multiprocess, like conflicts with multiprocessing if both were used.

It's also worth noting that, while Python supports pickling functools.partial objects, taskgraph raises an error on them because it relies on the __name__ attribute, which partial objects do not have. It would be nice to support partials too.

Traceback (most recent call last):
  File "/Users/emily/miniconda3/envs/main/lib/python3.10/site-packages/taskgraph/Task.py", line 625, in add_task
    new_task = Task(
  File "/Users/emily/miniconda3/envs/main/lib/python3.10/site-packages/taskgraph/Task.py", line 1003, in __init__
    scrubbed_value = _scrub_task_args(arg, self._target_path_list)
  File "/Users/emily/miniconda3/envs/main/lib/python3.10/site-packages/taskgraph/Task.py", line 1459, in _scrub_task_args
    return '%s:%s' % (base_value.__name__, source_code)
AttributeError: 'functools.partial' object has no attribute '__name__'. Did you mean: '__ne__'?
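
The traceback above comes from reading __name__ directly. One possible workaround, sketched here with a hypothetical helper callable_name that is not part of taskgraph, is to unwrap the partial before asking for a name:

```python
import functools


def callable_name(obj):
    """Best-effort name for a callable (hypothetical helper, not taskgraph's).

    Unwraps nested functools.partial objects to reach the underlying
    function, then falls back to the type name if __name__ is missing.
    """
    while isinstance(obj, functools.partial):
        obj = obj.func
    return getattr(obj, '__name__', type(obj).__name__)


def scale(x, factor):
    return x * factor


double = functools.partial(scale, factor=2)
has_name = hasattr(double, '__name__')  # partial objects lack __name__
name = callable_name(double)            # name of the wrapped function
print(has_name, name)
```

The name alone drops the bound arguments, so a real fix would presumably also need to fold double.args and double.keywords into whatever taskgraph hashes for the task.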

dcdenu4 commented 11 months ago

Related to #83

emlys commented 11 months ago

We talked about this today and decided to wait until after the taskgraph 1.0 release. Taskgraph has been stable for a while now, and we want to do the release before mixing anything up.

emlys commented 11 months ago

There might be more minimal ways to add support for lambdas, like patching multiprocessing without using multiprocess, or passing in lambdas as strings.
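
As a rough sketch of the "lambdas as strings" idea, assuming nothing about how taskgraph would actually implement it: the lambda travels as plain source text, which always pickles, and a named function evaluates it on the other side. call_from_source is a hypothetical helper.

```python
import pickle


def call_from_source(source, *args, **kwargs):
    """Evaluate a lambda given as source text, then call it.

    Hypothetical helper. eval executes arbitrary code, so this is only
    reasonable for strings written by the caller, never untrusted input.
    """
    func = eval(source)
    return func(*args, **kwargs)


op_source = 'lambda x: x * 2'
payload = pickle.dumps(op_source)        # a plain string pickles fine
restored = pickle.loads(payload)         # what a worker would receive
result = call_from_source(restored, 21)
print(result)
```

The trade-off is that a string cannot capture variables from the enclosing scope the way a real lambda does, so anything the expression needs has to be passed in as an argument.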