luispedro / jug

Parallel programming with Python
https://jug.readthedocs.io
MIT License

jug shell's invalidate() fails if NoLoad task in DAG #77

justinrporter closed 5 years ago

justinrporter commented 5 years ago

Consider the following jugfile:

import random

import jug
from jug.io import NoLoad

@jug.TaskGenerator
def gauss(i):
    return random.gauss(0, 1)

@jug.TaskGenerator
def load_and_square(t):
    data = jug.value(t.t)
    return data**2

@jug.TaskGenerator
def sum_squares(nums):
    return sum(nums)

ts = [load_and_square(NoLoad(gauss(i))) for i in range(3)]
sum_squares(ts)

Running jug invalidate --target jugfile.gauss (and likewise with load_and_square or sum_squares as the target) behaves as expected. However, if you drop into a jug shell (for example, because only a subset of the tasks should be invalidated) and use the invalidate() builtin, it fails with an AttributeError:

$ jug shell
In [1]: get_tasks()
Out[1]:
[Task(jugfile.gauss, args=(0,), kwargs={}),
 Task(jugfile.load_and_square, args=(<jug.io.NoLoad object at 0x10e214168>,), kwargs={}),
 Task(jugfile.gauss, args=(1,), kwargs={}),
 Task(jugfile.load_and_square, args=(<jug.io.NoLoad object at 0x10e2141f8>,), kwargs={}),
 Task(jugfile.gauss, args=(2,), kwargs={}),
 Task(jugfile.load_and_square, args=(<jug.io.NoLoad object at 0x10e214288>,), kwargs={}),
 Task(jugfile.sum_squares, args=([Task(jugfile.load_and_square, args=(<jug.io.NoLoad object at 0x10e214168>,), kwargs={}), Task(jugfile.load_and_square, args=(<jug.io.NoLoad object at 0x10e2141f8>,), kwargs={}), Task(jugfile.load_and_square, args=(<jug.io.NoLoad object at 0x10e214288>,), kwargs={})],), kwargs={})]

In [2]: invalidate(get_tasks()[0])
Building task DAG... (only performed once)
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
~/tmp/jugfile.py in <module>
----> 1 invalidate(get_tasks()[0])

~/.envs/jug-debug/lib/python3.6/site-packages/Jug-1.6.7+git-py3.6.egg/jug/subcommands/shell.py in _invalidate(t)
    135             '''
    136             from ..task import alltasks
--> 137             return invalidate(alltasks, reverse_cache, t)
    138
    139         def _get_tasks():

~/.envs/jug-debug/lib/python3.6/site-packages/Jug-1.6.7+git-py3.6.egg/jug/subcommands/shell.py in invalidate(tasklist, reverse, task)
     67         for t in tasklist:
     68             for d in t.dependencies():
---> 69                 reverse.setdefault(d.hash(), []).append(t)
     70     queue = [task]
     71     seen = set()

AttributeError: 'NoLoad' object has no attribute 'hash'

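The error occurs no matter which task is passed to invalidate(), so a single NoLoad anywhere in the jugfile seems to "poison" the whole DAG: judging from the traceback, the shell builds a reverse-dependency map over all tasks before it looks at the requested one, and that step calls hash() on every dependency, including the NoLoad wrapper. It does not happen with Task.invalidate().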
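To illustrate what I mean, here is a minimal sketch of the reverse-map loop from the traceback with a guard added. It is not jug's actual code or a tested patch: StubTask, StubNoLoad and build_reverse are hypothetical stand-ins so the snippet runs without jug, and the only thing assumed about NoLoad is that it wraps the task in a .t attribute (as used in the jugfile above) and has no hash() method (as the traceback shows).

```python
class StubTask:
    """Hypothetical stand-in for a jug Task: exposes hash() and dependencies()."""

    def __init__(self, name, deps=()):
        self.name = name
        self.deps = list(deps)

    def hash(self):
        return self.name

    def dependencies(self):
        return iter(self.deps)


class StubNoLoad:
    """Hypothetical stand-in for NoLoad: wraps a task in .t and has no hash()."""

    def __init__(self, t):
        self.t = t


def build_reverse(tasklist):
    """Map each dependency's hash to the list of tasks that depend on it."""
    reverse = {}
    for t in tasklist:
        for d in t.dependencies():
            # Assumption: unwrap NoLoad-like wrappers so the wrapped task is
            # recorded as the dependency instead of raising AttributeError.
            while not hasattr(d, "hash") and hasattr(d, "t"):
                d = d.t
            reverse.setdefault(d.hash(), []).append(t)
    return reverse


# One branch of the DAG from the jugfile above, rebuilt with the stubs.
g = StubTask("gauss-0")
sq = StubTask("load_and_square-0", deps=[StubNoLoad(g)])
total = StubTask("sum_squares", deps=[sq])

reverse = build_reverse([g, sq, total])
```

Whether a NoLoad dependency should be unwrapped (so that invalidating gauss also invalidates load_and_square) or simply skipped is a design choice for the maintainers; the sketch only shows that the loop itself does not need to fail.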