pyinvoke / invoke

Pythonic task management & command execution.
http://pyinvoke.org
BSD 2-Clause "Simplified" License
4.36k stars 367 forks source link

Identical tasks from different modules are not kept distinct #536

Open dgrine opened 6 years ago

dgrine commented 6 years ago

Consider the following sample project structure:

├── __init__.py
├── one
│   ├── __init__.py
│   └── tasks.py
├── tasks.py
└── two
    ├── __init__.py
    └── tasks.py

Content of one/tasks.py:

from invoke import task
name = "one"
@task
def clean(context):
    print("clean {}".format(name))

Content of two/tasks.py:

from invoke import task
name = "two"
@task
def clean(context):
    print("clean {}".format(name))

The top-level ./tasks.py contains:

from invoke import task, Collection
if __package__ is None or __package__ == '':
    from one import tasks as one
    from two import tasks as two
else:
    from .one import tasks as one
    from .two import tasks as two

def full_name(obj):
  return obj.__module__ + "." + obj.__class__.__name__

# Works as expected
def wrap(task_list):
    for task in task_list: task.__name__ = full_name(task)
    return task_list

# Does not work: only one of the two subtasks is executed
def nowrap(task_list):
    return task_list

ns = Collection()

@task(pre = wrap([one.clean, two.clean]))
def clean(context):
    print("clean")

When using the ‘wrap’ function, everything works as expected: the two pre tasks are executed. When passing the list unaltered, only one pre task runs. This does not happen if the content of the two tasks is different. It seems a hash of the task is made, and that it uniquely identifies the task. I'm new to invoke, but if I'm right on track, then the hash function should probably make use of the full name of the object. Of course, I could just be using the framework incorrectly, in which case, please feel free to correct this.

bitprophet commented 6 years ago

Thanks for the detailed report!

Took me a bit to understand what was going on, but I think I get it now. You'll note that it's not the contents of the tasks themselves, simply their identities/names, that you needed to munge to work around the issue.

I think this is simply task deduplication at work: http://docs.pyinvoke.org/en/1.0/concepts/invoking-tasks.html#task-deduplication

The way deduplication works under the hood can be seen here:


So, next steps?

dgrine commented 6 years ago

Thanks for the extensive reply!

My understanding is that task deduplication prevents the same task from running more than once during a session. As you said, it all boils down to the definition of task equality.

I would argue that tasks located in different files should never be considered equal, regardless of their name. In the original example, not only are the names the same, they even share the same body. However, they refer to different data, making them - imo - different tasks: the results of the two tasks cannot be assumed to be equal.

As for our use case, you guessed correctly: we indeed have multiple subpackages located in some folder where each subpackage is assumed to implement a set of tasks (this forms the interface, in a sense). These subpackages are not known beforehand: they are put there by some external mechanism. The top-level tasks.py file is in charge of:

Put simply: it provides an abstraction that forwards to the actual subpackage implementations.

As for the current task equality check, I thought it was based only on the __name__ attribute. However, I've played a bit more with it and was surprised that it is not just the name that is taken into account. Consider a slight variation on the original example:

├── __init__.py
├── one
│   ├── __init__.py
│   └── tasks.py
├── tasks.py
└── two
    ├── __init__.py
    └── tasks.py

If one/tasks.py and two/tasks.py have the same content:

from invoke import task
@task
def clean(context):
    print("clean sub")

Running the top-level clean task with nowrap (see original example) results in:

$ inv clean
clean sub
clean

Running it with wrap results in:

$ inv clean
clean sub
clean sub
clean

This is in accordance with the __name__ attribute being used to differentiate task equality. So far, so good.

However, when changing the body of two/tasks.py to:

from invoke import task
@task
def clean(context):
    print("something else")
    print("clean sub")

Surprisingly, this results in:

clean sub
something else
clean sub
clean

In other words: the tasks are considered to be different, even though they have the same __name__ attribute!

I've created a small example repository that reproduces this: https://github.com/dgrine/invoke-536

Finally, regarding work-arounds: this is not a blocking issue, I can cope with using the wrap/nowrap decorators. Everything else is working smoothly :-)

bitprophet commented 6 years ago

That last bit is interesting! I suspect it has to do with function equality and code caching or something along those lines. Will have to poke at it sometime. Thanks for the input as well, I do think that "same name? fall back to location check" probably is reasonable so we may go that route.

asford commented 6 years ago

I think this is issue is caused by your use of six.get_function_code when comparing functions for equality, which is not robust to to the contents of the function closure. In the case @dgrine describes above, the two functions one/tasks.py:clean and two/tasks.py:clean have the same function body, but have differing lexical closures over the variable name.

This can be minimally reproduced, and is a bit of pain, when you use a factory function for tasks. The solution you proposed, checking for the file location, won't work in this context. A (vaguely contrived) example:

from invoke import task, Collection

def make_task(echo_val):
    @task
    def echo(c):
        c.run("echo %s" % echo_val)
    return echo

ns = Collection()

ns.add_task(make_task("foo"))
ns.add_task(make_task("bar"))

I've a more fleshed out example, that's currently bitten by this bug, at https://github.com/asford/toxish-conda.

@bitprophet I'm curious, why not just use the standard function equality operator instead of comparing the function body? This appears to work properly for standard functions as well as class methods.

hokreb commented 5 years ago

I stumbled today also over this behavior, until I found out per accident (different text in a print statement), that is duplication of tasks is found by comparing method. If the same method is exactly in an other module and gets imported, it is not executed. I think, as this behavior is not intuitive it would be at least helpful, if it is mentioned in the documentation.

dgrine commented 4 years ago

Is there any progress on this? It would really be helpful if tasks located in different files are not considered the same. As suggested by @asford, is using the standard function equality operator a solution?