Identical tasks from different modules are not kept distinct

dgrine commented 6 years ago

Consider the following sample project structure:

├── __init__.py
├── one
│   ├── __init__.py
│   └── tasks.py
├── tasks.py
└── two
    ├── __init__.py
    └── tasks.py

Content of one/tasks.py:

from invoke import task
name = "one"
@task
def clean(context):
    print("clean {}".format(name))

Content of two/tasks.py:

from invoke import task
name = "two"
@task
def clean(context):
    print("clean {}".format(name))

The top-level ./tasks.py contains:

from invoke import task, Collection
if __package__ is None or __package__ == '':
    from one import tasks as one
    from two import tasks as two
else:
    from .one import tasks as one
    from .two import tasks as two

def full_name(obj):
  return obj.__module__ + "." + obj.__class__.__name__

# Works as expected
def wrap(task_list):
    for task in task_list: task.__name__ = full_name(task)
    return task_list

# Does not work: only one of the two subtasks is executed
def nowrap(task_list):
    return task_list

ns = Collection()

@task(pre = wrap([one.clean, two.clean]))
def clean(context):
    print("clean")

When using the ‘wrap’ function, everything works as expected: the two pre tasks are executed. When passing the list unaltered, only one pre task runs. This does not happen if the content of the two tasks is different. It seems a hash of the task is made, and that it uniquely identifies the task. I'm new to invoke, but if I'm right on track, then the hash function should probably make use of the full name of the object. Of course, I could just be using the framework incorrectly, in which case, please feel free to correct this.

bitprophet commented 6 years ago

Thanks for the detailed report!

Took me a bit to understand what was going on, but I think I get it now. You'll note that it's not the contents of the tasks themselves, simply their identities/names, that you needed to munge to work around the issue.

I think this is simply task deduplication at work: http://docs.pyinvoke.org/en/1.0/concepts/invoking-tasks.html#task-deduplication

The way deduplication works under the hood can be seen here:

Deduplication itself, using if call in <list>
Call equality, since if x in y uses equality between x and members of y; and at this point the objects are Calls, representing a specific invocations of a Task
Task equality, since the above boils down to "compare equality of the task objects".
Task name, which brings us to why your manipulation of __name__ worked correctly. (I don't recall if this is something we came up with or if it's common Python object notation, honestly. Been a while. But you found it either way.)

So, next steps?

First, I'm curious what your non-simplified use case was, as knowing that may expose more intent than I can guess at here.
- I assume you really do have multiple subpackages each with a clean action, and some task or tasks want to run more than one of those cleans as pre-tasks?
- Is this like, clean up my docs and my build artifacts and my test output and etc etc, before deploying? Other?
- If so, is there a particular reason you're not adding an intermediate "clean all the things" meta-task which calls the other clean()s internally, and then referencing that in pre=? Which would be another workaround.
I believe the "correct" solution right now is for you to consider disabling the deduplication (see the docs link up top for how, it's just a config option).
- The only downside of doing so is that accidental user duplication on the CLI is harder to detect. Shrug?
We may instead/in addition want to enhance the definition of Task.__eq__ to look at things like __module__ or __file__ or whatnot.
- On one hand, this sounds nice and clean and correct.
- On the other hand, one of the primary goals of the task/collection namespacing setup is to allow users to do all sorts of complex things that I can't even imagine, so the more rigidity we enforce, the more we risk getting in advanced users' ways.
- That said, in this particular case, I really can't see anybody intending to run the actual exact same task body multiple times while still having deduplication turned on, so it seems like a low risk. But it is nonzero :)

dgrine commented 6 years ago

Thanks for the extensive reply!

My understanding is that task deduplication prevents the same task from running more than once during a session. As you said, it all boils down to the definition of task equality.

I would argue that tasks located in different files should never be considered equal, regardless of their name. In the original example, not only are the names the same, they even share the same body. However, they refer to different data, making them - imo - different tasks: the results of the two tasks cannot be assumed to be equal.

As for our use case, you guessed correctly: we indeed have multiple subpackages located in some folder where each subpackage is assumed to implement a set of tasks (this forms the interface, in a sense). These subpackages are not known beforehand: they are put there by some external mechanism. The top-level tasks.py file is in charge of:

collecting the subpackage tasks in their own namespace, and
creating new tasks (same interface) where each task has its pre= given all the relevant tasks discovered

Put simply: it provides an abstraction that forwards to the actual subpackage implementations.

As for the current task equality check, I thought it was based only on the __name__ attribute. However, I've played a bit more with it and was surprised that it is not just the name that is taken into account. Consider a slight variation on the original example:

├── __init__.py
├── one
│   ├── __init__.py
│   └── tasks.py
├── tasks.py
└── two
    ├── __init__.py
    └── tasks.py

If one/tasks.py and two/tasks.py have the same content:

from invoke import task
@task
def clean(context):
    print("clean sub")

Running the top-level clean task with nowrap (see original example) results in:

$ inv clean
clean sub
clean

Running it with wrap results in:

$ inv clean
clean sub
clean sub
clean

This is in accordance with the __name__ attribute being used to differentiate task equality. So far, so good.

However, when changing the body of two/tasks.py to:

from invoke import task
@task
def clean(context):
    print("something else")
    print("clean sub")

Surprisingly, this results in:

clean sub
something else
clean sub
clean

In other words: the tasks are considered to be different, even though they have the same __name__ attribute!

I've created a small example repository that reproduces this: https://github.com/dgrine/invoke-536

Finally, regarding work-arounds: this is not a blocking issue, I can cope with using the wrap/nowrap decorators. Everything else is working smoothly :-)

bitprophet commented 6 years ago

That last bit is interesting! I suspect it has to do with function equality and code caching or something along those lines. Will have to poke at it sometime. Thanks for the input as well, I do think that "same name? fall back to location check" probably is reasonable so we may go that route.

asford commented 6 years ago

I think this is issue is caused by your use of six.get_function_code when comparing functions for equality, which is not robust to to the contents of the function closure. In the case @dgrine describes above, the two functions one/tasks.py:clean and two/tasks.py:clean have the same function body, but have differing lexical closures over the variable name.

This can be minimally reproduced, and is a bit of pain, when you use a factory function for tasks. The solution you proposed, checking for the file location, won't work in this context. A (vaguely contrived) example:

from invoke import task, Collection

def make_task(echo_val):
    @task
    def echo(c):
        c.run("echo %s" % echo_val)
    return echo

ns = Collection()

ns.add_task(make_task("foo"))
ns.add_task(make_task("bar"))

I've a more fleshed out example, that's currently bitten by this bug, at https://github.com/asford/toxish-conda.

@bitprophet I'm curious, why not just use the standard function equality operator instead of comparing the function body? This appears to work properly for standard functions as well as class methods.

hokreb commented 5 years ago

I stumbled today also over this behavior, until I found out per accident (different text in a print statement), that is duplication of tasks is found by comparing method. If the same method is exactly in an other module and gets imported, it is not executed. I think, as this behavior is not intuitive it would be at least helpful, if it is mentioned in the documentation.

dgrine commented 4 years ago

Is there any progress on this? It would really be helpful if tasks located in different files are not considered the same. As suggested by @asford, is using the standard function equality operator a solution?

pyinvoke / invoke

Identical tasks from different modules are not kept distinct #536