Python-action: detect which dependencies changed and which targets are missing

patrikturi commented 5 years ago

Essentially need to get the same information in a python-action as provided by doit info on the console.

I need this to process only those dependencies of the task which have changed.

What I've tried:

def my_action(task):
    print(task.dep_changed)

However this will print all the dependencies, no matter what. I guess it is related to this comment in the code:

        # dep_changed is set on get_status()
        # Some commands (like `clean` also uses expand_args but do not
        # uses get_status, so `changed` is not available.

And need the missing targets as well, those seem to be separate from dep_changed in doit info.

schettino72 commented 5 years ago

However this will print all the dependencies, no matter what.

Sounds like a bug. Can you show me a minimal example to reproduce the problem?

And need the missing targets as well

That is not supported, it will report all targets. you can easily check witch targets are missing with python code. Not sure it makes sense for doit to report those... if you really want that we can discuss, please create a separate issue.

patrikturi commented 5 years ago

Sounds like a bug. Can you show me a minimal example to reproduce the problem?

Turns out to be a bug indeed. dep_changed is correct if one of the dependencies changed, but it is incorrect if one of the targets have been removed:

import os
import shutil

def task_copy():
    return {
        'file_dep': ['a.txt', 'b.txt'],
        'targets': ['out/a.txt', 'out/b.txt'],
        'actions': [copy_action],
    }

def copy_action(task, dependencies):
    print(task.dep_changed)
    if not os.path.exists('out'):
        os.mkdir('out')

    for dep in dependencies:
        shutil.copyfile(dep, 'out/{}'.format(dep))

Create a.txt and b.txt
Run doit
Delete out/a.txt
Run doit -v 2
Prints ['a.txt', 'b.txt'] (incorrect)
Expected ['a.txt'] because in fact b.txt is still up to date

You could also add an example of task.dep_changed to the documentation, because I had to figure it out from the source.

you can easily check witch targets are missing with python code

Ok I accept this solution.

schettino72 commented 5 years ago

Actually the right way to use this feature is with a parameter named changed

def my_action(changed):
    print(changed)

The current behaviour is expected. When a task is not up-to-date for any reason other than dependencies changed, it will report as if all file_dep were modified.

This was a questionable decision... do you have a specific use-case, I might reconsider this behaviour...

Also this behaviour is described on source code but not on user's docs. So definitely some action is required. https://github.com/pydoit/doit/blob/master/doit/dependency.py#L571

patrikturi commented 5 years ago

Actually the right way to use this feature is with a parameter named changed

Thank you, from the docs I had no idea you can inject changed

do you have a specific use-case, I might reconsider this behaviour...

My use case: 1) In the example above, suppose that copy_action takes a long time for each file 2) Because of 1) I want to copy only the files that are a) not up to date or b) their target doesn't exist 3) Iterating on changed sounds like a great solution, but sometimes this behavior will make the task do more work than it should

Proposed solution: if changed reported empty list in this case, I can iterate all the targets and check if any of them are missing, then add their dependency to my list of changed dependencies.

schettino72 commented 5 years ago

from the docs I had no idea you can inject changed

It is there: http://pydoit.org/tasks.html#keywords-with-task-metadata

My use case:

For your use case it is better to have a separate task for each file being copied, since they are independent...

There is a cost on creating the list of changed dependencies, and this is seldom used in practice. This is an old feature, today i would implement this feature in a different way... anyway for now I am more inclined to set changed to None, meaning it was not calculated.

patrikturi commented 5 years ago

For your use case it is better to have a separate task for each file being copied, since they are independent...

The number of files and their names are not known in advance in the use case. As far as I know doit needs a method for each task. In your suggestion one would need to generate methods (one task for one file), how would you do that?

I am more inclined to set changed to None, meaning it was not calculated

I understand there is no way to know which target depends on which dependency, because the description is general.
I suggest you could introduce dependency/target description like "folder/dep.txt" and "folder/target.txt" meaning a 1:1 target:dependency relationship. GNU make has a similar feature, eg.:

out/%.png: input/%.png

With these restrictions it would be possible to calculate the changed variable, which is essential for supporting incremental build properly. What do you think?

patrikturi commented 5 years ago

@schettino72 if you agree with the previous comment, I would create a new ticket for the feature mentioned in it and we can consider this discussion resolved.

schettino72 commented 5 years ago

Sorry, I do not agree with Make style regular expressions to define dependencies and targets. If you do not know the name of files in advance you could use delayed-tasks.

Here is an example:

http://pydoit.org/task_creation.html#delayed-task-creation

patrikturi commented 5 years ago

I do not agree with Make style regular expressions to define dependencies and targets

Why? It would be a lot easier to use than the delayed task creation you linked.

smarie commented 5 years ago

@patrikturi concerning your last comment about generic expressions, I think that have the same use case and I solved it easily by creating one subtask per file dynamically.

You will see how I did it in issue #327. I also propose in that issue a utility method to ease the process even more. Would something like that satisfy both the user (@patrikturi ) and the maintainer (@schettino72) ? Let me know what you think!

smarie commented 5 years ago

Also concerning the other topic in this ticket: I had the issue today to try to debug "why did a task run again ??? It should not".

Being able to print to stdout the reason why tasks are run (dependency change or file dep change or target change) using a simple commandline flag (for example a maven-style flag doit -X or a pytest-style doit -v) would really be a killer feature - if it is not already there (in which case it should probably be highlighted in the doc in a better way). Should we open a ticket for this in particular ?

EDIT: for what's worth, my current workaround for this is to use the following why_am_i_running python action explicitly at the beginning of all my task actions lists:

from os.path import exists

def why_am_i_running(task, changed):
    for t in task.targets:
        if not exists(t):
            print("Running task '%s' because one of its targets does not exist anymore: '%s'" % (task, t))
            return

    print("Running task '%s' because the following changed: '%s'" % (task, changed))

EDIT2: There might be a case missing in this draft: the case where result_dep is the reason for re-running.

patrikturi commented 5 years ago

@smarie Thank you! This helper is exactly what I needed.

smarie commented 5 years ago

You're welcome @patrikturi ! Thanks for the feedback

pydoit / doit

Python-action: detect which dependencies changed and which targets are missing #277