call dry_run method of non finished tasks during dry-run

anselm-baur commented 3 years ago

I've implemented a small feature when the dry-run run mode was chosen, which then calls the dry_run() method of not finished tasks. This allows an easy and fast way to implement e.g. file output debugging. One just put some print statement with path information in a dry_run method and execute in dry-run mode and has the results one is looking for.

Example:

adding method to a child of a Task object

class SomeTask(b2luigi.Task):
[...]
    def dry_run(self):
        # get list of path objects from any other method
        paths = self.get_output_paths()
        for p in paths:
            print(str(p))

output of dry-run mode without a dry_run class:

SomeTask
        Would run SomeTask[...]

output of dry-run mode with a dry_run method:

SomeTask
        Would run SomeTask[...]
        ----------- call: dry_run() -------------
/path/to/output.file
/path/to/another_output.file

There is a point about backward compatibility, I am not sure how to handle it. What happens with old code having a dry_run method implemented and which should not be executed in dry-run mode. So we should provide some mechanism to suppress the call. But I'm not sure where to place this. Any suggestions?

meliache commented 3 years ago

Thanks for the PR, I see how this could be useful, I will give this some thought when I have time, hope others can comment as well.

nils-braun commented 3 years ago

Also thanks from my side. I think I did not get you backwards compatibility issue.

anselm-baur commented 3 years ago

Also thanks from my side. I think I did not get you backwards compatibility issue.

Sorry for not being very clear. I thought about what happens with already written code using b2luigi tasks and having already implemented a method named dry_run and if one does not like to call this method always in the dry_run mode which would happen by default if my feature is merged. This is why I'm thinking about a suppress mechanism.

One solution could be using two different args: --dry-run: calls the dry_run method --dry-run-s: suppress the call (this is the same what the old --dry-run arg is doing)

meliache commented 3 years ago

I would say the backwards compatibility issue is minor. b2luigi is still not fully released yet, i.e. has 0. version, so we can allow some backwards compatibility breakage between releases. I doubt any of our few users have a dry_run() method for their tasks and if they have it, it's unlikely that accidentally running method called dry_run would do any harm. Maybe it could raise errors if it expects the outputs of the task's requirements to exist.

I'd vote against the additional command line parameter as I would prefer to keep the CLI user interface as simple as possible. If we want to do something about it, we could also just use a special b2luigi setting for the suppression.

I'm still thinking whether something similar can be achieved easily with what we already have. For just a single task like your example, one could just change the run() method for debugging purposes and just do a full run. That wouldn't work for multiple tasks in a task tree at once, because dependencies wouldn't be fullfilled.

Well, your dry_run() method can't depend on the requirement output to exist, either, so one could also easily write debugging code like this (adapted from your example):

for task in (task1, task2, ...):
    print("\n".join(task.get_output_paths()))

But well, I still like your idea as code-wise it doesn't introduce much complexity and their is not much harm in supporting that, even if there's other ways to achieve something similar, except your potential backwards compatibility issue. Just as a new admin here I try to be a bit strict ;)

nils-braun commented 3 years ago

I would say the backwards compatibility issue is minor.

Same comment from my side. I highly doubt there are users with a dry_run method and if this is the case, they probably want to achieve exactly that. I would also not add a command line argument for this - renaming functions in python can be done with any reasonable IDE today ;-)

Concerning the more general question of whether to support such functionality at all: I think it might make sense to have a move custom dry-run functionality, especially if one writes "library-like" b2luigi functions (e.g. classes, which are then derived again for solving the real task). But, Michael is the new author ;-)

Just two comments to the code:

it would be nice to document the functionality somewhere
I am personally not a huge fan of these "-------" lines. Maybe you can just live without it (it is all human-readable output after all)

meliache commented 3 years ago

I agree with Nils' code comments above. I assumed the print("call: ...") was just a line added for testing/debugging but didn't comment on that yet because this is marked as a WIP/draft PR.

I don't see an obvious way for the generic dry-run functionality, i.e. I'd have to think about it . I'd be fine with merging this as is (after implementing Nils' comments) and have an issue or separate PR for the generic functionality.

anselm-baur commented 3 years ago

Thank you for the comments. Yes indeed, I will add this dry_run call feature to the documentation and remove the dashes in the output, and create a real PR and also an issue why this nice feature is missing :)

meliache commented 3 years ago

Your latest changes look good to me, I did a test-build of the updated docs and it looks good. Thanks for contributing :smiley: .

Since this is not urgent I am waiting just in case @nils-braun has something to say. Also I think it would be useful if he could explain in a bit more detail what he means by the general version of the dry-run feature, since I am not sure I get what he exactly has in mind with that. But thanks Anselm for creating #71 for that.

Small tip: In the future don't use commit messages like "adopted comments", instead the commit messages should say what was changed in the subject (first line) and the why should be addressed in the body of the commit message, that is everything after an empty line, see e.g. Chris Beams: How to write a git commit message.

anselm-baur commented 3 years ago

Sorry for that trivial commit message. I just wanted to wrap the things up fast, so the small issue can be closed and would not consume too much precious time of the maintainer. After all, everything is already very well documented in this PR. ^^

nils-braun commented 3 years ago

Also fine with the changes! Concerning the "general version of the dry-run feature": I imagine (but I also might be wrong), that b2luigi is used at Belle (II) often in a "library" mode: some people (like you two) write tasks and functionality using b2luigi - others use it (and just run it or start building on top if it). That means, very often you need to use and execute code, which is not your own. Of course, you can read the code - but we all know that that will not happen in all cases ;-) Therefore I think it makes sense to provide such functionality, because for your "workaround", @meliache , you would need to know which parameters of the task classes to check (and print) and what they mean (which you do not need to know if you just call a dry-run function where the original author has decided on what is important to print)

nils-braun / b2luigi

call dry_run method of non finished tasks during dry-run #70