Closed anselm-baur closed 3 years ago
Thanks for the PR, I see how this could be useful, I will give this some thought when I have time, hope others can comment as well.
Also thanks from my side. I think I did not get you backwards compatibility issue.
Also thanks from my side. I think I did not get you backwards compatibility issue.
Sorry for not being very clear. I thought about what happens with already written code using b2luigi tasks and having already implemented a method named dry_run
and if one does not like to call this method always in the dry_run
mode which would happen by default if my feature is merged. This is why I'm thinking about a suppress mechanism.
One solution could be using two different args:
--dry-run
: calls the dry_run
method
--dry-run-s
: suppress the call (this is the same what the old --dry-run
arg is doing)
I would say the backwards compatibility issue is minor. b2luigi is still not fully released yet, i.e. has 0.
version, so we can allow some backwards compatibility breakage between releases. I doubt any of our few users have a dry_run()
method for their tasks and if they have it, it's unlikely that accidentally running method called dry_run
would do any harm. Maybe it could raise errors if it expects the outputs of the task's requirements to exist.
I'd vote against the additional command line parameter as I would prefer to keep the CLI user interface as simple as possible. If we want to do something about it, we could also just use a special b2luigi setting for the suppression.
I'm still thinking whether something similar can be achieved easily with what we already have. For just a single task like your example, one could just change the run()
method for debugging purposes and just do a full run. That wouldn't work for multiple tasks in a task tree at once, because dependencies wouldn't be fullfilled.
Well, your dry_run()
method can't depend on the requirement output to exist, either, so one could also easily write debugging code like this (adapted from your example):
for task in (task1, task2, ...):
print("\n".join(task.get_output_paths()))
But well, I still like your idea as code-wise it doesn't introduce much complexity and their is not much harm in supporting that, even if there's other ways to achieve something similar, except your potential backwards compatibility issue. Just as a new admin here I try to be a bit strict ;)
I would say the backwards compatibility issue is minor.
Same comment from my side. I highly doubt there are users with a dry_run
method and if this is the case, they probably want to achieve exactly that. I would also not add a command line argument for this - renaming functions in python can be done with any reasonable IDE today ;-)
Concerning the more general question of whether to support such functionality at all: I think it might make sense to have a move custom dry-run functionality, especially if one writes "library-like" b2luigi functions (e.g. classes, which are then derived again for solving the real task). But, Michael is the new author ;-)
Just two comments to the code:
I agree with Nils' code comments above. I assumed the print("call: ...")
was just a line added for testing/debugging but didn't comment on that yet because this is marked as a WIP/draft PR.
I don't see an obvious way for the generic dry-run functionality, i.e. I'd have to think about it . I'd be fine with merging this as is (after implementing Nils' comments) and have an issue or separate PR for the generic functionality.
Thank you for the comments. Yes indeed, I will add this dry_run
call feature to the documentation and remove the dashes in the output, and create a real PR and also an issue why this nice feature is missing :)
Your latest changes look good to me, I did a test-build of the updated docs and it looks good. Thanks for contributing :smiley: .
Since this is not urgent I am waiting just in case @nils-braun has something to say. Also I think it would be useful if he could explain in a bit more detail what he means by the general version of the dry-run feature, since I am not sure I get what he exactly has in mind with that. But thanks Anselm for creating #71 for that.
Small tip: In the future don't use commit messages like "adopted comments", instead the commit messages should say what was changed in the subject (first line) and the why should be addressed in the body of the commit message, that is everything after an empty line, see e.g. Chris Beams: How to write a git commit message.
Sorry for that trivial commit message. I just wanted to wrap the things up fast, so the small issue can be closed and would not consume too much precious time of the maintainer. After all, everything is already very well documented in this PR. ^^
Also fine with the changes! Concerning the "general version of the dry-run feature": I imagine (but I also might be wrong), that b2luigi is used at Belle (II) often in a "library" mode: some people (like you two) write tasks and functionality using b2luigi - others use it (and just run it or start building on top if it). That means, very often you need to use and execute code, which is not your own. Of course, you can read the code - but we all know that that will not happen in all cases ;-) Therefore I think it makes sense to provide such functionality, because for your "workaround", @meliache , you would need to know which parameters of the task classes to check (and print) and what they mean (which you do not need to know if you just call a dry-run function where the original author has decided on what is important to print)
I've implemented a small feature when the
dry-run
run mode was chosen, which then calls thedry_run()
method of not finished tasks. This allows an easy and fast way to implement e.g. file output debugging. One just put some print statement with path information in adry_run
method and execute indry-run
mode and has the results one is looking for.Example:
adding method to a child of a
Task
objectoutput of
dry-run
mode without a dry_run class:output of
dry-run
mode with a dry_run method:There is a point about backward compatibility, I am not sure how to handle it. What happens with old code having a
dry_run
method implemented and which should not be executed indry-run
mode. So we should provide some mechanism to suppress the call. But I'm not sure where to place this. Any suggestions?