nils-braun / b2luigi

Task scheduling and batch running for basf2 jobs made simple
GNU General Public License v3.0
17 stars 11 forks source link

Several improvements of gbasf2 handling #91

Closed ArturAkh closed 3 years ago

ArturAkh commented 3 years ago

Following options of using gbasf2 batch processing are improved/added:

Edit 19.04.2021:

codecov-commenter commented 3 years ago

Codecov Report

Merging #91 (b2130c7) into main (b1f1e8b) will increase coverage by 1.92%. The diff coverage is 48.00%.

Impacted file tree graph

@@            Coverage Diff             @@
##             main      #91      +/-   ##
==========================================
+ Coverage   54.27%   56.19%   +1.92%     
==========================================
  Files          23       23              
  Lines        1404     1436      +32     
==========================================
+ Hits          762      807      +45     
+ Misses        642      629      -13     
Impacted Files Coverage Δ
b2luigi/batch/processes/gbasf2.py 39.47% <48.00%> (+9.30%) :arrow_up:

Continue to review full report at Codecov.

Legend - Click here to learn more Δ = absolute <relative> (impact), ø = not affected, ? = missing data Powered by Codecov. Last update b1f1e8b...b2130c7. Read the comment docs.

meliache commented 3 years ago

Thanks for the PR. Tbh in the future I would prefer one PR per feature, so it's easier to discuss them separately, even if it means doinng 4 small PRs at once, this makes labelling the PR more difficult and also kind of difficult to argue about it, e.g. some fixes like the warning bugfix I might merge right away, but other issues might need discussion or I might want to reject them.

  • improved downloading of datasets: in case of failed downloads only the ones which are failed, are downloaded, based on a collection of LFNs from created from gb2_ds_get stdout.

I'm not a fan of this.

  1. Isn't that the existing behaviour of gbasf2 anyway? Since #67 we keep the download directory if a download failed and as far as I know, if gbasf2 sees that a local download directory already exists, it already does the checks that you implemented and only downloads the files that need to be downloaded. However, I haven't tested this much so if it turns out that with this gbasf2 wrapper every time all files are re-downloaded, it means there is some other bug and we have to investigate it, so place notify me.
  2. Your solution requires parsing the output of gbasf2 commands, which we do a lot in other places, but I'm not very happy about this, since it can (and had) result in breakage between gbasf2 releases when the command output changes. For example, I know that the gbasf2 team is working on moving error output from stdout to stderr (I asked them to do that).
ArturAkh commented 3 years ago

@meliache Thanks a lot for the review! And sorry for having everything in one commit, I'm essentially working on implementations & testing everything at the same time, therefore the bunch of changes in one go.

Concerning the commit with improving the dataset download:

The problem is, that although gb2_ds_get only downloads the required files, it will check the locally available files in the full directory. This means if you have 500 outputs to be downloaded, and one of them has failed, you effectively download them all again - since the checks of the the local & remote files being identical last 5 - 10 seconds, while the download itself is in the same ballpark for small output files.

That being said, I'm in favour of a more risky, but faster solution. If the syntax parsing breaks, then I just suggest to quickly fix it.

meliache commented 3 years ago

We should add some unit tests to make codecov happy and to get a handle on the increased complexity. I regret I haven't done this earlier and thus we sit on some technical debt, but now I got some experience with it. You can check out d6a391db33e07c9c946fc5e9a92a2e11db175e4f to see how one can add unit tests for the Gbasf2Process. I can help with that but I'm quite busy at the moment so it might take a while. For the following functions I have already ideas how tests might look and implementing them might be a start:

ArturAkh commented 3 years ago

@meliache Thanks for the suggestions! I'll try to implement the suggested code changes within the next days.

Concerning unit tests: I'm not experienced in that at all, so it will take some time to do so for me too ^^ The best test for me currently is running the tool for a real-world application I'm currently working on :D

But ok, I understand, that such tests are useful, fair enough.

However, I still think, also as a user, one should have a look at code and see whether it does what it should, judging based on job outputs one expects from ones application. In the end, this is a tool for physicists and not for purposes like luigi's main application for spotify.

Also, we do not have a large enough community to support such things and may perhaps think of investing time instead into unifying efforts, e.g. with https://github.com/riga/law which is also luigi-based, to profit from a larger community support and not to invent things twice (like e.g. HTCondor backend).

That being said, I'll think of unit tests & try to implement them, but this would be also not my highest priority.

meliache commented 3 years ago

Tipp: I saw you have five style-fix PR's, I guess this is because we recently added pre-commit to our repository to check for pep8-compliancy. I recommend you to install pre-commit locally, so that you get automatic style checks before each commit before even pushing, this would solve this problem. I have added documtation how to do this to our development docs. Or make sure that your IDE or editor automatically checks your code (or use flake8 from the command line if you are oldschool).

meliache commented 3 years ago

I just saw your first reply

Concerning unit tests: I'm not experienced in that at all, so it will take some time to do so for me too ^^

Well okay as I said I can help with that, the task was more a note to myself and I can start working on this. One of the reasons I want to put more tests in place is for the very reason that I don't have much time and with tests it will be easier to review PR's which change the internal logic. So for testing units of internal logic I find them very useful, for things that depend on a batch system being available, manual test are still important.

Also, we do not have a large enough community to support such things

What do you mean by such things? Do you mean unit tests? b2luigi had unit tests from the very beginning when it was a single-person project. I don't see how the usefulness or difficulty of unit tests is correlated with the number of developers. I see them as a tool to ensure code-quality and correctness and not as a feature that requires work to be supported. But I'm not sure if I understood you correctly.

In the end, this is a tool for physicists and not for purposes like luigi's main application for spotify.

Despite the name b2luigi can be useful too more people than just Belle II member or even physicists, which is why I think the (docs)[https://b2luigi.readthedocs.io/en/stable/#why-not-use-the-already-created-batch-tasks] mention AWS and Azure. These batch systems should be rather easy to implement by as user, it's just that no b2luigi users used them yet. I remember that @nils-braun considered that we might merge some features back to luigi in the future, but you're right that our user-base as of know are mostly physicists. But I think even physicists are happy about correct code.

and may perhaps think of investing time instead into unifying efforts, e.g. with https://github.com/riga/law which is also luigi-based, to profit from a larger community support and not to invent things twice (like e.g. HTCondor backend).

I heard about LAW, but haven't tried using it yet. The design seems very different. With b2luigi you can write ordinary luigi tasks which inherit from luigi.Task, so people who used pure luigi before can easily switch. And you can process your ordinary luigi tasks on any supported batch system that you want by using b2luigi.process (the gbasf2 batch system is an exception in that it can only run Basf2PathTasks). Law seems to require that you inherit from a special workflow class (e.g. inherit from HTCondorWorkflow for htcondor). I don't now if LAW addresses the potential issues with alternative implementation mentioned under Why not use the already created batch tasks?. LAW seems to have a large user-base at LHC but as I see it is also maintained mostly by a single developer. I wouldn't be against having a chat and see where we might profit from each other. Do you know any LAW users?

ArturAkh commented 3 years ago

@meliache thanks for your reply! And sorry for being a bit intriguing on creating unit tests ;) In the following my answers:

1) Yes, as such unit tests help to improve the code correctness and catch pitfalls of - I would say - simple use-cases. But in the end - and sorry, this is just my subjective opinion ^^ - a manual, realistic tests are usually more important since I think they cover both problems of internal logic and other things like performance measures etc.

2) Yes with such things I meant unit tests ^^. It is fine with me to follow the implementation rules you as developers imposed on this repository.

Just as a remark: Impatient users will branch off at some point with their own forks & developments, if implementation of unit tests slows down the procedure of developments itself. And this behaviour should be something that is reduced, since that will lead to losses of developments. On the other hand, I agree that unit testing should be part of code development - but usually for bigger projects, where it is impossible to keep track of all code changes.

That is actually what happened with https://github.com/grid-control/grid-control, being a nice, but somewhat complicated tool for direct job submission (without possiblity to build up nested workflows, but really tracking down each single job & able to adapt it). The tool is not supported any more much, since the main developer(s) are in industry now, and everyone who needed some tweaks of the tool, which were not elegant/ not well tested enough to enter the main repo, branched off...

3) In case people outside the Belle 2 community should be able to use the tool, "b2luigi" as name is unlucky, I would say - also confirmed by computing people from ETP at KIT ;). This implies, that it is designed for being Belle 2 specific only... Might be one of the reasons, why people outside Belle 2 don't use it much yet.

I agree, that physicists would be happy about correct and well tested code, but they would be even happier getting things done quickly :D And whether the former implies the latter and vice versa, depends on the tool support, tool usage, and how long the tool lives and is up-to-date. Software frameworks come and go, and they go sooner, if they are not used widely.

4) Yes, law essentially has a very different design compared to b2luigi: so a higher-level tool on top of luigi (law) vs. natural extension of luigi (b2luigi), but it should cover all the use-cases you mention, but I'm not sure, whether it is done more or less efficiently in one or the other tool.

In any case, I'd be happy to manage & organize a kick-off of people working on law & b2luigi to brainstorm together, if this is possible. At least on the user-side I know some people at ETP using law, there should be people at RWTH Aachen and also some users at CERN - I know mostly the ones from CMS.

And I'm not saying that law is better - I don't know the tool much yet to be able to judge. Currently, I know most probably more about b2luigi than law :D It's just better avoid things being implemented twice, and there is a feature overlap between the two tools, since both luigi-based, and both cover similar batch systems (apart from experiment-specific ones).

Anyway, thanks for the discussion and sorry again for being subjective on unit tests :P You see, I'm not a fan of them, but I may get used ^^. While spending time on writing this message I could have used it to implement unit tests, actually :D

meliache commented 3 years ago

I added some unit tests for the get_job_status() function in the branch feature/add_gbasf2_job_status_tests (commit c151963bfbf065fa8a65397b, you can see the tests here). I somehow can't directly push my changes to the PR. @ArturAkh, can you please merge them into your branch and push again.

git merge origin feature/add_gbasf2_job_status_tests

Here origin should be the remote name of the official b2luigi repository. If for you origin points to your fork, you can use:

git remote add upstream https://github.com/nils-braun/b2luigi.git
git merge upstream feature/add_gbasf2_job_status_tests

You can run the test in the topmost b2luigi directory with

python3 -m unittest tests/batch/test_gbasf2_process.py

I won't ask you to do any more test :slightly_smiling_face:. This was on my wishlist for a long time since I think in that part of the code something can easily go wrong. And for other parts of the code I might add tests incrementally when I work on it.

I changed the codecov configuration to use main as the default branch (before it was master?) and now somehow this PR is green even without adding any further tests :man_shrugging:.

Some explenation what I did: I used the existing script gbasf2_job_status.py to create some job-status json files for different gbasf2 projects (needs to be run in a gbasf2 environment). I even one project of yours, just de-anonymized the user-name. Then I used mock.patch to change the function get_gbasf2_project_job_status_dict to return the dictionary defined by a specific job status json file and then try to get the job status for that and compare that to what we expect. It's not that complicated if you look at the code. Only for the test where all jobs are Done but the ApplicationStatus is not Done, I didn't use an existing real project but I just manually changed the json of the Done project.

ArturAkh commented 3 years ago

@meliache Ok, done! Thanks for giving that examples for testing and the explation for get_job_status. I'll try to understand it a bit better and may be come up with questions.

I'll try to come up with a test for self.test _reschedule_failed_jobs() at some point this week, in case I find enough time :)

meliache commented 3 years ago

I see the test for self.test _reschedule_failed_jobs() is kind of optional, do it if you want to train writing tests and learn how to use mock objects. One issue I had in the past is how to test the methods without instantiating the class but then I learned I can use mock objects to replace self (just look at the examples).

The thing that is preventing me from merging the most is the improved download handling which is still something that I think gbasf2 does anywa, I just tested it by downloading a dataset with gb2_ds_get, deleting some files and downloading again, files that are already downloaded are skipped and I would expect the same to happen when calling the function from b2luigi:

❯ gb2_ds_get "/belle/user/meliache/test_new_gb2_10/sub00/*"
/nfs/dust/belle2/user/meliache/test_new_gb2_10/sub00 already exists
How would you like to verify the files? By file size(s) or by file checksum(c)?:
Please type [s] or [c]: c
Files to download to /nfs/dust/belle2/user/meliache/test_new_gb2_10/sub00 : 4 file(s)
Do you want to download files:
Please type [Y] or [N]:
Found valid local file for /belle/user/meliache/test_new_gb2_10/sub00/D_ntuple_00001_job180945538_00.root [Checksum matched: 41c156f0]
Skip to download the file...

I want to keep the code as short as possible and if some functionality is already provided, I wouldn't duplicate it. However, maybe I misunderstand something which you were trying to achieve...

ArturAkh commented 3 years ago

@meliache Exactly. And this skipping takes time (in the order of 5 seconds). It is fine, if you have only a few outputs. But it gets annoying, in case you have 500 outputs or even more, (had already cases with 5k outputs to be downloaded). And you certainly do not want to wait for a single failed download again 5 * 5k seconds...

That's the main reason for this part

meliache commented 3 years ago

@ArturAkh This is a convincing argument that you are giving, yeah, this is a very good argument :+1: . I always wondered why gbasf2 is so slow with this.

What's important to me is that the download can handle situations nicely when the download/b2luigi task was aborted due to other unpredicted reasons, e.g. a user abort with Ctrl+C. In an ideal world one should be able to restart the task, e.g. python3 luigi_skript.py and then the download should resume where it last stopped. I don't have the exact insight yet how this would be handled with your fixes, but I trust you and this is why I remind you of that.

Anyway, with this new gbasf2 features I think we should make a new release soon.

ArturAkh commented 3 years ago

@meliache Sounds good!

I would expect it to still work.

Depending, whether the failed_files.txt list is created or not, it would perform the "improved" or not improved download.

If not, it will of course resume the standard gb2_ds_get with skipping ---> fair enough, if the user breaks the process with "Ctrl + C" before failed_files list creation. ---> Maybe downloaded files were, corrupted or something.

However, I would need to tesk that properly with my usecase and whether the parsing works as expected before we merge that. So I'll inform you after having had a test with failed files. I will most probably also try to create a unit test for that, whether the stdout of gb2_ds_get is parsed correctly - it is important that it works.

ArturAkh commented 3 years ago

@meliache I've added a test for parsing of download command, using some example outputs stored in the _gbasf2_project_download_stdouts folder. Could you please have a look at it, whether it covers the relevant code properly?

At least according to the test job on github, the test functions are passed.

ArturAkh commented 3 years ago

I'll proceed then with the unit test(s) for self._reschedule_jobs

meliache commented 3 years ago

Looks good to me and tests on github pass, I had to click an approval button first to run the tests for the PR are run.. This is a new feature for new contributors because github is fighting with cryptominers mining bitcoins via the github action CI. Anyway, tell me when your are happy with the PR, like any more unit tests are nice to have but I would even merge without if you think it's too difficult.

If you want to go forward, I would have probably tried to use mock functions again for testing the rescheduling. You can replace a function with a mock function that does nothing but just records which arguments it is given etc. See the mock getting started guide, but maybe you'll figure out on your own.

ArturAkh commented 3 years ago

@meliache,

Thanks for having a look at my first unit test :)

After having a look at your suggestion above for self._reschedule_failed_jobs and the example testing the job status I guess I have a rough idea how to write a test for self._reschedule_failed_jobs. I'll give it definitely a try tomorrow and - if needed - the day after and inform you again.

ArturAkh commented 3 years ago

@meliache,

After a few trial & errors, I've finally managed to write a test for self._reschedule_failed_files, which is also running through the CI action.

Please have a look at the following lines, whether they are ok for a proper test: test_gbasf2_process.py#L66-L113

If you think the test is fine as it is, we can merge the pull request, since it is ready from my side

meliache commented 3 years ago

Looks good to me, very good :+1: . For the unit-tests, I'm not as strict when it comes to style and even some boiler-plate is okay. I think we could possibly make the setUp method shorter or define it in a new superclass from which the others inherit, but it doesn't matter much. I just found that there is that function unittest.assertCountEqual which compares to sequences independent of the order of the elements, so you don't need to sort them and I quickly added a commit that uses that. I'll merge soon. Maybe I'll bump the version number, too.

ArturAkh commented 3 years ago

@meliache cool, sounds good!

Yeah, this assertCountEqual is a more convenient choice :)

It's also fine with me to increase the version number. However, please keep in mind, that after that pull request, I'll have another one (based on the new main branch), that adds an example for FEI with gbasf2 using b2luigi :)

So we may increase the number after merging in that example.