nils-braun / b2luigi

Task scheduling and batch running for basf2 jobs made simple
GNU General Public License v3.0
17 stars 11 forks source link

Fix queries for gbasf2 grid files with multiple extensions #181

Closed meliache closed 1 year ago

meliache commented 1 year ago

Hotfix that hopefully resolves issue #180, if I understood the behavior of gbasf2 correctly. Currently I don't have time and no access to the grid. @schmitca, could you please check if this works?

@schmitca For installing this branch, you can use

pip3 install --upgrade "git+https://github.com/nils-braun/b2luigi#bugfix/gbasf2-correctly-handle-multi-extension-outputs" 

Resolves #180

codecov-commenter commented 1 year ago

Codecov Report

Base: 58.15% // Head: 59.16% // Increases project coverage by +1.01% :tada:

Coverage data is based on head (82411af) compared to base (910cf36). Patch coverage: 0.00% of modified lines in pull request are covered.

Additional details and impacted files ```diff @@ Coverage Diff @@ ## main #181 +/- ## ========================================== + Coverage 58.15% 59.16% +1.01% ========================================== Files 22 23 +1 Lines 1503 1555 +52 ========================================== + Hits 874 920 +46 - Misses 629 635 +6 ``` | [Impacted Files](https://codecov.io/gh/nils-braun/b2luigi/pull/181?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=Nils+Braun) | Coverage Δ | | |---|---|---| | [b2luigi/batch/processes/gbasf2.py](https://codecov.io/gh/nils-braun/b2luigi/pull/181?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=Nils+Braun#diff-YjJsdWlnaS9iYXRjaC9wcm9jZXNzZXMvZ2Jhc2YyLnB5) | `46.13% <0.00%> (ø)` | | | [b2luigi/\_\_init\_\_.py](https://codecov.io/gh/nils-braun/b2luigi/pull/181?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=Nils+Braun#diff-YjJsdWlnaS9fX2luaXRfXy5weQ==) | `88.46% <0.00%> (ø)` | | Help us with your feedback. Take ten seconds to tell us [how you rate us](https://about.codecov.io/nps?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=Nils+Braun). Have a feature suggestion? [Share it here.](https://app.codecov.io/gh/feedback/?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=Nils+Braun)

:umbrella: View full report at Codecov.
:loudspeaker: Do you have feedback about the report comment? Let us know in this issue.

schmitca commented 1 year ago

Dear @meliache, I am happy to test it. At the moment, the grid jobs with udst.root output fail, before download can be attempted. This is because running the pickled basf2 path with udst.add_skimmed_udst_output(skimDecayMode='skimname') fails with

[FATAL] identical from/to parameter value   { module: SwitchDataStore ('' -> 'skim_skimname') function: virtual void Belle2::SwitchDataStoreModule::initialize() }
[INFO] ===Error Summary================================================================
[FATAL] identical from/to parameter value   { module: SwitchDataStore ('' -> 'skim_skimname') }

This can be traced back to the fact that basf2.pickle_path.deserialize_path( basf2.pickle_path.serialize_path(mypath)) != mypath I will try to find/contact an expert for basf2.pickle_path.

meliache commented 1 year ago

Dear @meliache, I am happy to test it. At the moment, the grid jobs with udst.root output fail, before download can be attempted. This is because running the pickled basf2 path with udst.add_skimmed_udst_output(skimDecayMode='skimname') fails

This seems to be a separate issue and I'm afraid if we need to add a workaround for that, that would be quite involved and probably should be a separate pull request, but not sure if I have time for that.

This can be traced back to the fact that basf2.pickle_path.deserialize_path( basf2.pickle_path.serialize_path(mypath)) != mypath I will try to find/contact an expert for basf2.pickle_path.

Honestly, good luck with that, I don't expect that to be simple. I think (though don't quote me on that) that Thomas Keck had contributed to the pickle path functions because he needed it for the FEI. Martin Ritter also new about them. But those people left the collaboration alreayd long ago. With those things you are usually more succesful by trying to dig into the code yourself and maybe even contribute.

I just checked the definition of udst.add_skimmed_udst_output and it seems that it creates an additional conditional path

    skim_path = basf2.create_path()
    [...]
    filter_path.add_independent_path(skim_path, "skim_" + skimDecayMode)
    add_udst_output(
        skim_path,
        outputFile,
        saveParticleLists,
        additionalBranches,
        dataDescription=dataDescription,
        mc=mc,
    )

I don't understand conditional paths super well, but it's well possible that serialize_path has problems with that.

basf2.pickle_path.deserialize_path( basf2.pickle_path.serialize_path(mypath)) != mypath You could check really what's the difference by eye or looking at the diff of basf2.print_path. Sadly I don't think I have the time to dig into thhat.

For testing this pull request I would try just using udst.add_udst_output, or also just add dots to a root file name, like foo.bar.root. And try figure out the conditional paths seperately

Update: I now created #182 for the pickling issue.

schmitca commented 1 year ago

I can confirm that the download is working with udst.add_udst_output and

--- a/b2luigi/batch/processes/gbasf2.py
+++ b/b2luigi/batch/processes/gbasf2.py
-        dataset_query_string = f"{output_lpn_dir}/{self.gbasf2_project_name}/sub*/{output_file_stem}_*{output_file_extensions}"
+        dataset_query_string = '.'.join([f'{output_lpn_dir}/{self.gbasf2_project_name}/sub*/{output_file_stem}_*', output_file_extensions])

It seems that I cannot push my changes, but this can be commited and closed.

meliache commented 1 year ago

I can confirm that the download is working with udst.add_udst_output and

Nice :+1:

It seems that I cannot push my changes, but this can be commited and closed.

Thanks for the patch. I added a commit with your line and added you (name and email) as the author of the commit. Git allows after all having separate authors and commiters for commits (which is used e.g. in the linux kernel where patch emails are still used). But I didn't push the commit yet because I'd like to get your permission to use your full name and email, which I found on b2mms. You might want to disagree for privacy purposes, then I'd change the commit author to your github nickname. Also, if you don't use your LMU email on github, it would be better adding your github email, so that github recognizes you as the author of the commit, even when I pushed it.

Btw, github has a feature where when the reviewer comments on a line of code, they can add a code suggestion and I could then accept this suggestion. But I don't like edits via github much, because editing in a proper editor/IDE results in better style you get syntax/style checking if you use a modern editor/IDE, which is not the case in github UI edits.

schmitca commented 1 year ago

Thank you @meliache, I added a suggestion.

meliache commented 1 year ago

@schmitca thanks for doing that so late. Can I use your full name from B2MMS in the contriburs list in the documentation?

schmitca commented 1 year ago

Sure, thank you! @meliache

meliache commented 1 year ago

@schmitca I now published this to PyPi as release 0.8.2, so if you install b2luigi with pip install b2luigi --upgrade, you should have the functionality available immediately.