scikit-hep / fastjet

Jet-finding in the Scikit-HEP ecosystem.
https://fastjet.readthedocs.io
BSD 3-Clause "New" or "Revised" License
21 stars 14 forks source link

feat: Addition of SoftDrop from the RecursiveTools Contrib #235

Closed cmoore24-24 closed 1 year ago

cmoore24-24 commented 1 year ago

This PR includes the addition of the SoftDrop grooming tool from the RecursiveTools section of fastjet-contrib. The defaults are set to those used in CMS and/or the defaults recommended by the authors of SoftDrop. That said, the options for both the SymmetryCut and RecursionChoice fields have been included. At the moment, the soft drop grooming outputs a record array of the jet constituents, as well as a few values of interest of the jet itself.

As a small change, EnergyCorrelator has been slightly adjusted, so that the default invocation of the generic ECF class will return the normalized values. The unnormalized values can be returned with a "noramlized=False" argument.

lgray commented 1 year ago

@cmoore24-24 can you run pre-commit on this locally and push since somehow your branch protection rules forbid pre-commit.ci from editting this branch?

jpivarski commented 1 year ago

Just a suggestion: instead of returning a set of filtered jets, you could return an array of booleans. On the Python side, you can use the booleans to slice the original jets and get the filtered jets, but you can also use those booleans for other things. You could, for instance, apply logical operators on groomings with different parameters, to see which jets would be groomed with one set of parameters and not another. That would be harder if you had to identify jets in the output set that are "the same jet."

Also if you do input jets → grooming algorithm → boolean array → slice → filtered jets, the filtered jets share memory with the input jets. It's a view, rather than a copy.

lgray commented 1 year ago

This is a very good suggestion - especial for environments like HL-LHC. @cmoore24-24 do you see how to do this?

cmoore24-24 commented 1 year ago

Is there room for both ideas, potentially? I can definitely see how returning booleans rather than the filtered jets can be helpful when memory is at a premium, but having fastjet do some of the calculation legwork w/r/t the jets is also pretty useful. So maybe keep this implementation as-is, and then down the line have something like exclusive_jets_softdrop_grooming_slimmed or some such that returns the booleans?

As for how to do it, no I'm not really sure off the top of my head. When I have the time to take another look, I can try to figure it out, though.

lgray commented 1 year ago

You could have one additional kwarg that's "return_as_mask" or have two functions exclusive_jets_softdrop_grooming and exclusive_jets_softdrop_grooming_mask depending on what people need.

cmoore24-24 commented 1 year ago

I've added a way to return a boolean mask that can be applied to a PF candidates array to return filtered jets. As for the implementation; I may be wrong, but I don't believe there's an easy way to make softdrop preserve anything like candidate ID or index (though if there is and I missed it, then feel free to let me know). So the only way I could think to do this was explicit 4 vector matching via nested loops.

for (unsigned int i = 0; i < css.size(); i++){  // iterate through events
          auto jets = css[i]->exclusive_jets(n_jets);
          for (unsigned int j = 0; j < jets.size(); j++){
            auto soft = sd->result(jets[j]);
            nconstituents.push_back(jets[j].constituents().size());
            for (unsigned int k = 0; k < jets[j].constituents().size(); k++){
              for (unsigned int l = 0; l < soft.constituents().size(); l++){
                if (jets[j].constituents()[k].px() == soft.constituents()[l].px() &&
                    jets[j].constituents()[k].py() == soft.constituents()[l].py() &&
                    jets[j].constituents()[k].pz() == soft.constituents()[l].pz() &&
                    jets[j].constituents()[k].E() == soft.constituents()[l].E()){
                  bool_mask.push_back(true);
                  break;
                }
                else if (l == soft.constituents().size()-1) {
                  bool_mask.push_back(false);
                }
              }
            }
          }
        }

As a result, it isn't a terribly quick process, but it does work and produce the desired boolean array.

lgray commented 1 year ago

We need to do better than quadruple nested loop, this will die in bad ways for HL-LHC analyses. I'll think about a way forward. Would not merge as is.

cmoore24-24 commented 1 year ago

We need to do better than quadruple nested loop, this will die in bad ways for HL-LHC analyses. I'll think about a way forward. Would not merge as is.

At this point would it be worthwhile reverting this merge to the pre-mask state and making the addition of the mask a separate PR?

cmoore24-24 commented 1 year ago

Hello,

I have removed the attempt at adding a mask option for softdrop. That work can serve as a starting point for a separate pull request that adds a mask to softdrop at a later date. The PR is now in its initial state; it returns filtered jet constituents, as well as some attributes of the filtered jets themselves.

@jmduarte, do you have any comments on the PR as it is now? Can it be merged as is, or are there changes that need to be made?

Best, Connor