umami-hep / atlas-ftag-tools

Python tools for ATLAS FTAG
2 stars 17 forks source link

Added optional _px variable in flavours for ghost labelled jets #94

Closed WeiShengL closed 1 month ago

WeiShengL commented 2 months ago

Summary

This pull request introduces the following changes

Conformity

codecov[bot] commented 2 months ago

Codecov Report

All modified and coverable lines are covered by tests :white_check_mark:

Project coverage is 98.12%. Comparing base (9b03f81) to head (b199b9b). Report is 1 commits behind head on main.

Additional details and impacted files ```diff @@ Coverage Diff @@ ## main #94 +/- ## ======================================= Coverage 98.12% 98.12% ======================================= Files 39 39 Lines 2289 2295 +6 ======================================= + Hits 2246 2252 +6 Misses 43 43 ```

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

samvanstroud commented 1 month ago

Thanks for this @WeiShengL! I think it probably makes sense to merge this for now, but in the longer term I think we should require that different labels can map to the same string for probability outputs (e.g. bjets and ghostbjets would both be pb).

I can think of two ways to do this:

  1. Define different categories as nested lists rather than flat lists, so that the label is fully specified by category.label_name rather than just label_name. this would mean label_name no longer needs to be unique
  2. Add a new property for the px explicitly, which could default to the current one if not specified. This is probably the simpler option for now.

What do you think about this approach? To me it feels a bit more sustainable than adding a bunch of discriminant functions

WeiShengL commented 1 month ago

Yea I agree it does not look good with the different discriminant functions.

For approach 1, do you mean having something like this in the flavours.yaml?

single-btag:
  - name: bjets
    label: $b$-jets
    cuts: ["HadronConeExclTruthLabelID == 5"]
    colour: tab:blue

single-btag-ghost:
  - name: bjets
    label: $b$-jets
    cuts: ["HadronGhostTruthLabelID == 5"]
    colour: tab:blue
samvanstroud commented 1 month ago

So we already have the label, I meant also to have a px or similar key, for e.g.

single-btag:
  - name: bjets
    label: $b$-jets
    px: b
    cuts: ["HadronConeExclTruthLabelID == 5"]
    colour: tab:blue

single-btag-ghost:
  - name: bjets
    label: $b$-jets
    px: b
    cuts: ["HadronGhostTruthLabelID == 5"]
    colour: tab:blue

It might be redundant to have both the label and the px, we could probably build the label from the px.

WeiShengL commented 1 month ago

We should probably be careful when we would like to merge this in so that people don't end up with different models with pb or pghostb. Unless it doesn't really matter?

samvanstroud commented 1 month ago

We should probably be careful when we would like to merge this in so that people don't end up with different models with pb or pghostb. Unless it doesn't really matter?

Yeah good point, maybe worth merging the ghost discriminant for now so people can make plots with the existing trainings... then we can open an issue to remember to remove it later :)

WeiShengL commented 1 month ago

I'll mark this as draft at the moment and created a PR #99 for the temporary ghost discriminant. Maybe we'll merge this in before preprocessing the samples for high stats training?

samvanstroud commented 1 month ago

Thanks! I'm happy to merge this now, any reason to keep it as draft?

WeiShengL commented 1 month ago

I think this does not go well with #99. Even if I have

get_discriminant(jets, tagger, signal=Flavour.ghostbjets)

It will look for pb here if we merge this in

samvanstroud commented 1 month ago

I see thanks, for now can we add a fallback if pb is not found to look for a string like this instead? https://github.com/umami-hep/atlas-ftag-tools/blob/main/ftag/flavour.py#L28

samvanstroud commented 1 month ago

Thanks Wei