scikit-hep / decaylanguage

Package to parse decay files, describe and convert particle decays between digital representations.
https://decaylanguage.readthedocs.io
BSD 3-Clause "New" or "Revised" License

Support for decay descriptors #200

Open admorris opened 2 years ago

admorris commented 2 years ago

As part of the LHCb Ntuple Wizard project, we have implemented decay descriptor parsing (and rendering) using pyparsing. I would like to port this over to decaylanguage and make it easy to add different "grammars" used by different experiments/software packages.

Example grammars:

We can also think about doing matching and substitution (which we also have functionality for in the Ntuple Wizard) but there it's less clear to me how easy it is to support different conventions from different experiments/packages.

eduardo-rodrigues commented 2 years ago

Thanks @admorris for creating this work task / enhancement idea following our chats within LHCb. Thanks also for the links. I was not aware of all of those and this is bound to be interesting and relevant.

You know already that I like the idea of the extension since specifications of decays such as B0 -> pi+ pi- make sense and are very much used in Flavour Physics experiments. We should try and provide at least a basis "for everyone", as generic as possible. Once we integrate with the rest then all package functionality can be exploited for free ...

Off the top of my head, I would say it makes sense to draft this as a descriptor/ submodule under https://github.com/scikit-hep/decaylanguage/tree/master/src/decaylanguage, in the same way that we have dec/ to deal with .dec decay files and decay/ to define decay chain classes.

@henryiii, I hope you also like these ideas :-).

We should try and see if Belle-II colleagues could be interested in this. Let me start by pinging usual suspects such as @daritter and @GiacomoXT (feel free to ping other colleagues).

eduardo-rodrigues commented 2 years ago

> We can also think about doing matching and substitution (which we also have functionality for in the Ntuple Wizard) but there it's less clear to me how easy it is to support different conventions from different experiments/packages.

This is indeed a bit trickier. Depends on how much can be generic and how much needs to be experiment specific. Else we can have things defined via a sort of "backend", and using for example the "LHCb backend" would bring in some functionality not necessarily available generically. Just thinking out loud here.

eduardo-rodrigues commented 2 years ago

Hi @admorris, shall we try and move forward with this nice addition? Such a module will be super useful also for run-3 DaVinci, hence I'm very keen. Let me know whether you will have a bit of time for this. I'm happy to help.

admorris commented 1 year ago

In #331 I add a function that writes "LHCb-style" decay descriptors e.g. D*+ -> (D0 -> K- pi+) pi+

I identify 3 things that could be made configurable:

admorris commented 1 year ago

I guess to allow for the most flexibility, this could be achieved with some kind of global configuration where the user specifies format-strings with named variables:

    # LHCb LoKi style
    descriptor_config = {
        "decay_pattern": "{mother} -> {daughters}",
        "sub_decay_pattern": "({mother} -> {daughters})",
    }
    # Belle DecayString style
    descriptor_config = {
        "decay_pattern": "{mother} -> {daughters}",
        "sub_decay_pattern": "[{mother} -> {daughters}]",
    }
    # Some other style
    descriptor_config = {
        "decay_pattern": "{mother} --> {daughters}",
        "sub_decay_pattern": "{mother} (--> {daughters})",
    }
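
To make the idea concrete, here is a minimal sketch of how such format strings could drive the rendering. It is not the decaylanguage implementation: the nested-tuple representation of a decay and the names render, LHCB_STYLE, BELLE_STYLE and OTHER_STYLE are assumptions made up for illustration.

```python
# Hypothetical representation (not the decaylanguage one): a decay is a tuple
# (mother, [daughters]); each daughter is either a particle name (str) or
# another (mother, [daughters]) tuple for a sub-decay.
LHCB_STYLE = {
    "decay_pattern": "{mother} -> {daughters}",
    "sub_decay_pattern": "({mother} -> {daughters})",
}
BELLE_STYLE = {
    "decay_pattern": "{mother} -> {daughters}",
    "sub_decay_pattern": "[{mother} -> {daughters}]",
}
OTHER_STYLE = {
    "decay_pattern": "{mother} --> {daughters}",
    "sub_decay_pattern": "{mother} (--> {daughters})",
}


def render(decay, config, top=True):
    """Render a nested decay with the top-level or sub-decay pattern."""
    mother, daughters = decay
    # Stable particles are written as-is; sub-decays are rendered recursively.
    parts = [
        d if isinstance(d, str) else render(d, config, top=False)
        for d in daughters
    ]
    key = "decay_pattern" if top else "sub_decay_pattern"
    return config[key].format(mother=mother, daughters=" ".join(parts))


chain = ("D*+", [("D0", ["K-", "pi+"]), "pi+"])
print(render(chain, LHCB_STYLE))   # D*+ -> (D0 -> K- pi+) pi+
print(render(chain, BELLE_STYLE))  # D*+ -> [D0 -> K- pi+] pi+
print(render(chain, OTHER_STYLE))  # D*+ --> D0 (--> K- pi+) pi+
```

Only two patterns are needed because the top-level decay and every nested sub-decay are the only two contexts in which a mother and its daughters get written out.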

eduardo-rodrigues commented 1 year ago

Yes, sounds simple enough to me. Good idea!

admorris commented 1 year ago

The configuration is implemented in https://github.com/scikit-hep/decaylanguage/pull/331/commits/abc1bfd618e7160163eea047fdc8bab611b51c60

I chose to make a class with __enter__ and __exit__ methods so that it can be used in with statements:

    >>> with DescriptorFormat("{mother} --> {daughters}", "[{mother} --> {daughters}]"): dc.to_string()
    ...
    'D*+ --> [D0 --> [K_S0 --> pi+ pi-] [pi0 --> gamma gamma]] pi+'
    >>> with DescriptorFormat("{mother} => {daughters}", "{mother} (=> {daughters})"): dc.to_string()
    ...
    'D*+ => D0 (=> K_S0 (=> pi+ pi-) pi0 (=> gamma gamma)) pi+'
    >>> dc.to_string()
    'D*+ -> (D0 -> (K_S0 -> pi+ pi-) (pi0 -> gamma gamma)) pi+'

Or call DescriptorFormat.set_config directly (which checks that the provided patterns contain the correct named wildcards).
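
For readers unfamiliar with the pattern, the following is a rough sketch of how a class of this kind can work. It is not the code from the PR: FormatSketch and its attributes are hypothetical names, and the validation rule (each pattern must use exactly the fields mother and daughters) is an assumption about what "correct named wildcards" means.

```python
import string


class FormatSketch:
    """Hypothetical stand-in for a descriptor-format context manager."""

    # Class-level configuration shared by everything that renders descriptors.
    config = {
        "decay_pattern": "{mother} -> {daughters}",
        "sub_decay_pattern": "({mother} -> {daughters})",
    }

    def __init__(self, decay_pattern, sub_decay_pattern):
        self.new_config = {
            "decay_pattern": decay_pattern,
            "sub_decay_pattern": sub_decay_pattern,
        }
        self.old_config = None

    @staticmethod
    def _check(pattern):
        # string.Formatter().parse yields (literal, field_name, spec, conversion);
        # field_name is None for trailing literal text.
        fields = {
            name
            for _, name, _, _ in string.Formatter().parse(pattern)
            if name is not None
        }
        if fields != {"mother", "daughters"}:
            raise ValueError(
                f"pattern {pattern!r} must use exactly {{mother}} and {{daughters}}"
            )

    @classmethod
    def set_config(cls, decay_pattern, sub_decay_pattern):
        # Validate both patterns before touching the shared configuration.
        for pattern in (decay_pattern, sub_decay_pattern):
            cls._check(pattern)
        cls.config = {
            "decay_pattern": decay_pattern,
            "sub_decay_pattern": sub_decay_pattern,
        }

    def __enter__(self):
        # Remember the current configuration so __exit__ can restore it.
        self.old_config = dict(type(self).config)
        self.set_config(**self.new_config)
        return self

    def __exit__(self, exc_type, exc, tb):
        type(self).config = self.old_config
        return False  # do not swallow exceptions raised inside the block
```

Because the previous configuration is restored in __exit__, a temporary format cannot leak out of the with block, even if the body raises.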

eduardo-rodrigues commented 1 year ago

Very nice, thank you!

ryuwd commented 1 year ago

So, is this completed?

eduardo-rodrigues commented 1 year ago

Hi @ryuwd from across an LHCb-internal discussion where this is relevant ;-).

The work (modulo little enhancements possible) is done in one direction but the reverse-problem implementation is missing. For an example from a "DecayLanguage" decay chain to a descriptor see https://github.com/scikit-hep/decaylanguage/blob/master/tests/utils/test_utilities.py. What we now miss, handy for LHCb, is that you pick up an arbitrarily complex descriptor and spit out the decaying particle and chains with subdecays. As said, a lot already exists in the LHCb Ntuple Wizard but needs some work to be exported out to everyone.
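
To illustrate what the reverse direction involves, here is a small recursive-descent sketch that turns a parenthesised "LHCb-style" descriptor into a nested structure. It is neither the Ntuple Wizard code nor anything in decaylanguage: parse_descriptor and the (mother, [daughters]) tuple output are hypothetical, and it deliberately ignores the richer syntax of real descriptors (charge conjugation brackets, wildcards, selection markers and so on).

```python
import re

# Tokens: parentheses, the arrow, or any run of characters that is neither
# whitespace nor a parenthesis (particle names such as D*+, K_S0, pi-).
TOKEN = re.compile(r"\(|\)|->|[^\s()]+")


def parse_descriptor(text):
    """Parse e.g. 'D*+ -> (D0 -> K- pi+) pi+' into (mother, [daughters])."""
    tokens = TOKEN.findall(text)
    pos = 0

    def parse_decay():
        nonlocal pos
        if pos >= len(tokens) or tokens[pos] in ("(", ")", "->"):
            raise ValueError("expected a mother particle")
        mother = tokens[pos]
        pos += 1
        if pos >= len(tokens) or tokens[pos] != "->":
            raise ValueError(f"expected '->' after {mother!r}")
        pos += 1
        daughters = []
        while pos < len(tokens) and tokens[pos] != ")":
            if tokens[pos] == "(":
                pos += 1  # consume "(" and parse the sub-decay inside it
                daughters.append(parse_decay())
                if pos >= len(tokens) or tokens[pos] != ")":
                    raise ValueError("unbalanced parentheses")
                pos += 1  # consume ")"
            elif tokens[pos] == "->":
                raise ValueError("sub-decays must be wrapped in parentheses")
            else:
                daughters.append(tokens[pos])
                pos += 1
        return (mother, daughters)

    tree = parse_decay()
    if pos != len(tokens):
        raise ValueError("unexpected trailing tokens")
    return tree


print(parse_descriptor("D*+ -> (D0 -> K- pi+) pi+"))
# ('D*+', [('D0', ['K-', 'pi+']), 'pi+'])
```

Supporting other conventions (square brackets, a different arrow) would mean parametrising the tokens, which is where the "grammar per experiment" idea from the start of this thread comes in.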

[As you see I continue developing stuff but have limited bandwidth ;-) @admorris made a significant contribution in these matters 👍 !]