mbarbetti/pidgan

GAN-based models to flash-simulate the LHCb PID detectors

What is PIDGAN?

PIDGAN is a Python package built upon TensorFlow 2 to provide ready-to-use implementations of several GAN algorithms (listed in the Models available section). The package was originally designed to simplify the training and optimization of GAN-based models for the Particle Identification (PID) system of the LHCb experiment. Today, PIDGAN is a versatile package that can be employed in a wide range of High Energy Physics (HEP) applications and, more generally, whenever one works with tabular data and aims to learn the conditional probability distributions of a set of target features. This package is one of the building blocks of a Flash Simulation framework for the LHCb experiment [1].

PIDGAN is (almost) all you need (for flash-simulation)

Standard simulation techniques consume a large number of CPU hours to reproduce all the radiation-matter interactions occurring within a HEP detector when it is traversed by primary and secondary particles. By directly transforming generated particles into analysis-level objects, Flash Simulation strategies can significantly speed up simulation production, by up to a factor of 1000 [1]. Such transformations can be defined by using Generative Adversarial Networks (GAN) [2] trained to take into account the kinematics of the traversing particles and the detection conditions (e.g., magnet polarity, occupancy).

GANs rely on the simultaneous (adversarial) training of two neural networks called generator and discriminator, whose competition ends once the Nash equilibrium is reached. At this point, the generator can be used as a simulator to generate new data according to the conditional probability distributions learned during the training [3]. By relying on the TensorFlow and Keras APIs, PIDGAN allows one to define and train a GAN model in no more than 20 lines of code.

from pidgan.players.generators import Generator
from pidgan.players.discriminators import Discriminator
from pidgan.algorithms import GAN

x = ... # conditions
y = ... # targets

G = Generator(
  output_dim=y.shape[1],
  latent_dim=64,
  output_activation="linear",
)

D = Discriminator(
  output_dim=1,
  output_activation="sigmoid",
)

model = GAN(generator=G, discriminator=D)
model.compile(
  metrics=["accuracy"],
  generator_optimizer="rmsprop",
  discriminator_optimizer="rmsprop",
)

model.fit(x, y, batch_size=256, epochs=100)
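
The adversarial game described above can be made concrete with the binary cross-entropy losses of the original GAN formulation [2]. What follows is a minimal, framework-free sketch under stated assumptions, not PIDGAN's implementation: the function names `discriminator_loss` and `generator_loss` are hypothetical, and the generator loss uses the common non-saturating form. It illustrates that, when the discriminator can no longer tell reference from generated samples (it outputs 0.5 everywhere), the discriminator loss settles at 2 ln 2 and the generator loss at ln 2.

```python
import math


def discriminator_loss(d_real, d_fake):
    """Binary cross-entropy of the discriminator.

    d_real: discriminator outputs (probabilities) on reference samples.
    d_fake: discriminator outputs (probabilities) on generated samples.
    """
    real_term = -sum(math.log(p) for p in d_real) / len(d_real)
    fake_term = -sum(math.log(1.0 - p) for p in d_fake) / len(d_fake)
    return real_term + fake_term


def generator_loss(d_fake):
    """Non-saturating generator loss: small when the discriminator is fooled."""
    return -sum(math.log(p) for p in d_fake) / len(d_fake)


# A discriminator that cannot separate the two samples outputs 0.5 everywhere.
undecided = [0.5, 0.5, 0.5, 0.5]
print(discriminator_loss(undecided, undecided))  # 2 * ln(2)
print(generator_loss(undecided))                 # ln(2)
```

A discriminator that still separates the two samples well (for example, outputs of 0.9 on reference data and 0.1 on generated data) yields a lower discriminator loss and a higher generator loss, which is the imbalance the adversarial training works to remove.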

Installation guide

First steps

Before installing PIDGAN, we suggest preparing a fully operational TensorFlow installation by following the instructions described in the dedicated guide. If your device is equipped with one of the NVIDIA GPU cards supported by TensorFlow (see Hardware requirements), do not forget to verify the correct installation of the libraries for hardware acceleration by running:

python3 -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"

If the equipped GPU card is not included in the list printed by the previous command, your device and/or Python environment may be misconfigured. Please refer to the tested build configurations table in the TensorFlow documentation for the CUDA Toolkit and cuDNN versions required by the different TensorFlow versions.

How to install

PIDGAN has a minimal list of requirements:

Python >= 3.7, < 3.13
TensorFlow >= 2.8, < 2.17
scikit-learn >= 1.0, < 1.6
NumPy < 2.0
Hopaas client (https://github.com/landerlini/hopaas_client)
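
The version bounds listed above can be checked programmatically before installing. The following is a small sketch using only the standard library; the helper names `parse_version` and `in_range` and the `REQUIREMENTS` dictionary are hypothetical and are not part of PIDGAN. Each lower bound is treated as inclusive and each upper bound as exclusive, as in the list.

```python
import re
import sys


def parse_version(text):
    """Turn a version string such as '2.16.1' into a tuple of ints, (2, 16, 1)."""
    numbers = re.findall(r"\d+", text.split("+")[0])
    return tuple(int(n) for n in numbers[:3])


def in_range(version, minimum=None, maximum=None):
    """Check minimum <= version < maximum; a bound set to None is not checked."""
    if minimum is not None and version < minimum:
        return False
    if maximum is not None and version >= maximum:
        return False
    return True


# Bounds copied from the requirements list above.
REQUIREMENTS = {
    "python": ((3, 7), (3, 13)),
    "tensorflow": ((2, 8), (2, 17)),
    "scikit-learn": ((1, 0), (1, 6)),
    "numpy": (None, (2, 0)),
}

# Check the running interpreter, then two example version strings.
python_version = tuple(sys.version_info[:3])
print("Python supported:", in_range(python_version, *REQUIREMENTS["python"]))
print("TensorFlow 2.16.1 supported:",
      in_range(parse_version("2.16.1"), *REQUIREMENTS["tensorflow"]))
print("NumPy 2.0.0 supported:",
      in_range(parse_version("2.0.0"), *REQUIREMENTS["numpy"]))
```

On Python 3.8 or later, the version string of an installed package can be obtained with `importlib.metadata.version` and passed to `parse_version`.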

The easiest way to install PIDGAN is via pip:

pip install pidgan

In addition, since hopaas_client is not available on PyPI, you need to install it manually to unlock the complete set of PIDGAN features:

pip install git+https://github.com/landerlini/hopaas_client

Optional dependencies

Standard HEP applications may need additional packages for data management, results visualization/validation, and model export. PIDGAN and any additional requirements potentially useful in HEP can be installed via pip in one shot:

pip install "pidgan[hep]"

Models available

The main components of PIDGAN are the algorithms and players modules, which provide, respectively, implementations of several GAN algorithms and of the so-called adversarial neural networks (e.g., generator, discriminator). The objects exposed by both modules are implemented by subclassing the Keras Model class and customizing the training procedure that is executed when one calls the fit() method. Starting with PIDGAN v0.2.0, the package has been largely rewritten to also be compatible with the new multi-backend Keras 3. At the moment, the custom training procedures defined for the various GAN algorithms are implemented only for the TensorFlow backend, while support for the PyTorch and JAX backends is planned for a future release. The following tables report the complete set of algorithms and players classes currently available, together with a snapshot of their implementation details.

Generative Adversarial Networks

Algorithms*	Source	Avail	Test	Lipschitz**	Refs
GAN	`k2`/`k3`	✅	✅	❌	2, 10, 11
BceGAN	`k2`/`k3`	✅	✅	❌	4, 10, 11
LSGAN	`k2`/`k3`	✅	✅	❌	5, 10, 11
WGAN	`k2`/`k3`	✅	✅	✅	6, 11
WGAN-GP	`k2`/`k3`	✅	✅	✅	7, 11
CramerGAN	`k2`/`k3`	✅	✅	✅	8, 11
WGAN-ALP	`k2`/`k3`	✅	✅	✅	9, 11
BceGAN-GP	`k2`/`k3`	✅	✅	✅	4, 7, 11
BceGAN-ALP	`k2`/`k3`	✅	✅	✅	4, 9, 11

*each GAN algorithm is designed to operate taking conditions as input [3]

**the GAN training is regularized to ensure that the discriminator encodes a 1-Lipschitz function
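
One way to encourage the 1-Lipschitz property mentioned in the second footnote is the gradient penalty of WGAN-GP [7], which pushes the norm of the discriminator gradient toward 1 at points interpolated between reference and generated samples. The sketch below is a one-dimensional toy under stated assumptions, not PIDGAN's implementation: the name `gradient_penalty` is hypothetical, the discriminator is a plain Python function, and the gradient is estimated with finite differences instead of automatic differentiation.

```python
import random


def gradient_penalty(critic, real, fake, rng, step=1e-5):
    """Average of (|d critic / dx| - 1)^2 over random interpolates.

    critic: function mapping a float to a float (toy discriminator).
    real, fake: equally long lists of scalar samples.
    rng: random.Random instance used to draw the mixing weights.
    """
    penalties = []
    for r, f in zip(real, fake):
        eps = rng.random()                   # mixing weight in [0, 1)
        x_hat = eps * r + (1.0 - eps) * f    # point between real and fake
        # Central finite difference as a stand-in for automatic differentiation.
        grad = (critic(x_hat + step) - critic(x_hat - step)) / (2.0 * step)
        penalties.append((abs(grad) - 1.0) ** 2)
    return sum(penalties) / len(penalties)


rng = random.Random(42)
real = [1.0, 2.0, 3.0]
fake = [0.5, 2.5, 4.0]

# Slope 1 satisfies the constraint, so the penalty is close to 0.
print(gradient_penalty(lambda x: x, real, fake, rng))
# Slope 3 violates it, so the penalty is close to (3 - 1)^2 = 4.
print(gradient_penalty(lambda x: 3.0 * x, real, fake, rng))
```

In a real training loop this term would be scaled by a penalty coefficient and added to the discriminator loss; the other regularized algorithms in the table rely on different strategies described in their respective references.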

Generators

Players	Source	Avail	Test	Skip conn	Refs
Generator	`k2`/`k3`	✅	✅	❌	2, 3
ResGenerator	`k2`/`k3`	✅	✅	✅	2, 3, 12

Discriminators

Players	Source	Avail	Test	Skip conn	Aux proc	Refs
Discriminator	`k2`/`k3`	✅	✅	❌	❌	2, 3, 11
ResDiscriminator	`k2`/`k3`	✅	✅	✅	❌	2, 3, 11, 12
AuxDiscriminator	`k2`/`k3`	✅	✅	✅	✅	2, 3, 11, 12, 13

Other players

Players	Source	Avail	Test	Skip conn	Aux proc	Multiclass
Classifier	`src`	✅	✅	❌	❌	❌
ResClassifier	`src`	✅	✅	✅	❌	❌
AuxClassifier	`src`	✅	✅	✅	✅	❌
MultiClassifier	`src`	✅	✅	❌	❌	✅
MultiResClassifier	`src`	✅	✅	✅	❌	✅
AuxMultiClassifier	`src`	✅	✅	✅	✅	✅

References

[1] M. Barbetti, "The flash-simulation paradigm and its implementation based on Deep Generative Models for the LHCb experiment at CERN", PhD thesis, University of Firenze, 2024
[2] I.J. Goodfellow et al., "Generative Adversarial Networks", arXiv:1406.2661
[3] M. Mirza, S. Osindero, "Conditional Generative Adversarial Nets", arXiv:1411.1784
[4] A. Radford, L. Metz, S. Chintala, "Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks", arXiv:1511.06434
[5] X. Mao et al., "Least Squares Generative Adversarial Networks", arXiv:1611.04076
[6] M. Arjovsky, S. Chintala, L. Bottou, "Wasserstein GAN", arXiv:1701.07875
[7] I. Gulrajani et al., "Improved Training of Wasserstein GANs", arXiv:1704.00028
[8] M.G. Bellemare et al., "The Cramer Distance as a Solution to Biased Wasserstein Gradients", arXiv:1705.10743
[9] D. Terjék, "Adversarial Lipschitz Regularization", arXiv:1907.05681
[10] M. Arjovsky, L. Bottou, "Towards Principled Methods for Training Generative Adversarial Networks", arXiv:1701.04862
[11] T. Salimans et al., "Improved Techniques for Training GANs", arXiv:1606.03498
[12] K. He et al., "Deep Residual Learning for Image Recognition", arXiv:1512.03385
[13] A. Rogachev, F. Ratnikov, "GAN with an Auxiliary Regressor for the Fast Simulation of the Electromagnetic Calorimeter Response", arXiv:2207.06329

Credits

Most of the GAN algorithms are an evolution of those provided by the mbarbetti/tf-gen-models repository. The BceGAN model is freely inspired by the TensorFlow tutorial Deep Convolutional Generative Adversarial Network and the Keras tutorial Conditional GAN. The WGAN-ALP model is an adaptation of the one provided by the dterjek/adversarial_lipschitz_regularization repository.

Citing PIDGAN

To cite this repository:

@software{pidgan:2023abc,
  author    = "Matteo Barbetti and Lucio Anderlini",
  title     = "{PIDGAN: GAN-based models to flash-simulate the LHCb PID detectors}",
  version   = "v0.2.0",
  url       = "https://github.com/mbarbetti/pidgan",
  doi       = "10.5281/zenodo.10463728",
  publisher = "Zenodo",
  year      = "2023",
}

In the above BibTeX entry, the version number is intended to be that from pidgan/version.py, while the year corresponds to the project's open-source release.

License

PIDGAN is released under the GNU General Public License v3 (GPLv3), as found in the LICENSE file.