Welcome to Hm.JetscapeMl! This repository contains code and resources related to utilizing machine learning techniques for analyzing Jetscape simulation data.
Hm.JetscapeMl is designed to extract valuable insights and patterns from Jetscape simulation data using modern machine learning techniques. The dataset and accompanying scripts provide a comprehensive framework for conducting machine learning experiments on this data.
ML-JET is a dataset for parameter classification in heavy-ion collisions using jet images. The dataset is hosted on Kaggle: ML-Jet Dataset (https://www.kaggle.com/datasets/haydarmehryar/ml-jet).
The JET-ML dataset is designed as a comprehensive benchmark for machine learning applications in the field of relativistic heavy ion collisions. This dataset facilitates the study and prediction of energy loss mechanisms in high-energy particle physics, specifically focusing on parameters like initial parton virtuality and strong coupling constant, denoted as $Q_0$ and $\alpha_s$, respectively.
Purpose and Scope
The primary aim of the JET-ML dataset is to support the development and evaluation of machine learning models for high energy physics that can classify and predict jet event parameters under different physical conditions in a quark-gluon plasma (QGP). It provides a rich collection of simulated jet images, which are pivotal in understanding the dynamics of parton energy loss in such environments. The dataset emphasizes the connection between energy loss and quantum chromodynamics (QCD) parameters, $Q_0$ and $\alpha_s$, which are critical for characterizing the scattering and splitting behavior of partons as they traverse the medium.
Data Generation and Features
The dataset was generated using the JETSCAPE framework (https://jetscape.org/), a sophisticated tool for simulating jet events in high-energy collisions.
Dataset Composition and Labeling
The JET-ML dataset comprises 10.8 million images, each with a resolution of 32 x 32 pixels, representing Pb-Pb collision events. Each emitted thermal particle in an event is described by the three jet observables used in the dataset-building process: (a) $p_T$, the transverse momentum; (b) $\phi$, the azimuthal angle; and (c) $\eta$, the pseudorapidity.
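The exact image-construction pipeline lives in the dataset-builder scripts referenced below; as a rough illustration only, a 32 x 32 jet image can be thought of as a 2-D histogram of an event's particles in the $(\eta, \phi)$ plane weighted by $p_T$. The bin ranges and weighting in this sketch are placeholders, not the settings used to build ML-JET.

```python
import numpy as np

def event_to_image(eta, phi, pt, bins=32,
                   eta_range=(-1.0, 1.0), phi_range=(-np.pi, np.pi)):
    """Illustrative only: bin an event's particles into a (bins x bins)
    pT-weighted histogram in the (eta, phi) plane.
    The eta/phi ranges here are placeholders, not the ML-JET settings."""
    image, _, _ = np.histogram2d(eta, phi, bins=bins,
                                 range=[eta_range, phi_range], weights=pt)
    return image

# Toy event with 100 random particles
rng = np.random.default_rng(0)
image = event_to_image(rng.uniform(-1, 1, 100),
                       rng.uniform(-np.pi, np.pi, 100),
                       rng.exponential(1.0, 100))
print(image.shape)  # (32, 32)
```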
Each image is labeled with its corresponding energy loss module (MATTER or MATTER-LBT), the strong coupling constant $\alpha_s$, and the virtuality separation scale $Q_0$. The image below shows ten sample events in 2-D together with their corresponding parameters.
A point-cloud representation of a sample event is shown in the image below:
Configurations (01 to 09): Nine distinct configurations corresponding to different combinations of physical parameters.
Strong Coupling Constant ($\alpha_s$): The simulations include $\alpha_s$ values of 0.2, 0.3, and 0.4.
Virtuality Separation Scale ($Q_0$): The dataset includes $Q_0$ values of 1, 1.5, 2.0, and 2.5.
Energy Loss Modules: MATTER and MATTER-LBT.
Dataset Size: available in several file variants containing 10.8 million, 1 million, 100k, 10k, and 1k images, each at 32x32 pixel resolution.
Dataset Format:
DataColumn(name="dataset_x", description="32x32 pixel jet images.", data_type="image", shape=(32, 32)),
DataColumn(name="dataset_y", description="Associated labels including energy loss module, alpha_s, and Q_0.", data_type="numeric", shape=(3,)),
Intended Use and Applications
This dataset is intended for researchers and practitioners in both machine learning and high-energy physics. It provides a robust platform for developing models that can classify or predict event parameters in particle collisions, aiding in a deeper understanding of QGP properties and behavior. Possible applications include classifying the energy loss module (MATTER vs. MATTER-LBT) and predicting the $\alpha_s$ and $Q_0$ parameters directly from jet images.
Compliance with FAIR Standards
The JET-ML dataset adheres to the principles of FAIR (Findable, Accessible, Interoperable, Reusable) data. It is publicly available through platforms like Kaggle (https://www.kaggle.com/datasets/haydarmehryar/ml-jet) and GitHub (https://github.com/hmehryar/Hm.JetscapeMl), with comprehensive documentation and metadata provided to facilitate its use and integration into various research workflows.
To get started with Hm.JetscapeMl, follow these steps:
git clone https://github.com/hmehryar/Hm.JetscapeMl.git
cd Hm.JetscapeMl
Loading the dataset requires Python's standard pickle library:

import pickle

dataset_file_name = "ml_jet_dataset.pkl"
try:
    with open(dataset_file_name, 'rb') as dataset_file:
        # The pickle file stores a (dataset_x, dataset_y) tuple
        loaded_data = pickle.load(dataset_file, encoding='latin1')
    (dataset_x, dataset_y) = loaded_data
    print("dataset_x:", type(dataset_x), dataset_x.size, dataset_x.shape)
    print("dataset_y:", type(dataset_y), dataset_y.size, dataset_y.shape)
except pickle.UnpicklingError as e:
    print("Error while loading the pickle file:", e)
The step-by-step process and related code for building the ML-JET dataset can be found in the jet_ml_dataset_builder directory.
MNIST Net~\cite{lecun1998gradient}, more commonly known as LeNet, was initially devised for handwritten digit recognition and leverages insights into 2D shape invariances through local connection patterns and weight constraints. It takes an image input and, with 4 layers including convolutional and fully connected layers, has 96,445 trainable parameters. The model implementation can be found in the MNIST Net directory.
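The exact architecture is in the MNIST Net directory; the sketch below is only a generic LeNet-style Keras model adapted to single-channel 32 x 32 inputs, with illustrative layer sizes that will not reproduce the reported 96,445 parameters.

```python
from tensorflow import keras
from tensorflow.keras import layers

def build_lenet_style(num_classes=2):
    # Generic LeNet-style CNN for 32x32 single-channel jet images (illustrative sizes)
    return keras.Sequential([
        layers.Input(shape=(32, 32, 1)),
        layers.Conv2D(6, kernel_size=5, activation="tanh", padding="same"),
        layers.AveragePooling2D(pool_size=2),
        layers.Conv2D(16, kernel_size=5, activation="tanh"),
        layers.AveragePooling2D(pool_size=2),
        layers.Flatten(),
        layers.Dense(120, activation="tanh"),
        layers.Dense(84, activation="tanh"),
        layers.Dense(num_classes, activation="softmax"),
    ])

model = build_lenet_style()
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])
model.summary()
```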
VGG16 Net~\cite{simonyan2014very} is renowned for its remarkable performance in image recognition tasks. It takes an image input and comprises 16 layers organized into 4 convolutional and fully connected blocks, totaling 15,676,673 trainable parameters. The model implementation can be found in the VGG16 Net directory.
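The repository's own model lives in the VGG16 Net directory; the following is only a sketch of building a VGG16 backbone in Keras, assuming the single-channel 32 x 32 images are replicated to three channels to match VGG16's expected input.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

def build_vgg16_style(num_classes=2):
    # VGG16 backbone trained from scratch (weights=None); 32x32 is the minimum input size it accepts
    backbone = keras.applications.VGG16(weights=None, include_top=False, input_shape=(32, 32, 3))
    return keras.Sequential([
        backbone,
        layers.Flatten(),
        layers.Dense(256, activation="relu"),
        layers.Dense(num_classes, activation="softmax"),
    ])

# Single-channel jet images would need to be stacked to 3 channels first, e.g.:
# x_rgb = np.repeat(dataset_x[..., np.newaxis], 3, axis=-1)
model = build_vgg16_style()
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])
```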
PointNet~\cite{qi2017pointnet} introduces a novel approach to processing point cloud data, making it uniquely suited for our jet event classification task. Unlike conventional CNNs that operate on structured grid-like data, PointNet directly consumes unordered point sets. The model implementation can be found in the Point Net directory.
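The full implementation is in the Point Net directory; below is only a stripped-down PointNet-style sketch (a shared per-point MLP followed by a symmetric max-pooling aggregation), assuming each event is a set of points with ($p_T$, $\eta$, $\phi$) features and a hypothetical fixed point count.

```python
from tensorflow import keras
from tensorflow.keras import layers

def build_pointnet_style(num_points=256, num_features=3, num_classes=2):
    # Simplified PointNet: per-point shared MLP (Conv1D with kernel size 1),
    # order-invariant max pooling, then a small classification head.
    # The input/feature transform (T-Net) blocks of the original PointNet are omitted here.
    inputs = keras.Input(shape=(num_points, num_features))
    x = layers.Conv1D(64, kernel_size=1, activation="relu")(inputs)
    x = layers.Conv1D(128, kernel_size=1, activation="relu")(x)
    x = layers.Conv1D(256, kernel_size=1, activation="relu")(x)
    x = layers.GlobalMaxPooling1D()(x)  # symmetric function -> permutation invariance
    x = layers.Dense(128, activation="relu")(x)
    outputs = layers.Dense(num_classes, activation="softmax")(x)
    return keras.Model(inputs, outputs)

model = build_pointnet_style()
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])
```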
Implementations of all the following methods can be found in the ML models directory.
This code uses DecisionTreeClassifier instead of LogisticRegression. The structure is similar: extract the first column for binary classification, split the dataset, flatten the images, initialize the model, train the model, make predictions, and evaluate the accuracy.
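As a rough illustration of the structure described above (the repository's actual scripts are in the ML models directory), the pipeline might look like the following, assuming dataset_x and dataset_y have been loaded as shown earlier and that the first label column is the binary energy-loss-module class:

```python
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

# Extract the first label column for binary classification (assumed: energy loss module)
y = dataset_y[:, 0]

# Flatten the 32x32 images into 1024-dimensional feature vectors
X = dataset_x.reshape(len(dataset_x), -1)

# Split, train, predict, evaluate
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
model = DecisionTreeClassifier(random_state=42)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
print("Accuracy:", accuracy_score(y_test, y_pred))
```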
This code uses LinearSVC instead of LogisticRegression or DecisionTreeClassifier. The structure remains similar: extract the first column for binary classification, split the dataset, flatten the images, initialize the model, train the model, make predictions, and evaluate the accuracy.
This code uses scikit-learn's KNeighborsClassifier; adjust the number of neighbors (the n_neighbors parameter) based on your requirements. The structure is similar to the previous examples: extract the first column for binary classification, split the dataset, flatten the images, initialize the model, train the model, make predictions, and evaluate the accuracy.
This code uses RandomForestClassifier from scikit-learn. The structure is similar to the previous examples: extract the first column for binary classification, split the dataset, flatten the images, initialize the model, train the model, make predictions, and evaluate the accuracy.
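The three variants above follow the same pipeline as the decision tree example; only the estimator changes. A sketch of the swapped-in models (scikit-learn defaults shown; the neighbor count is an assumption to tune):

```python
from sklearn.svm import LinearSVC
from sklearn.neighbors import KNeighborsClassifier
from sklearn.ensemble import RandomForestClassifier

# Drop-in replacements for DecisionTreeClassifier in the pipeline above
models = {
    "linear_svm": LinearSVC(),
    "knn": KNeighborsClassifier(n_neighbors=5),  # adjust the neighbor count as needed
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=42),
}

for name, model in models.items():
    model.fit(X_train, y_train)
    print(name, "accuracy:", model.score(X_test, y_test))
```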
Once you have the repository set up and the dependencies installed, you can start utilizing the project:
Contributions are welcome and encouraged! If you'd like to contribute to Hm.JetscapeMl, follow these steps:
Please ensure your contributions adhere to the project's coding standards and follow best practices.
This project is licensed under the MIT License.