Open mghasemi19 opened 1 year ago
We can probably deal with this by including fewer signal events rather than using all of the signals. As a first try, I am going to use a 50%-50% (signal-background) mix of events for ML training and see what the model's outputs look like.
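A minimal sketch of the 50%-50% selection described above, assuming the events are held in NumPy arrays with label 1 for signal and 0 for background. The function name `balanced_subsample` and the toy data are hypothetical and only illustrate the idea; this is not the code used in the analysis.

```python
import numpy as np

def balanced_subsample(X, y, seed=0):
    """Return a shuffled subset with equal numbers of signal (1) and background (0) events."""
    rng = np.random.default_rng(seed)
    sig_idx = np.flatnonzero(y == 1)
    bkg_idx = np.flatnonzero(y == 0)
    # Keep as many events per class as the smaller class can provide.
    n = min(len(sig_idx), len(bkg_idx))
    keep = np.concatenate([
        rng.choice(sig_idx, size=n, replace=False),
        rng.choice(bkg_idx, size=n, replace=False),
    ])
    rng.shuffle(keep)
    return X[keep], y[keep]

# Toy data (assumption): 1000 events, 4 features, roughly 5% in one class.
toy_rng = np.random.default_rng(1)
X = toy_rng.normal(size=(1000, 4))
y = (toy_rng.random(1000) < 0.05).astype(int)

X_bal, y_bal = balanced_subsample(X, y)
print(len(y_bal), int(y_bal.sum()))
```

Only the training sample should be balanced this way. The evaluation sample should keep the original signal-to-background ratio so that the reported metrics reflect the real class mix.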
After checking with LG, he suggested generating more samples and checking for overfitting. I will check for overfitting in both the optimized RF and NN models and see whether it affects the final model's accuracy.
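One common way to check for overfitting is to compare a metric on the training sample with the same metric on a held-out sample. The sketch below does this for an RF model, assuming scikit-learn is the library in use (the thread does not say). The toy data, the hyperparameters, and the choice of F1 as the metric are all assumptions for illustration.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split

# Toy data (assumption): the label depends on the first feature plus noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 5))
y = (X[:, 0] + 0.5 * rng.normal(size=2000) > 0).astype(int)

# Stratify so both splits keep the same signal/background ratio.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X_train, y_train)

f1_train = f1_score(y_train, model.predict(X_train))
f1_test = f1_score(y_test, model.predict(X_test))

# A large positive gap suggests the model is memorizing the training sample.
gap = f1_train - f1_test
print(f"train F1 = {f1_train:.3f}, test F1 = {f1_test:.3f}, gap = {gap:.3f}")
```

The same comparison applies to the NN model by swapping in its predictions. A gap that shrinks as more samples are generated would be consistent with the overfitting being driven by limited statistics.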
For both the RF and NN models, hyperparameters have been tuned and overfitting has been checked. The optimized NN model performs better in terms of accuracy and generalization to the whole dataset.
Given the fraction of signal events relative to the total background, the data is highly skewed, which results in very low accuracy and F1 score for all ML models. Is there any way to fix this, for example by including only the main backgrounds, generating more signal events, or adding more discriminating variables?