Skewed data with low ML accuracy

mghasemi19 / TopFC

Top quark flavor changing to up and charm quarks analysis.


Skewed data with low ML accuracy #3

Open mghasemi19 opened 1 year ago

mghasemi19 commented 1 year ago

Given the ratio of signal events to total background events, the dataset is highly skewed, which results in very low accuracy and F1 scores for all ML models. Is there a way to fix this, for example by including only the main backgrounds, generating more signal events, or adding more discriminating variables?
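
To put numbers on the skew and on the scores mentioned above, a minimal sketch along these lines could help. It uses plain Python only; the function names (`class_fractions`, `signal_metrics`), the label convention (1 = signal, 0 = background), and the toy labels are assumptions for illustration, not taken from the TopFC code.

```python
from collections import Counter


def class_fractions(labels):
    """Return the fraction of events belonging to each class label."""
    counts = Counter(labels)
    total = len(labels)
    return {cls: n / total for cls, n in counts.items()}


def signal_metrics(y_true, y_pred, signal=1):
    """Return accuracy plus precision, recall and F1 for the signal class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == signal and p == signal)
    fp = sum(1 for t, p in pairs if t != signal and p == signal)
    fn = sum(1 for t, p in pairs if t == signal and p != signal)
    correct = sum(1 for t, p in pairs if t == p)

    # Guard against empty denominators when a class is never predicted.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {
        "accuracy": correct / len(y_true),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Made-up toy labels (assumption): 1 = signal, 0 = background.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 0, 1, 0, 0, 0, 0, 0, 0, 0]

fractions = class_fractions(y_true)
metrics = signal_metrics(y_true, y_pred)
print(fractions)
print(metrics)
```

Reporting the class fractions next to accuracy and F1 makes it easier to judge whether a change in the sample composition actually improved the classifier.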

mghasemi19 commented 1 year ago

We can probably deal with this by including fewer signal events rather than all of them. As a first try, I am going to use a 50%-50% (signal-background) mix of events for ML training and see what the model outputs look like.
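
One way to build such a 50%-50% training sample is random undersampling of the larger class. The sketch below is only an illustration in plain Python: `balance_by_undersampling`, the integer stand-ins for events, and the 80/20 toy composition are assumptions, not the actual TopFC samples or code.

```python
import random


def balance_by_undersampling(events, labels, seed=0):
    """Randomly drop events from the larger class(es) until every class
    has as many events as the smallest one."""
    rng = random.Random(seed)

    # Group events by their class label.
    by_class = {}
    for event, label in zip(events, labels):
        by_class.setdefault(label, []).append(event)

    # Keep as many events per class as the smallest class contains.
    n_keep = min(len(group) for group in by_class.values())

    balanced = []
    for label, group in by_class.items():
        for event in rng.sample(group, n_keep):
            balanced.append((event, label))

    # Shuffle so the classes are interleaved in the training sample.
    rng.shuffle(balanced)
    out_events = [event for event, _ in balanced]
    out_labels = [label for _, label in balanced]
    return out_events, out_labels


# Toy input (assumption): 80 events of class 1 and 20 of class 0.
events = list(range(100))
labels = [1] * 80 + [0] * 20

bal_events, bal_labels = balance_by_undersampling(events, labels)
print(len(bal_events), bal_labels.count(1), bal_labels.count(0))
```

If this approach is used, it is probably safest to balance only the training sample and to evaluate on a sample with the original composition, so that the reported accuracy and F1 reflect the real signal-to-background ratio.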

mghasemi19 commented 1 year ago

After checking with LG, he suggested generating more samples and checking for overfitting. I will check for overfitting in both the optimized RF and NN models and see whether it affects the final model's accuracy.
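
A simple overfitting check is to compare the accuracy on the training sample with the accuracy on a held-out sample; a large gap suggests the model is memorizing rather than generalizing. The sketch below illustrates the idea in plain Python. The helper `overfitting_gap` and the memorizing stand-in model are hypothetical and only take the place of the real RF and NN models, which would be plugged in through the `fit` and `predict` callables.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def overfitting_gap(fit, predict, x_train, y_train, x_test, y_test):
    """Fit on the training sample, then return
    (train accuracy, test accuracy, train minus test)."""
    model = fit(x_train, y_train)
    train_acc = accuracy(y_train, predict(model, x_train))
    test_acc = accuracy(y_test, predict(model, x_test))
    return train_acc, test_acc, train_acc - test_acc


# Hypothetical stand-in model that simply memorizes the training points.
def fit_memorizer(xs, ys):
    return dict(zip(xs, ys))


def predict_memorizer(model, xs):
    # Unseen inputs fall back to class 0, so nothing is learned beyond
    # the memorized points.
    return [model.get(x, 0) for x in xs]


# Toy data (assumption): the label is the parity of the input.
xs = list(range(100))
ys = [x % 2 for x in xs]

train_acc, test_acc, gap = overfitting_gap(
    fit_memorizer, predict_memorizer,
    xs[:70], ys[:70],   # training sample
    xs[70:], ys[70:],   # held-out sample
)
print(train_acc, test_acc, gap)
```

The memorizing model scores perfectly on the training sample but no better than guessing on the held-out one, which is the pattern to look for when comparing the tuned RF and NN models.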

mghasemi19 commented 1 year ago

For the RF and NN models, the hyperparameters have been tuned and overfitting has been checked. The optimized NN model performs better in terms of accuracy and generalization to the whole dataset.