kyleskom / NBA-Machine-Learning-Sports-Betting

NBA sports betting using machine learning

Clarification: Question about training outputs and data structure #378

Closed. CaptainJeff closed this issue 5 months ago

CaptainJeff commented 10 months ago

I have two basic questions I was hoping to get some insight on:

1) Running python -m XGBoost_Model_ML to train the model writes about 10 different model files. It looks like a model is saved on every iteration whose accuracy score is the best seen so far in the current training session:

    if acc == max(acc_results):
        model.save_model('../../Models/XGBoost_{}%_ML-4.json'.format(acc))

But we only want to use the model with the highest score in XGBoost_Runner.py, right?
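To make the idea in point 1 concrete, here is a minimal sketch of keeping a single best checkpoint instead of one file per improvement. It is not the repo's code: `BestModelKeeper`, `StubModel`, and the file-name template are hypothetical, and the only thing assumed about the model is that it has a `save_model(path)` method like the one called in the snippet above.

```python
import os
import tempfile


class BestModelKeeper:
    """Keep one checkpoint on disk: the best-scoring model seen so far."""

    def __init__(self, directory, template="XGBoost_{}%_ML.json"):
        self.directory = directory
        self.template = template  # hypothetical file-name pattern
        self.best_acc = None
        self.best_path = None

    def offer(self, model, acc):
        """Save `model` only if `acc` strictly beats the best so far.

        Returns True when the model was saved, False otherwise.
        """
        if self.best_acc is not None and acc <= self.best_acc:
            return False
        new_path = os.path.join(self.directory, self.template.format(acc))
        model.save_model(new_path)
        # Delete the previous best so only one file remains.
        if self.best_path and self.best_path != new_path:
            if os.path.exists(self.best_path):
                os.remove(self.best_path)
        self.best_acc, self.best_path = acc, new_path
        return True


class StubModel:
    """Stand-in for a trained model; only mimics save_model(path)."""

    def __init__(self, tag):
        self.tag = tag

    def save_model(self, path):
        with open(path, "w") as fh:
            fh.write(self.tag)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as out_dir:
        keeper = BestModelKeeper(out_dir)
        # Made-up accuracy values, one per training iteration.
        for acc in [61.2, 60.0, 63.5, 63.5, 62.1]:
            keeper.offer(StubModel(str(acc)), acc)
        print(os.listdir(out_dir))
```

With something like this, the training loop would leave a single file behind, so there is no ambiguity about which one XGBoost_Runner.py should load.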

2) I think I know the answer to this, but can you confirm that the data param passed to XGBoost_Runner.py

https://github.com/CaptainJeff/NBA-Machine-Learning-Sports-Betting/blob/c866111380b16cb857d240661cc1e6dfd333ef83/src/Predict/XGBoost_Runner.py#L20

has to have the same structure as the data array used for training in XGBoost_Model_ML.py, with the columns in the same order, etc.?

https://github.com/CaptainJeff/NBA-Machine-Learning-Sports-Betting/blob/c866111380b16cb857d240661cc1e6dfd333ef83/src/Train-Models/XGBoost_Model_ML.py#L21
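If the answer to point 2 is yes, one way to guard against a column mismatch is to record the training column order and reorder prediction inputs by name before turning them into an array. The sketch below only illustrates that idea and is not taken from the repo: `save_feature_order`, `align_row`, the JSON file, and the column names are all hypothetical.

```python
import json
import os
import tempfile


def save_feature_order(columns, path):
    """Write the training column order to a JSON file (call at train time)."""
    with open(path, "w") as fh:
        json.dump(list(columns), fh)


def align_row(row, path):
    """Return the values of `row` (dict: column name -> value) in training order.

    Raises ValueError if the row has missing or unexpected columns, so a
    mismatch fails loudly instead of silently feeding the model wrong features.
    """
    with open(path) as fh:
        order = json.load(fh)
    missing = [c for c in order if c not in row]
    extra = [c for c in row if c not in order]
    if missing or extra:
        raise ValueError(f"feature mismatch: missing={missing}, extra={extra}")
    return [row[c] for c in order]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        order_file = os.path.join(tmp, "feature_order.json")
        # Hypothetical column names, in the order used for training.
        save_feature_order(["PTS", "AST", "REB"], order_file)
        # Same columns at prediction time, but in a different order.
        game = {"REB": 40, "PTS": 110, "AST": 25}
        print(align_row(game, order_file))
```

The reason this matters is that a model fed a plain array has no column names to go on, only positions, so a reordered column would be read as a different feature without any error.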

Thanks so much! I'm really enjoying this project.

CaptainJeff commented 10 months ago

Follow-up: I just submitted a PR for point 1 above: https://github.com/kyleskom/NBA-Machine-Learning-Sports-Betting/pull/379