michaelparker7 / FIN-377-Final-Project


Feedback on proposal #1

Open donbowen opened 5 months ago

donbowen commented 5 months ago

@elvinlee12 @michaelparker7 @Brandon4106

Cool idea! The write-up is pretty solid, but let's focus the revision on clarifying the plan. There is a lot for you to think about. For example, you say "Observations: Each of the 30 NBA teams and each of their respective props" ... this implies a dataset with 30 rows. That's not what you mean!

The above is enough for a project. Examining player props is another thing entirely, so avoid it for now. However, the market for game over/under and totals is very competitive and close to efficient. The player prop market likely has more profit opportunities. You'd want to model things like expected minutes played and other things to feed into a player prop model. The main issue for getting rich on player props is that you can't bet much into these markets.

donbowen commented 5 months ago

It's fine to manually download data. It's not the preferred approach, but if it's all that's viable, fine.

I'm still unclear about your plan. I think, but I have to read between the lines and guess, that you'll model something like HowMuchWillTheHomeTeamWinBy as the y, and X is all these stats about the team and the same stats for the opponent and maybe the difference in those stats.

You'll need stats for all 30 teams, even though you have gambling info only on 4 teams.
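Here is a rough sketch of how I'm reading that setup: one row per game, with the home margin as y and the home stats, road stats, and their differences as X. The team codes, stat names, and numbers are all made up for illustration, and in practice the stats should be as of the game date rather than season-long values.

```python
# Hypothetical per-team stats known before tip-off (illustrative numbers only).
# Every opponent needs an entry here, which is why stats for all 30 teams matter.
team_stats = {
    "BOS": {"off_rating": 118.0, "def_rating": 110.5},
    "NYK": {"off_rating": 115.0, "def_rating": 112.0},
}

# Hypothetical game log: one row per game, not one row per team.
games = [
    {"home": "BOS", "away": "NYK", "home_pts": 112, "away_pts": 104},
    {"home": "NYK", "away": "BOS", "home_pts": 99, "away_pts": 101},
]


def build_row(game, stats):
    """Return (features, target) for a single game."""
    home = stats[game["home"]]
    away = stats[game["away"]]
    features = {}
    for name in home:
        features[f"home_{name}"] = home[name]
        features[f"away_{name}"] = away[name]
        features[f"diff_{name}"] = home[name] - away[name]
    # Target: how much the home team won by (negative if it lost).
    target = game["home_pts"] - game["away_pts"]
    return features, target


rows = [build_row(g, team_stats) for g in games]
X = [features for features, _ in rows]
y = [target for _, target in rows]
```

If this is not the structure you have in mind, say so explicitly in the revision.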

"Run Machine learning - X_train is the information given before the game (team stats, team strength, player stats, injuries), and y_train is the results of the games"

Ok, good. But there are not enough details about this process. Your splitting method should probably be: drop the last 3 weeks of games (lots of resting and injuries and tanking and shenanigans), and use the month before that as the holdout. The months before the holdout are training.
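A minimal sketch of that split, assuming each game is a dict with a `date` field. The function name, the 21-day and 30-day windows, and the season end date are my placeholders, so swap in the real schedule.

```python
from datetime import date, timedelta


def split_by_date(games, season_end, drop_days=21, holdout_days=30):
    """Split games into (train, holdout), discarding the last `drop_days` of the season."""
    drop_start = season_end - timedelta(days=drop_days)
    holdout_start = drop_start - timedelta(days=holdout_days)
    train = [g for g in games if g["date"] < holdout_start]
    holdout = [g for g in games if holdout_start <= g["date"] < drop_start]
    return train, holdout


# Hypothetical season end date and game dates, for illustration only.
season_end = date(2024, 4, 14)
games = [
    {"date": date(2024, 1, 10)},  # training
    {"date": date(2024, 3, 1)},   # holdout
    {"date": date(2024, 3, 20)},  # holdout
    {"date": date(2024, 4, 10)},  # dropped (final 3 weeks)
]

train, holdout = split_by_date(games, season_end)
```

The point of splitting on dates rather than at random is that the holdout games all come after every training game.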

The CV method can NOT be k-fold, because shuffled folds let the model train on games played after the ones it is validated on. You need to use a time-series style of splitting.
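To make "time-series style" concrete, here is a sketch of an expanding-window splitter written from scratch. It assumes the rows are already sorted by game date. The function name is mine; scikit-learn's `TimeSeriesSplit` provides a similar splitter, though it sizes the folds a little differently.

```python
def expanding_window_splits(n_samples, n_splits):
    """Yield (train_idx, val_idx) pairs where training always precedes validation.

    Assumes row i was played no later than row i + 1.
    """
    fold_size = n_samples // (n_splits + 1)
    for k in range(1, n_splits + 1):
        train_end = fold_size * k
        # The last fold absorbs any leftover rows.
        val_end = train_end + fold_size if k < n_splits else n_samples
        yield list(range(train_end)), list(range(train_end, val_end))


splits = list(expanding_window_splits(n_samples=10, n_splits=3))
for train_idx, val_idx in splits:
    print("train:", train_idx, "validate:", val_idx)
```

Each fold trains on everything up to a cutoff and validates on the games right after it, so no fold ever sees the future.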

Scoring... you don't explain this, not rigorously. Maybe you have a gambling rule like: if HowMuchWillTheHomeTeamWinBy > line, bet on the home team, else bet on the road team. Then it's as simple as computing the returns on the chosen bet at the available odds.
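Something like the sketch below would score that rule. I am assuming `line` is the margin the home team must win by to cover (negative when the home team is the underdog), 1 unit staked per game, American odds defaulting to -110 on both sides, and a push returning the stake. All function names and the example numbers are hypothetical.

```python
def american_to_profit(odds):
    """Profit per 1 unit staked on a winning bet at American odds."""
    return 100 / -odds if odds < 0 else odds / 100


def bet_return(pred_margin, line, actual_margin, home_odds=-110, road_odds=-110):
    """Return profit (in units) from the rule: bet home if prediction > line, else road."""
    bet_home = pred_margin > line
    if actual_margin == line:
        return 0.0  # push: stake returned
    home_covers = actual_margin > line
    won = bet_home == home_covers
    odds = home_odds if bet_home else road_odds
    return american_to_profit(odds) if won else -1.0


# (predicted home margin, line, actual home margin), made-up numbers.
examples = [
    (7.0, 4.5, 10),    # bet home, home covers: win
    (1.0, 3.5, 6),     # bet road, home covers: lose
    (-2.0, -3.0, -3),  # bet home, lands on the line: push
]
returns = [bet_return(p, l, a) for p, l, a in examples]
total_return = sum(returns)
```

Run this on the holdout games only, and report the total and per-bet return so the model can be judged in dollars and not just prediction error.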