vanderbilt-data-science / pga-tour-advanced-analytics

PGA Tour In Context Learning Predictive Model

Initialize quantitative in context learning approach with features from finalized random forest model #6

Closed zprintz closed 6 months ago

zprintz commented 8 months ago

Prompt v1: The following is performance data for golfers. Use the below training data to predict the standings at the ___ tournament. Report the predictions by name and predicted finishing position.

Problems:

  1. GPT defaulted to a multiple linear regression model. Each time new data was entered, GPT simply added it to the existing regression model.
  2. GPT predicted results for players not in the tournament. We need to specify the field list for the given tournament.

zprintz commented 8 months ago

Prompt v2: The following is performance data for golfers. Use the below training data to predict the standings at the Charles Schwab Challenge, which takes place on 5/23/23. Report the predictions by name and predicted finishing position. Do not use code interpreter to solve the problem. Here is a list of the players participating in the tournament:

Problems:

  1. Only the top 10 finishers were returned, not the entire field.
  2. We need to be more specific about the use of code interpreter. It is fine for calculating performance metrics such as NDCG after each tournament, but we must be clear that it should not be used for the actual predictions.
  3. We must ask GPT to produce the output as a comma-separated list, formatted as Player, Position, to make the NDCG calculation easier and avoid timeout errors.

zprintz commented 8 months ago

Prompt v3: The following is performance data for golfers. Use the below training data to predict the standings at the Charles Schwab Challenge, which takes place on 5/23/23. Report the predictions by name and predicted finishing position in the following format: player, predicted position; player predicted position, etc. Do not use code interpreter to solve the problem. Provide predictions for all 72 players in the specified field below. Here is a list of the players participating in the tournament:

Second prompt (for NDCG calculation): Great, here are the actual results. Now use code interpreter to calculate the NDCG for this tournament based on your predictions and the actual outcome below:

Problems:

  1. The output list is still not in the most efficient form. We need to tell GPT to drop the ordinal suffixes (1, 2, 3 instead of 1st, 2nd, 3rd) to avoid timeout errors in the NDCG calculation.
  2. Ensure GPT is producing an output formatted in player: position, player: position, player: position, etc. instead of the 1-72 list we are currently getting.
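
Until the prompt reliably produces the requested format, the two formatting problems above can also be handled on the Python side. The following is a minimal sketch, not the code used in this project: `parse_predictions` is a hypothetical helper, and it assumes entries look like `player: position` separated by commas or semicolons, possibly with ordinal suffixes. The player names are placeholders.

```python
import re


def parse_predictions(text):
    """Parse 'Player A: 1st, Player B: 2, ...' into {name: integer position}.

    Hypothetical helper: assumes 'name: position' entries separated by
    commas or semicolons, with optional ordinal suffixes on the position.
    """
    predictions = {}
    for chunk in re.split(r"[;,]", text):
        chunk = chunk.strip()
        if not chunk:
            continue
        # Split on the last colon so the position is always the final field.
        name, _, position = chunk.rpartition(":")
        # Drop a trailing ordinal suffix such as "st", "nd", "rd", or "th".
        digits = re.sub(r"(?i)(st|nd|rd|th)$", "", position.strip())
        if not name.strip() or not digits.isdigit():
            raise ValueError(f"Cannot parse entry: {chunk!r}")
        predictions[name.strip()] = int(digits)
    return predictions


# Placeholder names, not real model output.
sample = "Player A: 1st, Player B: 2nd; Player C: 3"
print(parse_predictions(sample))
```

Entries that cannot be read as an exact position (for example a tie marker or a range) raise an error instead of being silently guessed.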

zprintz commented 8 months ago

Prompt v4: The following is performance data for golfers. Use the below training data to predict the standings at the Charles Schwab Challenge, which takes place on 5/23/23. Report the predictions in the following format instead of a list output: player: predicted position, player: predicted position, etc. Do not use code interpreter to solve the problem. Provide predictions for all 72 players in the specified field below. Here is a list of the players participating in the tournament:

Second prompt (for NDCG calculation): Great, here are the actual results. Now use code interpreter to calculate the NDCG for this tournament based on your predictions and the actual outcome below:

Problems:

  1. GPT is not referencing its own predictions in the NDCG calculation. It seems to have no memory of its predicted list, even when specifically prompted to use it in a follow-up.
  2. I eventually got an NDCG by supplying GPT with both its predictions and the actual results again. The NDCG for the tournament was 0.15.
  3. Going forward, I think it would be better to have GPT predict a tournament, store the predictions externally in a dictionary, provide the actual results of the tournament, and then instruct GPT to predict the next tournament with that performance in mind.
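
Once the predictions are stored externally as a dictionary, the NDCG can be computed in Python rather than inside the chat. The sketch below is one possible way to do it, not necessarily the calculation GPT performed. The relevance definition (field size minus actual position plus one, floored at zero) is an assumption, as are the function name `ndcg` and the placeholder players; a different relevance scheme will give different scores.

```python
import math


def ndcg(predicted, actual):
    """NDCG of a predicted finishing order against the actual results.

    Both arguments map player name -> finishing position (1 = winner).
    ASSUMPTION: relevance = field_size - actual_position + 1, floored at 0,
    so better actual finishes carry more gain.
    """
    players = [p for p in predicted if p in actual]
    if not players:
        raise ValueError("predicted and actual share no players")
    n = len(players)
    relevance = {p: max(n - actual[p] + 1, 0) for p in players}

    def dcg(order):
        # Standard log2 discount: rank 1 is divided by log2(2) = 1.
        return sum(relevance[p] / math.log2(i + 2) for i, p in enumerate(order))

    predicted_order = sorted(players, key=lambda p: predicted[p])
    ideal_order = sorted(players, key=lambda p: relevance[p], reverse=True)
    ideal = dcg(ideal_order)
    return dcg(predicted_order) / ideal if ideal > 0 else 0.0


# Placeholder data, not real tournament results.
actual = {"Player A": 1, "Player B": 2, "Player C": 3}
predicted = {"Player A": 2, "Player B": 1, "Player C": 3}
print(round(ndcg(predicted, actual), 3))
```

A perfect prediction scores 1.0 under this definition; any other ordering scores lower.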

zprintz commented 8 months ago

Prompt v5: The following is performance data for golfers. Use the below training data to predict the standings at the Charles Schwab Challenge, which takes place on 5/23/23. I will be pasting this output into Python, so please report the predictions as a dictionary with the player name as the key and your predicted position as the value. {"Player 1": 1, "Player 2": 2, etc.}. Do not use code interpreter for the actual predictions themselves, but please use it to format your predictions in the specified format. Provide predictions for all 72 players in the specified field below. Here is a list of the players participating in the tournament:

Second prompt: Here are the actual results of the tournament. Please use the results to refine your prediction capabilities. The next tournament is the Memorial Tournament presented by Workday, which takes place on 6/4/23. Again, do not use code interpreter for the actual predictions themselves, but please use it to format your predictions in the specified format. Provide predictions for all the players in the specified field below.

Problems:

  1. Predictions for the second tournament are not exact. Ranges are provided instead (likely top 50, could finish in top 35, etc.).
  2. Predictions for the second tournament are given in prose instead of the specified dictionary format.

Going forward, I will keep the first prompt as is, but I will reword the second prompt to get more specific outputs.
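
Several of the problems seen so far (players outside the field, an incomplete field, ranges instead of exact positions) can be caught automatically before a prediction is scored. This is a minimal sketch under the assumption that the output has already been pasted into Python as a dictionary; `validate_predictions` and the player names are hypothetical.

```python
def validate_predictions(predictions, field):
    """Return a list of problems; an empty list means the output is usable.

    Hypothetical helper: `predictions` maps player name -> position and
    `field` is the list of players entered in the tournament.
    """
    problems = []
    field_set = set(field)
    predicted = set(predictions)
    for player in sorted(field_set - predicted):
        problems.append(f"missing prediction for {player}")
    for player in sorted(predicted - field_set):
        problems.append(f"{player} is not in the field")
    for player, position in predictions.items():
        # bool is a subclass of int in Python, so exclude it explicitly.
        if isinstance(position, bool) or not isinstance(position, int):
            problems.append(f"{player}: position {position!r} is not an exact integer")
        elif not 1 <= position <= len(field_set):
            problems.append(f"{player}: position {position} is outside 1..{len(field_set)}")
    return problems


# Placeholder data illustrating the failure modes described above.
field = ["Player A", "Player B", "Player C"]
output = {"Player A": 1, "Player B": "top 50", "Player D": 2}
for problem in validate_predictions(output, field):
    print(problem)
```

If the list is non-empty, the prompt can be rerun or reworded before the result is stored.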

zprintz commented 8 months ago

Prompt v6: The following is performance data for golfers. Use the below training data to predict the standings at the Charles Schwab Challenge, which takes place on 5/23/23. I will be pasting this output into Python, so please report the predictions as a dictionary with the player name as the key and your predicted position as the value. {"Player 1": 1, "Player 2": 2, etc.}. Do not use code interpreter for the actual predictions themselves, but please use it to format your predictions in the specified format. Provide predictions for all 72 players in the specified field below. Here is a list of the players participating in the tournament:

Second prompt: Here are the actual results of the tournament. Please use the results to refine your prediction capabilities. The next tournament is the Memorial Tournament presented by Workday, which takes place on 6/4/23. As with the previous tournament, I will be pasting this output into Python, so please report the predictions as a dictionary with the player name as the key and your predicted position as the value. {"Player 1": 1, "Player 2": 2, etc.}. Do not use code interpreter for the actual predictions themselves, but please use it to format your predictions in the specified format. Provide exact predictions for all players in the specified field below.

This set of prompts seems to work well. I have compiled predictions for the first three tournaments (the second and third prompts are identical). I have stored the resulting dictionaries in a separate Jupyter notebook and have the following NDCG scores for each tournament individually as well as cumulatively. Note that these are without the two features we created. I am rerunning the in-context learning predictions with the two new features included and will add the results in the next comment.

NDCG Charles Schwab Challenge: 0.589
NDCG Memorial Tournament: 0.547
NDCG RBC Canadian Open: 0.578
Cumulative NDCG Score: 0.571
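
The cumulative score reported here matches the unweighted mean of the three tournament scores. A short check, with the dictionary name `scores` chosen only for illustration:

```python
from statistics import mean

# Per-tournament NDCG scores as reported in this comment.
scores = {
    "Charles Schwab Challenge": 0.589,
    "Memorial Tournament": 0.547,
    "RBC Canadian Open": 0.578,
}

cumulative = round(mean(scores.values()), 3)
print(cumulative)  # 0.571
```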

zprintz commented 7 months ago

Prompt v6: I used the same prompts as above, but this time included our two new features (field strength and recent form) in the data. Below are the results. The cumulative NDCG is slightly higher than with the same prompts without those features (0.596 vs. 0.571), although the Charles Schwab Challenge score dropped (0.542 vs. 0.589) and the cumulative score is still just below the pure-chance NDCG of 0.606.

NDCG Charles Schwab Challenge: 0.542
NDCG Memorial Tournament: 0.648
NDCG RBC Canadian Open: 0.599
Cumulative NDCG Score: 0.596
NDCG of Pure Chance: 0.606
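
A pure-chance baseline like the one quoted above can be estimated by scoring many random orderings of the field. The sketch below shows one way to do that. It is not the calculation behind the 0.606 figure: the relevance definition (field size minus actual position plus one), the function names, the trial count, and the seed are all assumptions, and the resulting value depends on them.

```python
import math
import random


def ndcg_from_order(actual_positions):
    """NDCG for a predicted order, given each predicted player's actual position.

    `actual_positions[i]` is the actual finish of the player predicted to
    finish in place i + 1. ASSUMPTION: relevance = field_size - position + 1.
    """
    n = len(actual_positions)
    gains = [n - pos + 1 for pos in actual_positions]
    dcg = sum(g / math.log2(i + 2) for i, g in enumerate(gains))
    ideal = sum(g / math.log2(i + 2) for i, g in enumerate(sorted(gains, reverse=True)))
    return dcg / ideal


def chance_ndcg(field_size, trials=2000, seed=0):
    """Average NDCG of uniformly random predicted orders (Monte Carlo estimate)."""
    rng = random.Random(seed)
    positions = list(range(1, field_size + 1))
    total = 0.0
    for _ in range(trials):
        rng.shuffle(positions)
        total += ndcg_from_order(positions)
    return total / trials


# Field of 72, as in the Charles Schwab Challenge prompts above.
print(round(chance_ndcg(72), 3))
```

Comparing a model's cumulative NDCG against a baseline computed with the same relevance definition shows whether the predictions carry any signal beyond random ordering.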