zengm-games / zengm

Basketball GM (and other ZenGM games) are single-player sports management simulation games, made entirely in client-side JavaScript.
https://zengm.com

Version 3 of dynamically evaluating 1st round picks in trades #437

Closed · sumitde22 closed this 1 year ago

sumitde22 commented 1 year ago

Sorry, I had to make a new PR because I messed up the local copy of my other branch to the point where the trade-finding algo wasn't working anymore. I figured it was easier to cut a branch from a previous working commit and make my changes there.

Testing Conducted: For reference, here are the starting team ratings: https://imgur.com/5zubF50

On the master branch, teams are willing to give up 1st round picks even when their team OVR tanks to < 40: https://imgur.com/abL2ZIZ

One team is willing to do this trade that tanks their team OVR to -36 while also giving up 2 immediate 1sts, not to mention a ton of value in players: https://imgur.com/V8ZbWJj

On the feature branch, teams are no longer willing to trade their 1sts AND tank their OVR: https://imgur.com/o0igvhC. The lowest OVR at which a team is willing to part with 1st round picks is 57.

Now the team from before only gives up 2nds when giving up a ton of players: https://imgur.com/IlwQRFp, and it gives up far fewer players when giving up 1sts: https://imgur.com/eTjJwZY

Time spent finding AI-to-AI trades for Basketball GM, standard AI-to-AI trade rate, from the beginning of the season to the trade deadline:

- Master branch: 1.75 secs
- This feature branch: 2.52 secs

I did other benchmarking/testing/analysis as well but didn't want to overload this description, so let me know what you'd be interested in seeing and I can share it.

sumitde22 commented 1 year ago

There seems to be a bug in calculating ovr for baseball: autoplay stops if you sim a few games. It comes up when calling the summary function, so it might be connected to the changes made to speed things up in ovr.ts/ovrByPosFactory.ts?

dumbmatter commented 1 year ago

Thank you so much for your continued work on this!

I think my main concern with baking in a regression model is that the model may lose validity not just for other sports, but also for different settings or roster files.

But also, is this actually better than the previous PR branch #435 in terms of either performance or functionality? Adding more complexity would require a better idea of whether we're actually gaining much from it. For the same reason, I think we should test whether the binary search actually helped.

sumitde22 commented 1 year ago

So, a couple of thoughts:

I think it's better than the previous PR branch based on my testing: on the other branch, simming the AI-to-AI trades was taking about 4-5 seconds per season, while on this branch it's closer to 2.5. Breaking it down in terms of profiling calls to team.ovr:

The master version makes about 1500 calls to team.ovr from the beginning of the season to the trade deadline (30 calls for each day that gets simulated to try AI-to-AI trades). The previous branch adds about 2500 calls on top of those 1500 when dynamically computing OVR while building trades, for a total of ~4000. This branch removes the need for those 1500 calls by replacing them with approximations, and instead only calls team.ovr when dynamically computing ovr to estimate picks.
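For reference, the call counts above came from simple instrumentation. This is just an illustrative sketch of that kind of counting wrapper, not the exact profiling code I used, and `computeTeamOvr` is a stand-in for the real team.ovr:

```ts
// Hypothetical sketch: count how often an expensive function is called
// during a sim. `computeTeamOvr` stands in for the real team.ovr.
let ovrCallCount = 0;

const computeTeamOvr = (ratings: number[]): number => {
	ovrCallCount += 1;
	// Placeholder for the real (expensive) aggregation over player ratings
	return ratings.reduce((sum, r) => sum + r, 0) / Math.max(ratings.length, 1);
};

// After simming from the start of the season to the trade deadline:
console.log(`team.ovr was called ${ovrCallCount} times`);
```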

It makes sense to me to make this "move". For example, if you can tell a team is roughly the 5th-ranked team when it has an ovr of, say, about 65, should you have to compute the ovrs of the teams ranked 15-30 and sort them just to figure that out? That seems like something the AI should be able to approximate without that level of precision. Not to mention that the ovrs of all 30 teams are being computed every time an AI-to-AI trade is simulated, which is almost every day based on my understanding. The way I see it, the precision is being shifted from getting the exact "rank" of a team based on ovr to making sure it doesn't get fleeced by undervaluing its pick.

There are two regression models here: one for mapping team ovr -> estimated rank in the league, and another for mapping win % to estimated draft pick.

The team ovr to estimated rank model would be dynamically computed at the beginning of every season for the specific league the user is running (it's pretty fast), so I think that would make it adaptable to different settings or roster files. It's a simple linear regression as of now. Once you get one snapshot of team ratings for a season, I think it should be pretty accurate for the rest of that season (it seems to me the AI should be able to tell that a ~60 ovr team is about 10th best and a 45 ovr team is about 20th best without recomputing all team ovrs every day of the season).
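To give a rough idea of what that per-league fit could look like, here is a minimal sketch of fitting ovr -> rank with ordinary least squares on one start-of-season snapshot. The names are illustrative, not the actual code in this PR:

```ts
// Illustrative sketch of fitting team ovr -> estimated league rank with a
// simple ordinary least squares regression. Names are hypothetical and do
// not match the PR's actual code.
type TeamSnapshot = {
	ovr: number;
	rank: number; // 1 = best team in the league at the time of the snapshot
};

const fitOvrToRank = (teams: TeamSnapshot[]) => {
	const n = teams.length;
	const meanOvr = teams.reduce((s, t) => s + t.ovr, 0) / n;
	const meanRank = teams.reduce((s, t) => s + t.rank, 0) / n;

	let num = 0;
	let den = 0;
	for (const t of teams) {
		num += (t.ovr - meanOvr) * (t.rank - meanRank);
		den += (t.ovr - meanOvr) ** 2;
	}

	const slope = den === 0 ? 0 : num / den;
	const intercept = meanRank - slope * meanOvr;

	// Returns an estimator that avoids recomputing and sorting all team ovrs
	return (ovr: number) =>
		Math.min(n, Math.max(1, Math.round(intercept + slope * ovr)));
};
```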

And the mapping from win % to estimated draft pick is, I think, pretty agnostic of league settings/rating distribution, since the distribution of wins per team tends to settle into something that looks like a normal distribution. For basketball, a team projected to win 28 out of 82 would get about the 5th pick, a team winning 40 would get about the 15th pick, and a team winning 51 would get about the 25th pick, regardless of team ratings. I did some simulations and the distribution of win %s seemed to converge to an S-shaped curve. I'll push the Jupyter notebook I was tinkering with to this PR to show you some of the analysis.
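To make that mapping concrete, here is a hedged sketch of how a projected win % could be turned into an estimated pick. The anchor points are just the rough basketball numbers quoted above and the interpolation is purely illustrative; the actual curve in the notebook may differ:

```ts
// Illustrative sketch: map a projected win fraction to an estimated draft
// pick by interpolating between rough anchor points. The anchors come from
// the basketball example above (28/82 wins ~ 5th pick, 40/82 ~ 15th,
// 51/82 ~ 25th); the real model may use a different fitted curve.
const anchors: [winPct: number, pick: number][] = [
	[0, 1],
	[28 / 82, 5],
	[40 / 82, 15],
	[51 / 82, 25],
	[1, 30],
];

const estimatePick = (winPct: number): number => {
	for (let i = 1; i < anchors.length; i++) {
		const [x0, y0] = anchors[i - 1];
		const [x1, y1] = anchors[i];
		if (winPct <= x1) {
			const t = (winPct - x0) / (x1 - x0);
			return Math.round(y0 + t * (y1 - y0));
		}
	}
	return 30;
};
```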

dumbmatter commented 1 year ago

I ran the same benchmarks I did on the previous branch for this branch too. Specifically, I'm using these 3 commented lines to sim a bunch of AI trades https://github.com/zengm-games/zengm/pull/435/files#diff-fd538973ab81a6eecd994c171c4d767b3dcc1ebc9857f05e9428dc583f2fbe85 but with different multipliers (sample sizes) and different sports. It seems hard to get consistent results; I'm not currently sure which of the 3 options (this PR, previous PR, master) is fastest. That's just for basketball, though. For other sports, the master branch is significantly faster.

What you say about reducing calls to team.ovr does make sense; fewer function calls is better. But maybe that's not the biggest bottleneck?

I'm not sure what makes most sense going forward... I'm thinking of maybe just taking the performance improvement parts and leaving the AI improvement parts for later.

sumitde22 commented 1 year ago

Ah, I was focusing on calls to team.ovr because I think you mentioned that could be slowing things down for leagues with bigger teams, like baseball, but maybe that's not the bottleneck.

I just pushed one last change that caches more heavily; I think it's a similar speed to the master branch now. If it's still significantly slower on your end, you can do what you said and take only the performance improvement parts for now.

dumbmatter commented 1 year ago

I probably sound crazy at this point... but I have been doing some more benchmarking, and I'm no longer sure that what I wrote before about reformatting player objects being a bottleneck is actually true.

But I did find something that is very real! In https://github.com/zengm-games/zengm/blob/f7e934b412f9dc06c1b44b6631a508e705317ed0/src/worker/core/team/ovrByPosFactory.ts#L53-L54, these function calls are kind of expensive, and they're happening once per player. But they only need to happen once, not once per player.
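The shape of the fix is just hoisting that loop-invariant work out of the per-player loop. Schematically (the names below are placeholders, not the actual code in ovrByPosFactory.ts):

```ts
// Schematic of the fix; names are placeholders rather than the actual
// code in ovrByPosFactory.ts.

// Stand-in for the expensive, player-independent setup.
const buildPositionWeights = (): Map<string, number> =>
	new Map([["SP", 1], ["RP", 0.5], ["C", 1], ["1B", 1]]);

type Player = { pos: string; rating: number };

// Before: the expensive call happens once per player.
const ovrSlow = (players: Player[]): number =>
	players.reduce(
		(sum, p) => sum + (buildPositionWeights().get(p.pos) ?? 1) * p.rating,
		0,
	);

// After: hoist it out of the loop so it happens once per team.
const ovrFast = (players: Player[]): number => {
	const weights = buildPositionWeights();
	return players.reduce(
		(sum, p) => sum + (weights.get(p.pos) ?? 1) * p.rating,
		0,
	);
};
```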

Unfortunately this only affects baseball, since that code only runs for baseball. But it is significant! Simming a season of baseball is like 20% faster, consistent over many tests. So that's at least one good thing to come out of all of this so far; who knows how long it would have taken me to notice otherwise!