odota / core

Open source Dota 2 data platform
https://www.opendota.com
MIT License
1.51k stars 303 forks source link

Handle null item times and adjust cost thresholds in `getHeroItemPopularity` #2664

Closed ff137 closed 1 year ago

ff137 commented 1 year ago

getHeroItemPopularity can error if item.time is null. #2663

Also adjust cost thresholds used in defining early, mid, and late game items. See review comments

ff137 commented 1 year ago

Here is the existing logic:

    const startGameItems = countItemPopularity(
      items.filter((item) => item.time <= 0)
    );
    const earlyGameItems = countItemPopularity(
      items.filter(
        (item) => item.time > 0 && item.time < 60 * 10 && item.cost > 700
      )
    );
    const midGameItems = countItemPopularity(
      items.filter(
        (item) =>
          item.time >= 60 * 10 && item.time < 60 * 25 && item.cost > 2000
      )
    );
    const lateGameItems = countItemPopularity(
      items.filter((item) => item.time >= 60 * 25 && item.cost > 4000)
    );

My first proposition is that start game items should be redefined:

    const startGameItems = countItemPopularity(
      items.filter((item) => item.time <= 0 && item.cost <= 600)
    );

because, apparently null <= 0 will return true in js. So, in case a null time is recorded for a lategame item, we can add item.cost <= 600 to filter any of those potential edge cases. 600 is the starting gold.

My other proposition is to adjust the item.cost filters used in mid and late game items. Because, 2000 and 4000 feel like really high thresholds, so will discard a lot of other items, especially for supports. My recommendation would be:

    const midGameItems = countItemPopularity(
      items.filter(
        (item) =>
          item.time >= 60 * 10 && item.time < 60 * 25 && item.cost > 1200
      )
    );
    const lateGameItems = countItemPopularity(
      items.filter((item) => item.time >= 60 * 25 && item.cost > 1800)
    );

@howardchung what do you think?

howardchung commented 1 year ago

I feel like if we want to check for null, we should explicitly do that check rather than relying on the 600 gold filter (since that could change)

ff137 commented 1 year ago

So, I added a filter for null keys, just in case.

I'm trying to inspect cases where time is null, playing around with the /explorer. I finally got GPT to help write a query that runs, but it just times out, even with select 1.

It's basically like this:

SELECT purchase_log
FROM player_matches
WHERE purchase_log::text LIKE '%"time":null%'
LIMIT 1

which times out: https://api.opendota.com/api/explorer?sql=SELECT%20purchase_log%20FROM%20player_matches%20WHERE%20purchase_log::text%20LIKE%20%27%25%22time%22:null%25%27%20LIMIT%201;

Anyway, sharing that in case you can query without timeout and see an example of a problematic row

howardchung commented 1 year ago

The LIKE query is going to be slow because there isn't a text index on that column.

ff137 commented 1 year ago

@howardchung thoughts on these changes?

howardchung commented 1 year ago

My only blocking concern is changing the filter condition to: .filter((item) => item && item.key)

ff137 commented 1 year ago

My only blocking concern is changing the filter condition to: .filter((item) => item && item.key)

How do you mean? I've added: .filter((item) => item !== null && item.key !== null) Do you want to add item.time !== null?

howardchung commented 1 year ago

That depends on the behavior we want if time is null/undefined. If we want to default it to 0, then we shouldn't add it.

But the change I'm asking for is either to just check for item && item.key or item != null && item.key != null to handle undefined (which may not be possible, but I figure it's safer)

ff137 commented 1 year ago

@howardchung thanks 👍