katerberg / 9th-seed

Twitch Bot and API for Vintage Rotisserie Draft
GNU Affero General Public License v3.0
1 stars 0 forks source link

Allow querying cards by "likely companions" #43

Closed katerberg closed 2 years ago

katerberg commented 3 years ago

Enter a card and see the cards most commonly drafted with it (taking into account ratio)

katerberg commented 2 years ago

Luca has provided some python code that should make this a good start:

import math as Math

# dictionary format is {card:date, ...}, where the cards can be either integer representation or strings, and date has to be in a integer date format
legal_dict = {0:10, 1:10, 2:10, 3:10, 4:10, 5:10, 6:10, 7:10, 8:10, 9:10, 10:10, 11:10, 12:10, 13:10, 14:10, 15:10, 16:20, 17:20, 18:20, 19:20, 20:20}

# list format is [[date, pick1, pick2, ...], ...] where picks is in the same format as the card field in legal_dict
draft_list = [
    [15,  0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15],
    [17,  0,  3,  2,  1,  4,  7,  9, 14,  5,  6,  8, 10, 11, 12, 13, 15],
    [25,  0,  1,  3,  2, 16,  4,  7,  5,  6,  8, 17, 18, 19, 20, 14, 15]
    ]

deck_list = []
pick_dict = {}
pickable_dict = {}
comorbidity_dict = {}
weighted_dict = {}

def is_legal(cardnumber, date):
    legal_date = legal_dict[cardnumber]
    return (date >= legal_date)

#0) populate legal_dict with cards as keys whose entries is the card's legality date
#TODO

#1) populate draft_list with lists containing date followed by picks in a draft
#TODO

#2) populate deck_list with lists containing date followed by picks, but for each deck instead of for each draft
for draft in draft_list:
    decks = [
        draft[0:1], #insert first item, date, as first item in the decks
        draft[0:1],
        draft[0:1],
        draft[0:1],
        draft[0:1],
        draft[0:1],
        draft[0:1],
        draft[0:1]
        ]
    for picknumber, pick in enumerate(draft[1:]): # use [1:] in order to not include date
        player = (picknumber % 16) if (picknumber % 16) < 8 else 15-(picknumber % 16) #0 to 7, then 7 back to 0, over and over again
        decks[player].append(pick)
    deck_list.extend(decks) #add calculated decks to list of all decks

#3) populate pick_dict with cards as keys whose entries are dictionaries, with cards picked alongside the key card as keys whose entries are number of times picked
for deck in deck_list:
    for card in deck[1:]: # use [1:] in order to not include date
        if card in pick_dict:
            for other_card in deck[1:]:
                if other_card == card:
                    continue
                elif other_card in pick_dict[card]:
                    pick_dict[card][other_card] += 1
                else:
                    pick_dict[card][other_card] = 1
                    pickable_dict[card][other_card] = 0 #setup for step 4)
                    comorbidity_dict[card][other_card] = 0.0 #setup for step 5)
                    weighted_dict[card][other_card] = 0.0 #setup for step 6)
        else:
            pick_dict[card] = {}
            pickable_dict[card] = {} #setup for step 4)
            comorbidity_dict[card] = {} #setup for step 5)
            weighted_dict[card] = {} #setup for step 6)

            for other_card in deck[1:]:
                if other_card == card:
                    continue
                else:
                    pick_dict[card][other_card] = 1
                    pickable_dict[card][other_card] = 0 #setup for step 4)
                    comorbidity_dict[card][other_card] = 0.0 #setup for step 5)
                    weighted_dict[card][other_card] = 0.0 #setup for step 6)

#4) populate pickable_dict with cards as keys whose entries are dictionaries, with cards picked alongside the key card as keys whose entries are number of times pickable
for card in pickable_dict:
    for deck in deck_list:
        if card in deck[1:]: # use [1:] in order to not include date
            for other_card in pickable_dict[card]:
                if is_legal(other_card, deck[0]): # check if card was legal at the date
                    pickable_dict[card][other_card] += 1

#5) populate comorbidity_dict with pick_dict / pickable_dict
#6) populate weighted_dict with Lower Bound Wilson Score
z = 1.96
for card in comorbidity_dict:
    for other_card in comorbidity_dict[card]:
        comorbidity_dict[card][other_card] = phat = float( pick_dict[card][other_card] ) / pickable_dict[card][other_card]
        n = pickable_dict[card][other_card]

        # OBS! Weighted value not for displaying as a number, only use for determining order for which to display cards
        weighted_dict[card][other_card] = ( phat + z*z/(2*n) - z*Math.sqrt( (phat*(1-phat) + z*z/(4*n))/n) ) / (1+z*z/n)

#?) Print data, #TODO: Replace this with write to file or database
print("Picks")
print(pick_dict)
print("Pickables")
print(pickable_dict)
print("Comorbidities")
print(comorbidity_dict)
print("Weights")
print(weighted_dict)
print("--------")
katerberg commented 2 years ago

Thanks @luca0483

luca0483 commented 2 years ago

Since writing this code, i've become familiar with how to extract data from scryfall's raw data json. I could very quickly turn this proof-of-concept code into the real deal if you gave me the syntax of whatever csv or json you have that contains all the drafts.

katerberg commented 2 years ago

The CSVs are here: https://github.com/katerberg/9th-seed/tree/main/drafts, but they get turned into SQL, so this likely need to just be SQL queries against the schemas in this folder: https://github.com/katerberg/9th-seed/tree/main/sql

katerberg commented 2 years ago

Progress state:

SELECT card, picksWith, averagePick, picksOfSelection, picksOfSynergyOption, numberAvailable, LEAST(picksOfSelection, numberAvailable) as numberOfOpportunities, picksWith/LEAST(picksOfSelection, numberAvailable) as percentageTogether
FROM (
    SELECT groupedPicks.card, picksWith, averagePick, picksOfSelection, picksOfSynergyOption, ( 
        SELECT count(*) 
        FROM drafts 
        WHERE oracle.releaseDate 
        BETWEEN "1000-01-01" AND drafts.occurance 
    ) AS numberAvailable 
    FROM (
        SELECT b.card, count(b.card) as picksWith, (SELECT count(card) FROM (archives) WHERE archives.card = 'energy field') as picksOfSelection, (SELECT count(card) FROM (archives) WHERE archives.card = b.card) as picksOfSynergyOption, avg(b.pick) as averagePick FROM archives a
        INNER JOIN archives b ON a.player = b.player AND a.draft = b.draft
        WHERE a.card = 'energy field'
        GROUP BY b.card
        ORDER BY count(b.card) DESC, averagePick ASC
    ) groupedPicks
    INNER JOIN oracle on oracle.card = groupedPicks.card
    WHERE groupedPicks.picksWith > 1
) AS availabledPicks
ORDER BY percentageTogether DESC;

I'm still trying to figure out ideal logic:

  1. Ideally cards that are only drafted together should be a the top e.g. Thespian's Stage and Dark Depths
  2. Cards that are just good but not really related to the card should not be suggestions with niche strategies e.g. Earthcraft shouldn't suggest Tarmogoyf over Squirrel's Nest even though it is one of the cards that has always been taken with it due to low cases of Earthcraft being taken
  3. New cards shouldn't get punished e.g. svyelun of sea and sky should be suggested over Ancestral Recall for Master of the Pearl Trident even though it's taken less often technically.
katerberg commented 2 years ago

This is looking like the final answer:

SELECT card, picksWith, averagePick, picksOfSelection, picksOfSynergyOption, LEAST(availablePicksOfSelection, numberAvailable) as overlapInstances, picksWith/LEAST(picksOfSelection, numberAvailable) as percentageTogether
FROM (
    SELECT groupedPicks.card, picksWith, averagePick, picksOfSelection, picksOfSynergyOption, availablePicksOfSelection, ( 
        SELECT count(*) 
        FROM drafts 
        WHERE oracle.releaseDate 
        BETWEEN "1000-01-01" AND drafts.occurance 
    ) AS numberAvailable 
    FROM (
        SELECT b.card, count(b.card) as picksWith, (SELECT count(*) 
        FROM drafts 
        INNER JOIN archives on archives.draft = drafts.draft
        INNER JOIN oracle on oracle.card = archives.card
        WHERE archives.card like 'Master of the Pearl Trident%'
        AND oracle.releaseDate BETWEEN "1000-01-01" AND drafts.occurance
        ) as availablePicksOfSelection, (SELECT count(card) FROM (archives) WHERE archives.card like 'Master of the Pearl Trident%') as picksOfSelection, (SELECT count(card) FROM (archives) WHERE archives.card = b.card) as picksOfSynergyOption, avg(b.pick) as averagePick FROM archives a
        INNER JOIN archives b ON a.player = b.player AND a.draft = b.draft
        WHERE a.card like 'Master of the Pearl Trident%'
        GROUP BY b.card
        ORDER BY count(b.card) DESC, averagePick ASC
    ) groupedPicks
    INNER JOIN oracle on oracle.card = groupedPicks.card
    WHERE groupedPicks.picksWith > 1
) AS availabledPicks
ORDER BY percentageTogether DESC, picksWith DESC, picksOfSynergyOption ASC, averagePick ASC;
katerberg commented 2 years ago

Fixed by https://github.com/katerberg/9th-seed/commit/be4dc62ae09857cf1ebdd4757bbc04507a7baa14