kazooki117 opened this issue 5 years ago
I propose to consolidate all the data sources into a Postgres DB hosted on AWS. We need to think about the schema structure. (We can use a tool like SQLdbm to visualize the data structure and the links between tables.)
My first pass would be the following tables in the schema for deck storage:

- `cards`: card_id, [card_infos] (imported from MTGJSON)
- `deck`: deck_id, session_id [optional, if draft log available], score_game_1 [int], score_game_2 [int], score_game_3 [int], score_total [calculated from score_game_1, 2, and 3]
- `deck_details`: deck_details_id, deck_id, card_id, card_num [int]

For draft data I would say:

- `session`: session_id, session_date [timestamp]
- `draft_log`: draft_id, session_id, pack_num [int], pick_num [int], is_picked [bool] (we would store the list of all the cards seen here and set is_picked to true for the card we take)

All of the tables can be linked on their id columns.
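A quick way to sanity-check the proposal is to write it out as DDL. The sketch below uses sqlite3 so the example is self-contained (the target is Postgres). The exact column types are assumptions, and the `card_id` column on `draft_log` is my addition: a row per card seen needs to say which card it is.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE cards (card_id INTEGER PRIMARY KEY);  -- plus card_infos from MTGJSON
CREATE TABLE session (session_id INTEGER PRIMARY KEY, session_date TIMESTAMP);
CREATE TABLE deck (
    deck_id INTEGER PRIMARY KEY,
    session_id INTEGER REFERENCES session(session_id),  -- optional
    score_game_1 INTEGER, score_game_2 INTEGER, score_game_3 INTEGER
    -- score_total is derivable from the three game scores; a view or a
    -- generated column avoids storing it redundantly
);
CREATE TABLE deck_details (
    deck_details_id INTEGER PRIMARY KEY,
    deck_id INTEGER NOT NULL REFERENCES deck(deck_id),
    card_id INTEGER NOT NULL REFERENCES cards(card_id),
    card_num INTEGER NOT NULL
);
CREATE TABLE draft_log (
    draft_id INTEGER PRIMARY KEY,
    session_id INTEGER NOT NULL REFERENCES session(session_id),
    pack_num INTEGER NOT NULL,
    pick_num INTEGER NOT NULL,
    card_id INTEGER NOT NULL REFERENCES cards(card_id),  -- assumed column
    is_picked BOOLEAN NOT NULL
);
""")

# List the tables that were created, to confirm the DDL is valid
tables = [r[0] for r in conn.execute(
    "SELECT name FROM sqlite_master WHERE type='table' ORDER BY name")]
print(tables)
```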
Hey all, Lemon_Tea here from the discord!
What does the score on `deck` represent? 0 for a loss, 1 for a win? It might be better to store win, loss, and tie as explicit values instead of aggregating them and hiding their meaning.
Would it help to have a WUBRG value on the deck data as well?
I'm also confused by `draft_log`: is this a table for the entire draft or just a single pick?
If it's a single pick, it should be represented by a list of cards (given by card id) plus the pick_id of the card that is picked. With the current schema it looks like a sort of N+1 problem, where each record stores a lot more data than you would need.
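To put a rough number on that concern: with one row per card *seen*, a single drafter's log grows quadratically with pack size, while a row-per-pick design stays linear. Back-of-the-envelope arithmetic, using the 15-card, 3-pack numbers from the `draft_log` description above:

```python
# One row per card *seen*: picks 1..15 of a pack show 15, 14, ..., 1 cards.
cards_seen_per_pack = sum(range(1, 16))      # 15 + 14 + ... + 1 = 120
rows_per_drafter = 3 * cards_seen_per_pack   # three 15-card packs
picks_per_drafter = 3 * 15                   # vs. one row per pick

print(rows_per_drafter, picks_per_drafter)
```

That is 360 seen-card rows versus 45 pick rows per drafter; the seen-card rows are what make pick-order analysis possible, so the extra volume may be a deliberate trade-off.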
I think it's important to structure the data based on what the typical queries will look like: what useful questions would we want to be able to ask of the data?
I say this just so due respect is given to how granular one would need to be in order to produce a truly useful database. I've tried doing this myself before (and actually got somewhat far, see here: https://github.com/sjb9774/MTGDB-API), but ultimately I abandoned it as overly ambitious.
Instead of importing all the card data from MTGJSON as suggested by @ValentinPerret, I might suggest just storing the multiverse id (or, if there's a better unique identifier, use that) and expecting users of the database to use the Scryfall API (or some other) to pull detailed card info in their own applications, leaving this database to be just a repository of well-formed draft logs. That way a lot of the processing is offloaded to the individual users (which is unfortunate) but I think it is ultimately smarter.
Just added a branch that has a basic data structure and imports MTGO draft logs here: https://github.com/kazooki117/mtg-data-analysis/tree/importer/importer. There is definitely some room for improvement in the data structure. It also currently has no support for recording deck performance, nor for incorporating sealed decks.
@nebanche: `draft_log` would store the entire draft, with one line per card. So pack 1 pick 1 would have 15 cards, then P1P2 14 cards, etc. This structure lets us store all the information about the draft (and will allow analysis of pick orders, and of which cards were picked over which others, in the future).

For the deck score, the mapping would be:

- (2-0) --> 2
- (2-1) --> 1
- (1-1) --> 0
- (1-2) --> -1
- (0-2) --> -2

(issue: I miss 1-0 and 0-1... :s)

The WUBRG value is redundant with the list of cards. I feel like we could add this calculation in a view (see comment below) rather than in the base table.

@sjb9774: we could build Postgres views on top of the base tables that present the data so it can be queried easily for the specific use cases you described.

https://github.com/kazooki117/mtg-data-analysis/pull/14 updates the schema as follows:
`expansions`:
- `abbreviation`: String(3), nullable=False, primary_key=True
- `name`: String(255), nullable=False
- `max_booster_number`: Integer, nullable=False

`cards`:
- `id`: Integer, primary_key=True
- `multiverse_id`: Integer, nullable=False
- `expansion`: String(3), ForeignKey('expansions.abbreviation'), nullable=False
- `name`: String(255), nullable=False
- `rarity`: String(255), nullable=False
- `number`: Integer, nullable=False
- `face`: String(255)
- `mana_cost`: String(255)
- `text`: String(1023)
- `type`: String(255), nullable=False
- `power`: Integer
- `toughness`: Integer
- `loyalty`: Integer

`users`:
- `id`: Integer, primary_key=True
- `username`: String(255), nullable=False

`drafts`:
- `id`: Integer, primary_key=True
- `user`: Integer, ForeignKey('users.id'), nullable=False
- `name`: String(255), nullable=False
- `start_time`: DateTime, nullable=True

`packs`:
- `id`: Integer, primary_key=True
- `draft`: Integer, ForeignKey('drafts.id'), nullable=False
- `pick_number`: Integer, nullable=False

`pack_cards`:
- `id`: Integer, primary_key=True
- `pack`: Integer, ForeignKey('packs.id'), nullable=False
- `card_multiverse_id`: Integer, nullable=False

`picks`:
- `id`: Integer, primary_key=True
- `pack_card`: Integer, ForeignKey('pack_cards.id'), nullable=False
Querying all the picks for a draft looks like:
```sql
SELECT
  packs.pick_number,
  string_agg(cards.name, '/' ORDER BY cards.face) AS pick
FROM packs
JOIN pack_cards ON (packs.id = pack_cards.pack)
JOIN cards ON (pack_cards.card_multiverse_id = cards.multiverse_id)
JOIN picks ON (picks.pack_card = pack_cards.id)
WHERE packs.draft = 2
GROUP BY packs.pick_number
ORDER BY packs.pick_number
```
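As a sanity check, this schema and query can be exercised end-to-end in memory. The sketch below substitutes sqlite3 for Postgres (sqlite's `group_concat` stands in for the string-aggregate function), trims `drafts` to the columns the query touches, and uses made-up card data:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE cards (id INTEGER PRIMARY KEY, multiverse_id INTEGER NOT NULL,
                    name TEXT NOT NULL, face TEXT);
CREATE TABLE drafts (id INTEGER PRIMARY KEY);
CREATE TABLE packs (id INTEGER PRIMARY KEY,
                    draft INTEGER NOT NULL REFERENCES drafts(id),
                    pick_number INTEGER NOT NULL);
CREATE TABLE pack_cards (id INTEGER PRIMARY KEY,
                         pack INTEGER NOT NULL REFERENCES packs(id),
                         card_multiverse_id INTEGER NOT NULL);
CREATE TABLE picks (id INTEGER PRIMARY KEY,
                    pack_card INTEGER NOT NULL REFERENCES pack_cards(id));

-- made-up data: draft 2, two picks; card names are placeholders
INSERT INTO cards VALUES (1, 100, 'Shock', NULL), (2, 200, 'Opt', NULL),
                         (3, 300, 'Duress', NULL);
INSERT INTO drafts VALUES (2);
INSERT INTO packs VALUES (1, 2, 1), (2, 2, 2);
INSERT INTO pack_cards VALUES (1, 1, 100), (2, 1, 200),  -- pick 1 saw two cards
                              (3, 2, 200), (4, 2, 300);  -- pick 2 saw two cards
INSERT INTO picks VALUES (1, 1), (2, 4);                 -- took Shock, then Duress
""")

rows = conn.execute("""
SELECT packs.pick_number,
       group_concat(cards.name, '/') AS pick
FROM packs
JOIN pack_cards ON packs.id = pack_cards.pack
JOIN cards ON pack_cards.card_multiverse_id = cards.multiverse_id
JOIN picks ON picks.pack_card = pack_cards.id
WHERE packs.draft = 2
GROUP BY packs.pick_number
ORDER BY packs.pick_number
""").fetchall()
print(rows)  # [(1, 'Shock'), (2, 'Duress')]
```

The join on `picks` means only the chosen card per pick number comes back; dropping that join would return everything seen in each pack instead.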
This leaves only the decks and records still needing schemas.
Context
We want to create a consistent DB for all the MTG draft data gathered around. For that we will set up a database hosted in the cloud that everyone can access with the proper credentials.
Objective
As a first step we need to determine the structure of our schema. The purpose of this ticket is to discuss the structure of this database (the tables and the relations between them).