kazooki117 / mtg-data-analysis

GNU General Public License v3.0

Data structure #1

Open kazooki117 opened 5 years ago

kazooki117 commented 5 years ago

Context

We want to create a consistent DB for all the MTG draft data that has been gathered. To that end, we will set up a cloud-hosted database that everyone can access with the proper credentials.

Objective

As a first step, we need to decide on the structure of our schema. The purpose of this ticket is to discuss that structure: the tables and the relations between them.

ValentinPerret commented 5 years ago

I propose consolidating all the data sources in a Postgres DB hosted on AWS. We need to think about the schema structure. (We can use a tool like SQLdbm to visualize the data structure and the links between tables.)

My first pass would be the following tables in the schema for deck storage:

For draft data I would say:

All of the tables can be linked on their id fields.

OvidiuCalburean commented 5 years ago
nebanche commented 5 years ago

Hey all, Lemon_Tea here from the discord!

What does score on deck represent? 0 for L, 1 for Win? It might just be better to have win, loss, tie as the values instead of trying to aggregate values and have the meaning of it hidden.

Would it help to have a WUBRG value on the deck data as well?

I'm also confused by DraftLog: is this a table for the entire draft or just a given pick?

If this is a given pick, it should be represented by a list of cards (given by card id) and then the pick_id of the card that is picked. With the current schema it looks like a sort of N+1 problem, where we would be storing a lot more data on a given record than we need.
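A minimal sketch of that per-pick shape, assuming one record per pick: the list of available card ids plus the id of the card taken. The class name `Pick` and the fields `pack_number` and `pick_number` are hypothetical and are not taken from the proposed schema.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Pick:
    """One pick: the cards shown to the drafter and the one they took."""
    pack_number: int      # hypothetical: which pack of the draft
    pick_number: int      # hypothetical: position of the pick within the pack
    card_ids: List[int]   # every card available at this pick, by card id
    picked_card_id: int   # id of the card that was taken

    def __post_init__(self):
        # A pick is only valid if the chosen card was actually in the pack.
        if self.picked_card_id not in self.card_ids:
            raise ValueError("picked card must be one of the cards in the pack")


# Example ids are made up for illustration.
pick = Pick(pack_number=1, pick_number=1, card_ids=[101, 102, 103], picked_card_id=102)

# Cards passed to the next drafter are everything except the pick.
passed = [card_id for card_id in pick.card_ids if card_id != pick.picked_card_id]
```

Stored this way, each pick is a single record, and the cards that were passed can be derived rather than stored.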

sjb9774 commented 5 years ago

I think it's important to structure the data based on what the typical queries on it may look like. What useful questions would we want to be able to ask of the data? I would imagine questions like this:

I say this just so that due respect is given to how granular one would need to be in order to produce a truly useful database. I've tried doing this myself before (and actually got somewhat far, see here: https://github.com/sjb9774/MTGDB-API), but ultimately I abandoned it as it was overly ambitious.

Instead of importing all the card data from MTGJSON as suggested by @ValentinPerret, I might suggest just storing the multiverse id (or a better unique identifier, if there is one) and expecting users of the database to pull detailed card info from the Scryfall API (or some other) in their own applications. That leaves this database to be just a repository of well-formed draft logs. A lot of the processing is then offloaded to the individual users, which is unfortunate, but I think it is ultimately the smarter approach.
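As a rough illustration of what that would ask of users, here is a sketch of a lookup by multiverse id against Scryfall's `/cards/multiverse/:id` endpoint. The helper names `scryfall_card_url` and `fetch_card` and the User-Agent string are assumptions made for this example, and rate limiting and error handling are left out.

```python
import json
import urllib.request

SCRYFALL_API = "https://api.scryfall.com"


def scryfall_card_url(multiverse_id: int) -> str:
    """Build the Scryfall lookup URL for a multiverse id."""
    return f"{SCRYFALL_API}/cards/multiverse/{int(multiverse_id)}"


def fetch_card(multiverse_id: int) -> dict:
    """Fetch the card object for a multiverse id (performs a network call)."""
    request = urllib.request.Request(
        scryfall_card_url(multiverse_id),
        headers={
            # Hypothetical client name; pick one that identifies your application.
            "User-Agent": "mtg-data-analysis-example/0.1",
            "Accept": "application/json",
        },
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)
```

A caller would then read fields such as `fetch_card(some_id)["name"]` in their own application, while the draft database itself stores only the id.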

rconroy293 commented 5 years ago

Just added a branch that has a basic data structure and imports MTGO draft logs here: https://github.com/kazooki117/mtg-data-analysis/tree/importer/importer. Definitely some room for improvement in the data structure, e.g.

This also currently has no support for recording deck performance, nor for how to incorporate sealed decks.

ValentinPerret commented 5 years ago

@nebanche:

@sjb9774:

rconroy293 commented 5 years ago

https://github.com/kazooki117/mtg-data-analysis/pull/14 updates the schema as follows:

expansions:

cards:

users:

drafts:

packs:

pack_cards:

picks:

Querying all the cards for a draft looks like:

SELECT
    packs.pick_number,
    GROUP_CONCAT(cards.name ORDER BY cards.face SEPARATOR '/') pick
FROM packs
JOIN pack_cards ON (packs.id = pack_cards.pack)
JOIN cards ON (pack_cards.card_multiverse_id = cards.multiverse_id)
JOIN picks ON (picks.pack_card = pack_cards.id)
WHERE packs.draft = 2
GROUP BY packs.pick_number
ORDER BY packs.pick_number

This leaves only the decks and records that still need schemas.
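To try the query above without a database server, here is a small sketch using an in-memory SQLite database. The table and column definitions are inferred from the query only and are an assumption, not the actual schema from the pull request; the rows and ids are made up. Because `GROUP_CONCAT ... ORDER BY ... SEPARATOR` is MySQL syntax, the faces of a multi-face card are joined in Python here instead.

```python
import sqlite3
from itertools import groupby

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Assumed minimal schema: only the columns the query touches.
CREATE TABLE cards (multiverse_id INTEGER, face INTEGER, name TEXT,
                    PRIMARY KEY (multiverse_id, face));
CREATE TABLE packs (id INTEGER PRIMARY KEY, draft INTEGER, pick_number INTEGER);
CREATE TABLE pack_cards (id INTEGER PRIMARY KEY, pack INTEGER, card_multiverse_id INTEGER);
CREATE TABLE picks (id INTEGER PRIMARY KEY, pack_card INTEGER);

-- Made-up sample data: card 2 has two faces sharing one multiverse id.
INSERT INTO cards VALUES (1, 0, 'Shock'), (2, 0, 'Fire'), (2, 1, 'Ice'), (3, 0, 'Opt');
INSERT INTO packs VALUES (10, 2, 1), (11, 2, 2);
INSERT INTO pack_cards VALUES (100, 10, 1), (101, 10, 2), (102, 11, 3);
INSERT INTO picks VALUES (1000, 101), (1001, 102);
""")

# Same joins as the original query, returning one row per card face.
rows = conn.execute("""
    SELECT packs.pick_number, cards.name
    FROM packs
    JOIN pack_cards ON packs.id = pack_cards.pack
    JOIN cards ON pack_cards.card_multiverse_id = cards.multiverse_id
    JOIN picks ON picks.pack_card = pack_cards.id
    WHERE packs.draft = ?
    ORDER BY packs.pick_number, cards.face
""", (2,)).fetchall()

# Collapse the faces of each pick into a single 'Front/Back' style name.
picks_by_number = [
    (pick_number, "/".join(name for _, name in group))
    for pick_number, group in groupby(rows, key=lambda row: row[0])
]
print(picks_by_number)  # [(1, 'Fire/Ice'), (2, 'Opt')]
```

The inner join on `picks` is what restricts the result to the card actually taken from each pack; dropping that join would list every card in every pack instead.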