rippinrobr / hockey-databank

The end of the season files that are posted on the Hockey Databank Yahoo Group
10 stars 1 forks source link

masters.csv does not agree with master table in sql #7

Open mister-elliott opened 5 years ago

mister-elliott commented 5 years ago

as an example: master csv for sid abel: abelsi01,abelsi01c,abelsi01h,Sid,Abel,,Sidney Gerald,Boot Nose,71,170,L,P196901,5931,abelsi01,1938,1953,,,C,1918,2,22,Canada,SK,Melville,2000,2,7,USA,MI,Farmington

master sql: lastName is an integer, not TEXT, and lastName in the insert value statement is an int, not a name. eg. CREATE TABLE IF NOT EXISTS "Master" ("playerID" TEXT, "coachID" TEXT, "hofID" TEXT, "firstName" TEXT, "lastName" INTEGER, "nameNote" INTEGER, "nameGiven" TEXT, "nameNick" INTEGER, "height" INTEGER, "weight" TEXT, "shootCatch" INTEGER, "legendsID" INTEGER, "ihdbID" TEXT, "hrefID" INTEGER, "firstNHL" INTEGER, "lastNHL" INTEGER, "firstWHA" TEXT, "lastWHA" TEXT, "pos" TEXT, "birthYear" TEXT, "birthMon" INTEGER, "birthDay" INTEGER, "birthCountry" INTEGER, "birthState" TEXT, "birthCity" TEXT, "deathYear" TEXT, "deathMon" TEXT, "deathDay" TEXT, "deathCountry" TEXT, "deathState" TEXT, "deathCity" TEXT); .... INSERT INTO Master VALUES('abelsi01','abelsi01c','abelsi01h','Sid',0,0,'Sidney Gerald',0,71,'170',0,0,'5931',0,1938,1953,'','','C','1918',2,22,0,'SK','Melville','2000','2','7','USA','MI',NULL);

I was transforming the sqlite to mysql tables, when i noticed this oddity...

mister-elliott commented 5 years ago

I am going to use the table descriptions, with a fix for create table master, and then see if I can import the data from the 2019-08-19 csv files using MySql workbench. If it all works, I can push the files, if you like, to an mysql directory. I will also publish the tables to my website, and work up some php for querying them. I am going to set up all text fields as VARCHAR(32), except for "nameGiven" and nameNick (varchar 964) and note as varchar(128). Probably will turn birth and death date fields into DATE field, easier to work in SQL.

rippinrobr commented 4 years ago

Thanks for the catch and please create a pull request with any and all changes you make to the schema and anything for MySQL support would be much appreciated.

mister-elliott commented 4 years ago

Have ingested the data a couple of times, and got some basic mysql tables. I am finding the data is not normalized for a relational DBMS. I am doing things like merging awardsplayers and awardscoaches to an awards table. In master table, I will copy coachID and/or hofID to playerID, iff playerID is empty. I would really like there to be one master person ID, valid as a primary key in master table, and as a foreign key elsewhere. also going to break the teams table in to a league table, and the rest of the teams table, with the team abbrev as the key for joins on those.

mister-elliott commented 4 years ago

and Gump Worsley first name (never used) is Lorne, and his nickname is Gump, though, I imagine nobody ever called him anything but Gump.:)

mister-elliott commented 4 years ago

the real fun here will be defining PKs and FKs to ensure data integrity; I have done this in sql server and oracle, but not at any scale in mysql.

mister-elliott commented 4 years ago

I have not followed hockeydatabank over the years. How does the data get aggregated to form season records? In my website, I store game records, and from that can generate standings, playoffs, etc, etc. but there is not enough data in the CSV files to create the aggregated season data, so where is it coming from?

rippinrobr commented 4 years ago

Hockey Databank is a series of flat files that is maintained by a group of people who I don't believe have much SQL experience. The project is in the process of moving to a different location. I can pass that info along to you if you'd like.

I like the idea of using PKs and FKs and had envisioned doing that as a separate schema for MySQL and Postgres but this project has kind of dropped off my radar recently.