mpope9 / nba-sql

:basketball: An application to build an NBA database backed by MySQL, Postgres, or SQLite
Apache License 2.0

Player info column data has changed #21

Closed. avadhanij closed this issue 3 years ago

avadhanij commented 3 years ago
nbasql ❯ python stats/nba_sql.py --season 2020-21 --database postgres --create_schema True
Connecting to postgres database.
Initializing schema.
Populating team data
team Table Loading |██████████████████████████████| 100.0%
Populating player data
player Table Loading |██████████████████████████████| 100.0%
Loading event types.
> /Users/avadhanij/dev/python/nba-sql/stats/player_game_log.py(70)fetch_season()
-> if '@' in row[8]:
(Pdb) row
['2020-21', 1628422, 'Damyean Dotson', 'Damyean', 1610612739, 'CLE', 'Cleveland Cavaliers', '0022001067', '2021-05-16T00:00:00', 'CLE @ BKN', 'L', 28.316666666666666, 7, 9, 0.778, 3, 5, 0.6, 0, 0, 0.0, 0, 1, 1, 3, 4, 0, 0, 0, 1, 0, 17, -4, 18.7, 0, 0, 1, 11623, 11623, 11623, 7873, 3206, 8309, 1717, 1926, 4922, 2762, 12008, 12752, 12008, 11150, 15987, 17542, 5509, 764, 10665, 7210, 1, 4919, 16020, 4514, 14475, 11875, 1849, 143, 1]
(Pdb) n
> /Users/avadhanij/dev/python/nba-sql/stats/player_game_log.py(74)fetch_season()
-> 'season_id': season_int,
(Pdb) c
Traceback (most recent call last):
  File "/Users/avadhanij/dev/python/nba-sql/stats/nba_sql.py", line 261, in <module>
    main()
  File "/Users/avadhanij/dev/python/nba-sql/stats/nba_sql.py", line 123, in main
    player_game_log_requester.fetch_season(season_id)
  File "/Users/avadhanij/dev/python/nba-sql/stats/player_game_log.py", line 74, in fetch_season
    'season_id': season_int,
ValueError: invalid literal for int() with base 10: 'Cleveland Cavaliers'

Looks like the format of the latest incoming data has changed. Here is the main change:

  1. team_id has moved to column 4

This caused everything after it to shift by one column. I am going to push a fix for this issue.

But IMO, the current approach of reading fields by hard-coded column position is fragile. The API supplies column names along with the data anyway, so wouldn't it be better to look up each field dynamically using that info? Let me know what you think, and I can file an enhancement. I don't mind working on it.
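A rough sketch of what looking fields up by name could look like, offered as an illustration and not as the project's actual code. It assumes the response carries a list of column names alongside each row; the header names below are guesses for illustration, and `rows_as_dicts` and `parse_game_log` are hypothetical helpers. The sample row is the first eleven values from the debugger output above.

```python
from typing import Any, Dict, Iterable, Iterator, List

# Assumed header names for illustration only; in practice this list would be
# read from the response that accompanies the rows.
HEADERS: List[str] = [
    "SEASON_YEAR", "PLAYER_ID", "PLAYER_NAME", "NICKNAME", "TEAM_ID",
    "TEAM_ABBREVIATION", "TEAM_NAME", "GAME_ID", "GAME_DATE", "MATCHUP", "WL",
]

# First eleven values of the row printed in the debugger session.
ROW: List[Any] = [
    "2020-21", 1628422, "Damyean Dotson", "Damyean", 1610612739, "CLE",
    "Cleveland Cavaliers", "0022001067", "2021-05-16T00:00:00", "CLE @ BKN", "L",
]


def rows_as_dicts(
    headers: List[str], rows: Iterable[List[Any]]
) -> Iterator[Dict[str, Any]]:
    """Pair each row with the header list so fields are looked up by name."""
    for row in rows:
        if len(row) != len(headers):
            # Fail loudly if the row and header widths disagree.
            raise ValueError(
                f"expected {len(headers)} columns, got {len(row)}"
            )
        yield dict(zip(headers, row))


def parse_game_log(record: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical helper: pull a few fields out by name, not by position."""
    return {
        "player_id": int(record["PLAYER_ID"]),
        "team_id": int(record["TEAM_ID"]),
        "game_id": record["GAME_ID"],
        "is_away": "@" in record["MATCHUP"],
    }


if __name__ == "__main__":
    for record in rows_as_dicts(HEADERS, [ROW]):
        print(parse_game_log(record))
```

With this shape, a column being added or moved upstream changes the header list and the rows together, so the lookups by name keep working; a renamed or dropped column surfaces as a `KeyError` on that name instead of a confusing `int()` failure on the wrong value.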

mpope9 commented 3 years ago

Yeah, you are right, using the column spec to get the index dynamically is better. I'm open to the enhancement. I have not gotten around to fixing the other issues yet, but I have some queries to run on the current season, so I will poke at these soon.

avadhanij commented 3 years ago

Hey, so I hope you don't mind that I keep filing issues, because to be honest, there are quite a lot of them. I am slowly beginning to understand the code base.

It's been one issue after another, and I have yet to be able to load the database with the given script.

mpope9 commented 3 years ago

Fixed by https://github.com/mpope9/nba-sql/pull/22 !