shufinskiy / nba_data

NBA play-by-play data from stats.nba.com, data.nba.com, and pbpstats.com, plus shot information, starting from the 1996/97 season
Apache License 2.0

`game_id` is a varchar #4

Closed: atlhawksfanatic closed this issue 4 months ago

atlhawksfanatic commented 5 months ago

Flagging that, from the NBA's perspective, game_id is a 10-digit character vector, but all the uncompressed CSV files in this repository have converted this variable to a numeric type and dropped its first two digits. This does make for more efficient compression, since the first two digits are always "0", but it distorts what a researcher would have received from calling these APIs directly, because the variable is now of a different type.
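The type change can be seen in a short sketch. This is a minimal illustration, assuming Python; the helper name `normalize_game_id` and the sample ID are hypothetical. Converting the 10-character ID to an integer drops the leading "00", and because those two digits are always "0", left-padding back to 10 characters with `str.zfill` recovers the original string.

```python
from typing import Union

GAME_ID_LENGTH = 10  # the NBA API returns GAME_ID as a 10-character string


def normalize_game_id(value: Union[int, str]) -> str:
    """Return GAME_ID as the 10-character, zero-padded string the NBA API uses.

    Hypothetical helper: accepts either the numeric form found in the CSVs
    (e.g. 21900001) or the original string form (e.g. "0021900001").
    """
    text = str(value).strip()
    if not text.isdigit() or len(text) > GAME_ID_LENGTH:
        raise ValueError(f"not a valid GAME_ID: {value!r}")
    # The two dropped leading digits are always "0", so left-padding restores them.
    return text.zfill(GAME_ID_LENGTH)


api_id = "0021900001"  # string form, as returned by the API (illustrative value)
csv_id = int(api_id)   # numeric form, as stored in the CSVs
print(csv_id)                               # 21900001
print(str(csv_id) == api_id)                # False: the "00" prefix is gone
print(normalize_game_id(csv_id) == api_id)  # True
```

Normalizing both sides this way lets IDs from the CSVs be compared or joined against IDs fetched from the API.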

And just for reference, game_id has the form XXXYYZZZZZ.
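A minimal sketch that splits a GAME_ID into the three positional fields of the XXXYYZZZZZ pattern. The helper name `split_game_id` and the field names are hypothetical, and the meaning given to each field in the comments (league prefix plus season-type digit, season start year, game number) is an assumption, not something stated in this thread.

```python
from typing import NamedTuple


class GameIdParts(NamedTuple):
    xxx: str    # first three characters (assumed: "00" prefix + season-type digit)
    yy: str     # next two characters (assumed: two-digit season start year)
    zzzzz: str  # last five characters (assumed: game number)


def split_game_id(game_id: str) -> GameIdParts:
    """Split a 10-character GAME_ID by position into its XXX, YY and ZZZZZ fields."""
    if len(game_id) != 10 or not game_id.isdigit():
        raise ValueError(f"expected a 10-digit string, got {game_id!r}")
    return GameIdParts(game_id[:3], game_id[3:5], game_id[5:])


print(split_game_id("0021900001"))
# GameIdParts(xxx='002', yy='19', zzzzz='00001')
```

Because the fields are defined by position, the 8-character string "21900001" from the CSVs fails the length check and would need to be left-padded to 10 characters first.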

shufinskiy commented 5 months ago

@atlhawksfanatic, you are right: GAME_ID has been converted to a number, and its first two zeros have been removed. Initially, the dataset was not an exact copy of the raw data received from the NBA API; it had several transformations. I have now removed all of them except the GAME_ID one. I may make GAME_ID a string, and then the data in the dataset will be an exact copy of the NBA API data.

atlhawksfanatic commented 5 months ago

I like the rawness of the data you're providing. While it's a good skill to figure out how to find the various NBA API endpoints, determine all the parameters, and construct your own queries, it is ridiculous to wait hours or days to get all the information needed for the next steps of an analysis, and this repo fills that gap nicely.

You might be fine just noting in the documentation that GAME_ID is missing the left-padded "00".
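Until the type changes, a user could restore the padding when loading the files. A minimal sketch with Python's standard `csv` module, assuming the column header is exactly `GAME_ID` and that the other columns can be passed through as text; the function name `restore_game_id_column` and the sample rows are hypothetical.

```python
import csv
import io
from typing import Dict, Iterator, TextIO


def restore_game_id_column(source: TextIO, column: str = "GAME_ID") -> Iterator[Dict[str, str]]:
    """Yield CSV rows with `column` left-padded back to the 10-character API form.

    Hypothetical helper: assumes GAME_ID is stored as a number with the leading
    "00" removed, as described in this issue.
    """
    for row in csv.DictReader(source):
        value = row[column].strip()
        if value:  # leave empty cells untouched
            row[column] = value.zfill(10)
        yield row


# Illustrative sample in the shape described above (not real repository data).
sample = io.StringIO("GAME_ID,EVENTNUM\n21900001,1\n21900001,2\n")
for row in restore_game_id_column(sample):
    print(row["GAME_ID"], row["EVENTNUM"])
# 0021900001 1
# 0021900001 2
```

With pandas, a similar result comes from reading the column as text (`pd.read_csv(path, dtype={"GAME_ID": str})`) and then applying `.str.zfill(10)` to it, so the value is never parsed as a number.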

shufinskiy commented 4 months ago

@atlhawksfanatic, good point. I will add information about this to the README.