guidopetri / chess-pipeline

Pulling games from the Lichess API into a PostgreSQL database for data analysis.
GNU General Public License v3.0
20 stars 2 forks source link

Win probabilities CSV has improper numbers #45

Closed guidopetri closed 3 years ago

guidopetri commented 3 years ago

The CSV for win probabilities uses improper numbers. E.g.:

-47.59999999997319,1.2584390499731036e-06,0.0

Postgres can't understand scientific notation, and it's probably also best to truncate the first number (which is the stockfish eval value) to 2 decimal places. Wherever scientific notation was used, Postgres is currently importing a NaN value (which also doesn't seem to be captured by is NULL). Obviously this breaks any possible analyses. Fixing this should be pretty easy: just change the .ipynb that generates the file to output it correctly.

guidopetri commented 3 years ago

It seems like psql did import the numbers correctly, but somehow they get selected improperly via Redash/psycopg2. The hotfix is:

update win_probabilities set eval = round(eval::numeric, 2);
update win_probabilities set probability_lr = round(probability_lr::numeric, 6);
update win_probabilities set probability_bayes = round(probability_bayes::numeric, 6);

Evals are rounded to 2 decimal places since that was the original list of values I used, probabilities were rounded to 6 decimal places so that they can be expressed as percentages with 4 decimal places.

guidopetri commented 3 years ago

Fixed in 13bb16b.