jbreffle / acx-prediction-contest

Analysis of the ACX 2023 Prediction Contest with an interactive Streamlit app
https://jbreffle.github.io/acx-app

ACX Prediction Contest

Analysis of the Blind Mode predictions data from the Astral Codex Ten 2023 prediction contest, used to produce predictions for the Full Mode.

ACX links

Notebooks

Data

Streamlit

Run the Streamlit app locally with

python -m streamlit run ./streamlit/Home.py

The Streamlit app ./streamlit/Home.py is deployed at https://jbreffle.github.io/acx-app.

src

Manifold

Note: only the 1000 most recent bets for a market can be retrieved through the API. From https://docs.manifold.markets/api#get-v0bets:

limit: Optional. How many bets to return. The default and maximum are both 1000.
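A minimal sketch of requesting the bets for a single market using only the Python standard library. It assumes the `https://api.manifold.markets/v0/bets` endpoint accepts `contractId` and `limit` query parameters, as described in the linked docs. `build_bets_url` and `fetch_bets` are hypothetical helper names, not functions from this repository. Because of the 1000-bet cap, a market with more bets than that comes back truncated.

```python
import json
import urllib.parse
import urllib.request

# Assumed endpoint and parameter names, based on the Manifold API docs linked above
BETS_ENDPOINT = "https://api.manifold.markets/v0/bets"
MAX_LIMIT = 1000  # documented default and maximum for `limit`


def build_bets_url(contract_id: str, limit: int = MAX_LIMIT) -> str:
    """Build the request URL for the bets on one market (hypothetical helper)."""
    if not 1 <= limit <= MAX_LIMIT:
        raise ValueError(f"limit must be between 1 and {MAX_LIMIT}, got {limit}")
    query = urllib.parse.urlencode({"contractId": contract_id, "limit": limit})
    return f"{BETS_ENDPOINT}?{query}"


def fetch_bets(contract_id: str, limit: int = MAX_LIMIT) -> list:
    """Fetch up to `limit` bets for a market as a list of dicts (hypothetical helper)."""
    with urllib.request.urlopen(build_bets_url(contract_id, limit), timeout=30) as resp:
        return json.load(resp)
```

Validating `limit` locally makes the cap explicit: asking for more than 1000 fails immediately instead of silently returning fewer bets than expected.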

Scoring

The scoring method for the contest was not specified when the contest was announced. I used the Brier score for my analyses, but at the conclusion of the contest it was announced that the Metaculus scoring function would be used. I added analyses comparing the two scoring methods.
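For reference, a minimal sketch of the Brier score for binary questions: the mean squared difference between each predicted probability and the 0/1 outcome, where lower is better. `brier_score` is a hypothetical name, and this is not necessarily how the notebooks compute it.

```python
def brier_score(probs, outcomes) -> float:
    """Mean squared error between predicted probabilities and binary outcomes (0 or 1).

    Lower is better: 0 is a perfect score, and always predicting 0.5 scores 0.25.
    """
    probs = [float(p) for p in probs]
    outcomes = [float(o) for o in outcomes]
    if len(probs) != len(outcomes) or not probs:
        raise ValueError("need the same, non-zero number of predictions and outcomes")
    return sum((p - o) ** 2 for p, o in zip(probs, outcomes)) / len(probs)


# Three questions: the first two resolved YES, the third resolved NO
print(brier_score([0.9, 0.7, 0.2], [1, 1, 0]))
```

Unlike the log score below, the Brier score is bounded, so a single confident miss costs at most 1 per question rather than an unbounded penalty.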

All Metaculus scores are built on the log score: $$ \text{log score} = \log(P(\text{outcome})) $$ where $P(\text{outcome})$ is the probability the prediction assigned to the outcome that occurred. Higher scores are better. The log score is never positive (it is $0$ only when the outcome was assigned probability $1$). Metaculus notes that this has proved unintuitive for users, so it constructed the baseline score and the peer score as more intuitive alternatives.
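A minimal sketch of the log score for a binary question, assuming the natural logarithm and a prediction given as the probability of YES. `log_score` is a hypothetical helper name.

```python
import math


def log_score(p_yes: float, resolved_yes: bool) -> float:
    """Log score of a binary prediction: log of the probability given to the actual outcome."""
    p_outcome = p_yes if resolved_yes else 1.0 - p_yes
    if p_outcome <= 0.0:
        # math.log(0) raises, and zero probability on what happened is infinitely bad
        return -math.inf
    return math.log(p_outcome)
```

For example, a prediction of 0.8 scores about -0.22 if the question resolves YES and about -1.61 if it resolves NO; only a prediction of exactly 1 on the realized outcome reaches the maximum of 0.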

The general form of the baseline score is: $$ \text{baseline score} = 100 \times \frac{\text{log score(prediction)} - \text{log score(baseline)}}{\text{scale}} $$ where $\text{baseline}$ is the baseline prediction that assigns equal probability to every outcome and $\text{scale}$ is set so that a perfect prediction gives a score of $100$.
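Since a perfect prediction has log score $0$ and the baseline assigns $1/N$ to each of $N$ outcomes, the scale that makes a perfect prediction score $100$ is $\log N$ (for a binary question, $\log 2$). A minimal sketch, assuming the natural logarithm; `baseline_score` is a hypothetical name and takes the probability the prediction gave to the outcome that occurred.

```python
import math


def baseline_score(p_outcome: float, n_outcomes: int = 2) -> float:
    """Baseline score of a prediction that gave probability `p_outcome` to the actual outcome.

    The baseline spreads probability evenly over `n_outcomes`; dividing by
    log(n_outcomes) scales the result so that a perfect prediction scores 100.
    """
    baseline = 1.0 / n_outcomes
    scale = math.log(n_outcomes)  # log score(perfect) - log score(baseline) = 0 - log(1/N)
    return 100.0 * (math.log(p_outcome) - math.log(baseline)) / scale
```

For a binary question, predicting 0.5 scores 0, a prediction of 0.8 scores about 67.8 if the question resolves YES, and about -132.2 if it resolves NO (pass `1 - 0.8` as `p_outcome` in that case).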

The general form of the peer score is: $$ \text{peer score} = 100 \times \frac{1}{N} \sum_{i=1}^N \left[ \text{log score}(p) - \text{log score}(p_i) \right] $$ where $p$ is the scored prediction, $N$ is the number of other predictions, and $p_i$ is the $i$-th other prediction.
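A minimal sketch of the peer score for one question, assuming the natural-log log score and representing every prediction by the probability it assigned to the outcome that occurred. `peer_score` is a hypothetical name. Because the formula averages pairwise differences, it equals 100 times the gap between the scored prediction's log score and the mean log score of the other predictions.

```python
import math


def peer_score(p_outcome: float, others: list) -> float:
    """Peer score: 100 x mean over the N other predictions of (log score(p) - log score(p_i)).

    `p_outcome` and each entry of `others` are the probabilities that the scored
    prediction and the other predictions gave to the outcome that occurred.
    """
    if not others:
        raise ValueError("peer score needs at least one other prediction")
    own = math.log(p_outcome)
    diffs = [own - math.log(p_i) for p_i in others]
    return 100.0 * sum(diffs) / len(diffs)


# The scored prediction gave 0.8 to the outcome; two others gave 0.5 and 0.9
print(round(peer_score(0.8, [0.5, 0.9]), 2))  # 17.61
```

One consequence of this pairwise form: on any question, the peer scores of all predictors sum to zero, because each pairwise difference appears once with each sign.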

TODO