Analysis of the Blind Mode prediction data from the Astral Codex Ten 2023 prediction contest, used to produce an entry for the Full Mode.
- `/raw/`: Raw data from the contest; should not be modified
- `/processed/`: Results of analysis that are not tracked in git
- `/results/`: Important results of analysis that are tracked in git

Run the Streamlit app locally with
```sh
python -m streamlit run ./streamlit/Home.py
```
The Streamlit app `./streamlit/Home.py` is deployed at https://jbreffle.github.io/acx-app.
Note: only the 1000 most recent bets for a market can be retrieved through the API. See https://docs.manifold.markets/api#get-v0bets:

> limit: Optional. How many bets to return. The default and maximum are both 1000.
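As a minimal sketch of what retrieval under this limit looks like (assuming the `contractSlug` and `limit` query parameters described in the linked docs; the slug in the usage comment is hypothetical):

```python
import requests


def get_market_bets(contract_slug: str, limit: int = 1000) -> list[dict]:
    """Fetch the most recent bets for a market from the Manifold API.

    At most the `limit` most recent bets (capped at 1000) are returned,
    so markets with more bets than that are truncated.
    """
    response = requests.get(
        "https://api.manifold.markets/v0/bets",
        params={"contractSlug": contract_slug, "limit": limit},
        timeout=30,
    )
    response.raise_for_status()
    return response.json()


# Usage (hypothetical slug): bets = get_market_bets("some-market-slug")
```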
The scoring method for the contest was not specified when the contest was announced. I used the Brier score for my analyses, but at the conclusion of the contest it was announced that the Metaculus scoring function would be used. I therefore added analyses comparing the two scoring methods.
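For reference, a minimal sketch of the Brier score for binary questions (the standard definition, not code from this repository):

```python
import numpy as np


def brier_score(predictions, outcomes) -> float:
    """Mean squared error between predicted probabilities and 0/1 outcomes.

    Lower is better: 0.0 is a perfect score, and predicting 0.5 for
    everything scores 0.25 regardless of the outcomes.
    """
    predictions = np.asarray(predictions, dtype=float)
    outcomes = np.asarray(outcomes, dtype=float)
    return float(np.mean((predictions - outcomes) ** 2))
```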
The log score, which underpins all Metaculus scores, is: $$ \text{log score} = \log(P(\text{outcome})) $$ where $P(\text{outcome})$ is the probability the prediction assigned to the outcome that occurred. Higher scores are better. The log score is never positive, which Metaculus says has proved unintuitive for users, so they constructed the baseline score and the peer score as more intuitive alternatives.
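For a binary question this reduces to the following sketch (using the natural log):

```python
import numpy as np


def log_score(prediction: float, outcome: int) -> float:
    """Log score of a binary prediction: the log of the probability
    assigned to the outcome that occurred (0 is perfect, else negative)."""
    p_outcome = prediction if outcome == 1 else 1.0 - prediction
    return float(np.log(p_outcome))
```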
The general form of the baseline score is: $$ \text{baseline score} = 100 \times \frac{\text{log score}(\text{prediction}) - \text{log score}(\text{baseline})}{\text{scale}} $$ where the baseline is the prediction that assigns equal probability to all outcomes, and the scale is set so that a perfect prediction gives a score of $100$.
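For a binary question, a sketch looks like this (assuming the natural log; the choice of base cancels in the ratio):

```python
import numpy as np


def baseline_score(prediction: float, outcome: int) -> float:
    """Baseline score for a binary question.

    The baseline prediction assigns probability 0.5 to each outcome, and
    the scale is chosen so that a perfect prediction (log score 0) scores
    exactly 100.
    """
    p_outcome = prediction if outcome == 1 else 1.0 - prediction
    baseline_log_score = np.log(0.5)  # log score of the 50/50 baseline
    scale = 0.0 - baseline_log_score  # perfect prediction has log score 0
    return float(100 * (np.log(p_outcome) - baseline_log_score) / scale)


# Sanity checks: baseline_score(1.0, 1) -> 100.0; baseline_score(0.5, 1) -> 0.0;
# worse-than-baseline predictions go negative.
```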
The general form of the peer score is: $$ \text{peer score} = 100 \times \frac{1}{N} \sum_{i=1}^N \left[ \text{log score}(p) - \text{log score}(p_i) \right] $$ where $p$ is the scored prediction, $N$ is the number of other predictions, and $p_i$ is the $i$-th other prediction.
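A sketch for the binary case, following the formula above:

```python
import numpy as np


def peer_score(p: float, others: list[float], outcome: int) -> float:
    """Peer score of a binary prediction: 100 times the mean difference
    between its log score and the log score of each other prediction."""
    def log_score(q: float) -> float:
        return float(np.log(q if outcome == 1 else 1.0 - q))
    return 100 * float(np.mean([log_score(p) - log_score(q) for q in others]))


# Positive when the prediction beats its peers on average, negative when it
# does worse, and zero when its log score equals the peers' mean log score.
```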