rfl-urbaniak / MRbook


Weight paper #67

Open rfl-urbaniak opened 5 months ago

rfl-urbaniak commented 5 months ago

Write a chapter/paper on weight of evidence.

For now, take a look at the weight-paper folder; the file is imprecision_weight.Rmd. I did not bother with conversion yet. @Niklewa, can you compile a PDF for now, making sure the Rmd works? (I have some package issues that I don't have time to play with.)

TBD: is it going to be a paper first? TBD: is it supposed to be self-contained, or do we set it up as a chapter and then not waste time on the HOP set-up?

More generally: which parts of the material are worth keeping or expanding upon, and what relation should this have to the imprecision chapter?

Niklewa commented 5 months ago

I have managed to fix the knitting of the Rmd file (at least on my setup), @marcellodibello feel free to read the PDF.

marcellodibello commented 5 months ago

i found notes i wrote about weight some time ago so i added the pdf and rmd files to the folder

marcellodibello commented 5 months ago

it seems to me the chapter on weight should focus on at least four fronts:

  1. develop a plausible set of constraints or requirements that a reasonable theory of weight should satisfy.

  2. identify neighboring notions---i can think of three: evidential completeness, resilience, and higher-order probability---and see how they differ from or resemble weight

  3. identify legal cases in which weight plays a role and makes a difference, either in evidence assessment or in decision making

  4. develop a theory of weight that satisfies the constraints in 1 and is applicable and useful in the legal context

marcellodibello commented 5 months ago

here is an example of a constraint in point 1, stated more precisely (drawn from rafal's note, slightly rephrased):

MONOTONICITY: if E1 is a subset of E2 (that is, any item of evidence in E1 is also in E2), then W(H, E1) < W(H, E2), where both E1 and E2 consist of items of evidence that (considered individually) are relevant (probability-changing?) for H.

In other words, weight increases as the quantity of evidence increases. Weight need not be reduced to quantity of evidence, but quantity is part of weight.

The examples from Keynes and Peirce motivate this constraint: if I sample more balls from an urn, or collect more arguments about a certain issue, then no matter what color the balls turn out to have, and no matter what the arguments end up supporting, I will have weightier evidence overall. This seems a good starting point for any account of weight.
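Typeset, the constraint reads as follows (the symbols follow the statement above; presumably E1 is meant to be a proper subset of E2, since with E1 = E2 the strict inequality could not hold):

$$\textsc{Monotonicity:}\quad E_1 \subsetneq E_2 \;\Longrightarrow\; W(H, E_1) < W(H, E_2),$$

where each item of evidence in $E_1$ and $E_2$ is, considered individually, relevant to $H$.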

Does Rafal's account of weight (which tracks the distance from the uninformative, uniform distribution) satisfy MONOTONICITY? This is not completely clear to me. What do you say @rfl-urbaniak? I think it does, but I want to be sure.

Here is why I am confused. To recap, Weight (of a distribution p) is the distance of p from the uniform distribution: the greater the distance, the greater the weight (informativeness) of the distribution p. Weight of evidence E is the delta between the weight of the prior distribution and the weight of the posterior distribution (given evidence E).
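To make the recap concrete, here is a minimal sketch (the choice of a discrete distribution and of KL divergence as the distance measure are assumptions of mine; the thread leaves both open):

import numpy as np

def weight(p):
    """Distance of distribution p from the uniform distribution.

    Assumes KL divergence as the distance; other measures
    (e.g., total variation) would fit the recap equally well.
    """
    p = np.asarray(p, dtype=float)
    u = np.full_like(p, 1.0 / len(p))
    mask = p > 0  # 0 * log(0 / u) = 0 by convention
    return float(np.sum(p[mask] * np.log(p[mask] / u[mask])))

def weight_of_evidence(prior, posterior):
    """Delta between the weights of the posterior and prior distributions."""
    return weight(posterior) - weight(prior)

prior = [0.25, 0.25, 0.25, 0.25]  # uniform, so weight 0
posterior = [0.7, 0.1, 0.1, 0.1]  # informative, so positive weight
print(weight_of_evidence(prior, posterior))  # ~0.446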

Now, question: Could Weight(prior distribution) = Weight(posterior distribution given E), in the sense that the new evidence E simply confirms what the prior distribution already said? Or are you assuming that if E is relevant, then it is impossible that Weight(prior distribution) = Weight(posterior distribution given E), so new relevant evidence always brings a change in the weight of the distribution? How are you thinking about such cases?

Is the key idea that the distribution in question must be higher-order, so you are not talking about distributions over propositions but distributions over probabilities of propositions? For it is possible that P(H) = P(H | E1) = P(H | E1, E2) = ... = P(H | E1, E2, ..., En) = 0.5, no matter how much evidence I add, where P is a first-order distribution over propositions. But it is NOT possible that f(P(H)) = f(P(H) | E1) = f(P(H) | E1, E2) = ... = f(P(H) | E1, E2, ..., En) = uniform, where f is the higher-order distribution over the probabilities of H. Is this right @rfl-urbaniak?
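A small numerical probe of exactly this contrast (the use of a Beta distribution as the higher-order f, and of KL divergence from the uniform Beta(1, 1) as informativeness, are my assumptions, not commitments of the thread):

import numpy as np
from scipy import stats

def beta_weight(a, b, grid_size=10_000):
    # KL divergence of Beta(a, b) from the uniform Beta(1, 1),
    # approximated on a grid: the integral of f * log(f), since
    # the uniform density is 1 on [0, 1]
    x = np.linspace(1e-6, 1 - 1e-6, grid_size)
    f = stats.beta.pdf(x, a, b)
    return float(np.trapz(f * np.log(np.maximum(f, 1e-300)), x))

a, b = 1.0, 1.0  # uniform higher-order prior over P(H)
for n in range(1, 5):
    a, b = a + 1, b + 1        # one confirming and one disconfirming item
    first_order = a / (a + b)  # posterior mean, i.e., P(H | evidence so far)
    print(f"pairs={n}  P(H)={first_order:.2f}  weight={beta_weight(a, b):.3f}")

The printout shows the first-order P(H) frozen at 0.5 while the higher-order distribution keeps moving away from uniform, which is the asymmetry the question points at.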

marcellodibello commented 5 months ago

regarding the theory of weight we are working with (point 4 above, rafal's theory): would the following formulation be correct and faithful to the spirit of the theory, @rfl-urbaniak?

If what is above is right, then we can simply say Good is working at the first-order level of weight, but there is a second-order level of weight that is also important.

QUESTION for @rfl-urbaniak: if it is right to say that the second-order weight of E is a function of the prior higher-order distribution f(P(H)) and the posterior higher-order distribution f(P(H) | E), do we need to construe that function as the difference between the informativeness of the two higher-order distributions? would it make sense to take instead simply the ratio of the two distributions, not of their informativeness (to mimic something like the likelihood ratio at the second-order level)?

Note that we could also take the difference in informativeness between the prior P(H) and the posterior P(H | E) first-order distributions instead of the second-order distributions. What would that difference be?

So we have at least four formal constructs (written out more explicitly below):

  1. the difference in informativeness between the prior f(P(H)) and the posterior f(P(H) | E) second-order distributions

  2. the ratio of the prior and posterior second-order distributions

  3. the difference in informativeness between the prior P(H) and the posterior P(H | E) first-order distributions

  4. the ratio of the prior and posterior first-order distributions (something like the likelihood ratio)
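One way to write the four down explicitly (the notation is mine; $D$ stands for whatever informativeness measure, i.e., distance from the relevant uniform distribution $u$, we settle on):

$$
\begin{aligned}
W_2^{\Delta}(E) &= D\big(f(\cdot \mid E),\, u\big) - D\big(f,\, u\big) && \text{(second-order difference)}\\
W_2^{r}(E) &= f(\cdot \mid E)\,/\,f(\cdot) && \text{(second-order ratio, pointwise)}\\
W_1^{\Delta}(E) &= D\big(P(\cdot \mid E),\, u\big) - D\big(P,\, u\big) && \text{(first-order difference)}\\
W_1^{r}(E) &= P(E \mid H)\,/\,P(E \mid \neg H) && \text{(first-order ratio, the likelihood ratio)}
\end{aligned}
$$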

rfl-urbaniak commented 5 months ago

Something like this:

import matplotlib.pyplot as plt
import pyro
import seaborn as sns
import torch
from pyro.infer.autoguide import AutoMultivariateNormal


def run_svi_inference(
    model,
    num_steps=500,
    verbose=True,
    lr=0.03,
    guide=None,
    blocked_sites=None,
    **model_kwargs,
):
    losses = []
    # default to a multivariate normal guide over all unblocked sites
    if guide is None:
        guide = AutoMultivariateNormal(pyro.poutine.block(model, hide=blocked_sites))
    elbo = pyro.infer.Trace_ELBO()(model, guide)

    # run the ELBO module once so its parameters exist before the optimizer is built
    elbo(**model_kwargs)
    adam = torch.optim.Adam(elbo.parameters(), lr=lr)
    print(f"Running SVI for {num_steps} steps...")
    for step in range(1, num_steps + 1):
        adam.zero_grad()
        loss = elbo(**model_kwargs)
        loss.backward()
        losses.append(loss.item())
        adam.step()
        # the original condition mixed `or` with `&`, so `verbose` only
        # gated the first iteration; this gates all printing
        if verbose and (step % 100 == 0 or step == 1):
            print("[iteration %04d] loss: %.4f" % (step, loss))

    # plot the ELBO trajectory
    plt.figure()
    plt.plot(losses, label="ELBO loss")
    sns.despine()
    plt.title("ELBO Loss")
    plt.ylim(0, max(losses))
    plt.legend()
    plt.show()

    return guide
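For context, a hypothetical call (the toy model, data, and names below are mine, not from the repo):

import pyro
import pyro.distributions as dist
import torch

def toy_model(x, y=None):
    # simple Bayesian linear regression, just to exercise the helper
    a = pyro.sample("a", dist.Normal(0.0, 1.0))
    b = pyro.sample("b", dist.Normal(0.0, 1.0))
    with pyro.plate("data", len(x)):
        pyro.sample("obs", dist.Normal(a + b * x, 1.0), obs=y)

x = torch.linspace(0.0, 1.0, 50)
y = 2.0 * x + 0.1 * torch.randn(50)
guide = run_svi_inference(toy_model, num_steps=300, x=x, y=y)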
rfl-urbaniak commented 5 months ago

or like this:

import logging
import time

import pyro
from pyro import optim
from pyro.infer import SVI, Predictive, Trace_ELBO
from pyro.infer.autoguide import AutoMultivariateNormal, init_to_mean


def get_samples(
    distance,
    proximity,
    how_far,
    model=model_sigmavar_proximity,  # module-level model from the notebook
    num_svi_iters=num_svi_iters,     # module-level settings from the notebook
    num_samples=num_samples,
):
    guide = AutoMultivariateNormal(model, init_loc_fn=init_to_mean)
    # the original hard-coded model_sigmavar_proximity here, ignoring the
    # model argument; use the argument so guide, SVI, and Predictive agree
    svi = SVI(model, guide, optim.Adam({"lr": 0.01}), loss=Trace_ELBO())

    iterations = []
    losses = []

    logging.info(f"Starting SVI inference with {num_svi_iters} iterations.")
    start_time = time.time()
    pyro.clear_param_store()
    for i in range(num_svi_iters):
        elbo = svi.step(distance, proximity, how_far)
        iterations.append(i)
        losses.append(elbo)
        if i % 50 == 0:
            logging.info("Elbo loss: {}".format(elbo))
    end_time = time.time()
    elapsed_time = end_time - start_time
    logging.info("SVI inference completed in %.2f seconds.", elapsed_time)

    # uncomment if you want to see the ELBO loss plots (requires plotly.express as px)
    # fig = px.line(x=iterations, y=losses, title="ELBO loss", template="presentation")
    # labels = {"iterations": "iteration", "losses": "loss"}
    # fig.update_xaxes(showgrid=False, title_text=labels["iterations"])
    # fig.update_yaxes(showgrid=False, title_text=labels["losses"])
    # fig.update_layout(width=700)
    # fig.show()

    predictive = Predictive(model, guide=guide, num_samples=num_samples)

    # flatten each posterior sample site into a (num_samples, -1) numpy array
    proximity_svi = {
        k: v.flatten().reshape(num_samples, -1).detach().cpu().numpy()
        for k, v in predictive(distance, proximity, how_far).items()
        if k != "obs"
    }

    print("SVI-based coefficient marginals:")
    # ft is a project-local helper module (assumed available in the repo)
    for site, values in ft.summary(proximity_svi, ["d", "p"]).items():
        print("Site: {}".format(site))
        print(values, "\n")

    return {
        "svi_samples": proximity_svi,
        "svi_guide": guide,
        "svi_predictive": predictive,
    }
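And a hypothetical invocation, assuming the distance, proximity, and how_far tensors defined elsewhere in the notebook:

results = get_samples(distance, proximity, how_far)
d_draws = results["svi_samples"]["d"]  # posterior draws for the "d" site
print(d_draws.mean(axis=0))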
Niklewa commented 4 months ago

@rfl-urbaniak, I have added the Quarto version of the weight paper. In the file, I have included a simple example of the weight of evidence.