probcomp / hierarchical-irm

Probabilistic structure discovery for rich relational systems
Apache License 2.0

Add CLI support for unconditional sampling and assessing data likelihood of any dataset #178

Open Schaechtle opened 3 weeks ago

Schaechtle commented 3 weeks ago

What does this issue request?

Add CLI support for two modes of evaluating posterior models, making it easy to do either of the following:

1. Unconditional sampling of synthetic data

The goal is to generate posterior samples, creating synthetic datasets that can be compared against the training observations and against held-out data.

2. Compute held-out likelihood scores

The goal is to compute likelihood scores for both training data and held-out data.
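The two modes could be sketched against a toy stand-in model. This is a minimal illustration only, assuming a hypothetical two-component Gaussian mixture in place of whatever posterior model the CLI would actually load; all names here are invented for illustration:

```python
import math
import random

# Hypothetical stand-in for a posterior model: a two-component
# Gaussian mixture, as (weight, mean, stddev) triples.
CLUSTERS = [(0.6, 0.0, 1.0), (0.4, 5.0, 1.0)]

def sample_row(rng):
    """Mode 1: draw one synthetic row unconditionally from the model."""
    r, acc = rng.random(), 0.0
    for weight, mean, std in CLUSTERS:
        acc += weight
        if r <= acc:
            return rng.gauss(mean, std)
    return rng.gauss(*CLUSTERS[-1][1:])  # guard against rounding

def log_likelihood(x):
    """Mode 2: log p(x) under the model, for one held-out row."""
    total = sum(
        w * math.exp(-0.5 * ((x - m) / s) ** 2) / (s * math.sqrt(2 * math.pi))
        for w, m, s in CLUSTERS
    )
    return math.log(total)

rng = random.Random(0)
synthetic = [sample_row(rng) for _ in range(5)]    # mode 1: synthetic dataset
scores = [log_likelihood(x) for x in (0.1, 5.2)]  # mode 2: held-out scores
```

A real CLI would replace the mixture with a trained posterior sample and stream rows to/from files, but the two entry points would look much the same.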

Why do we want this?

This allows us to compare special cases of GenDB against relevant baselines, such as previous CrossCat implementations. Note that some relevant baselines may generate synthetic data but not support assessing held-out likelihood. This effort is particularly relevant to the evaluations needed for a future publication -- but it is not urgent right now.

ThomasColthurst commented 3 weeks ago

For the held-out likelihoods, do you need them per held-out item or only per test set?

(Or to ask another way: if there are 1000 items in the held out test set, do you want a single log likelihood for all of them, or do you want 1000 individual log likelihoods?)

Schaechtle commented 1 week ago

> For the held-out likelihoods, do you need them per held-out item or only per test set?
>
> (Or to ask another way: if there are 1000 items in the held out test set, do you want a single log likelihood for all of them, or do you want 1000 individual log likelihoods?)

I would return a table or list of the latter, since we can aggregate easily -- but we may also want to dig into which rows produce which log-likelihood values in case we encounter bugs (e.g., NaNs for certain held-out rows).
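The point about per-row output can be sketched concretely. Assuming the CLI emits a list of hypothetical (row_id, log_likelihood) pairs, aggregation to a single test-set score is just a sum, while the per-row values also let us flag exactly which rows went wrong (the NaN entry below is fabricated to illustrate the debugging case):

```python
import math

# Hypothetical per-row output: (row_id, log_likelihood) pairs.
# Row 2 is a NaN, illustrating the bug case from the discussion.
per_row = [(0, -2.31), (1, -1.87), (2, float("nan")), (3, -4.05)]

# Aggregating per-row values into a single test-set score is a sum
# over the valid rows.
valid = [(rid, ll) for rid, ll in per_row if not math.isnan(ll)]
total_log_likelihood = sum(ll for _, ll in valid)

# Per-row output also pinpoints exactly which rows produced NaNs.
bad_rows = [rid for rid, ll in per_row if math.isnan(ll)]
```

So returning individual log likelihoods loses nothing (the single score is recoverable) while keeping the debugging information.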