Closed: ngoodman closed this issue 7 years ago
I can implement and test the lift
operation, and someone else could do the example models.
I can do this (both, or just the example); it's in line with the other things I'm doing.
For Bayesian regression, the benchmark will be from the Stan ADVI paper: https://arxiv.org/abs/1506.03431
They simulate their own data for the regression example, which isn't in the paper or the repo. I'm going to try to get my hands on it; otherwise I'll use a different paper.
Update: finding a replicable benchmark is turning out to be less trivial than I foresaw. Essentially I need something that has all three things: data, model, and results.
0) Stan paper: no data; I contacted them for the data (synthesized by them).
1) Drugowitsch's paper has data and a model, but no real results; they run it on a toy example to show that it "works".
2) Edward has a regression example with a dataset I can access, but they don't report posterior results (they use it as a speed-test comparison).
3) Gelman's Bayesian Data Analysis cites a dataset that I was able to get my hands on, but uses an improper distribution for alpha. After fiddling and discussing with @martinjankowiak, I'll need to sample alpha from a truncated distribution to run VI on it.
4) [edit] There's a hierarchical logistic regression in the Stan paper that uses the same data as above.
I will continue hunting for a published result that has all three of what I need, but in the meantime I was thinking of doing (2) by modifying their code to use klqp and comparing that with ours. It would give us confidence that we have something correct, though not officially in a paper. [edit:] For a published result I can do (4) and compare log predictives against theirs. @ngoodman thoughts?
never simple, eh? :)
I think it's acceptable for the performance benchmark to be an available system, rather than a result reported in a paper. So this would suggest taking a model+data example (or a couple) and comparing the log-predictive and run time to Stan and/or Edward. Using simple examples from the book (e.g. rats) seems like a good idea?
(In some sense I am suggesting extending the scope of this "anchor model" to comparison tests against Stan and/or Edward for a single model. It would be nice to put that together in such a way that it's easy to then use the pipeline to do the comparison for other models....)
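To make the log-predictive comparison concrete, here is a small sketch of how it could be computed from posterior samples of a regression model; the function name, the fixed noise scale, and the tensor shapes are assumptions for illustration, not code from this issue:

```python
# Hypothetical helper: log pointwise predictive density on held-out data,
# averaged over posterior samples (usable for Pyro, Stan, or Edward output).
import math
import torch
import torch.distributions as dist

def log_pointwise_predictive(weight_samples, bias_samples, sigma, x_test, y_test):
    """weight_samples: (S, p) posterior samples of the regression weights
    bias_samples:   (S,)   posterior samples of the bias
    sigma:          observation noise scale (assumed fixed here)
    x_test, y_test: (N, p) and (N,) held-out data
    """
    S = weight_samples.shape[0]
    # predictive mean for every posterior sample and test point: (S, N)
    mu = weight_samples @ x_test.t() + bias_samples.unsqueeze(-1)
    # log p(y_i | theta_s) for each sample s and point i: (S, N)
    log_p = dist.Normal(mu, sigma).log_prob(y_test)
    # log (1/S) sum_s p(y_i | theta_s), summed over test points
    return (torch.logsumexp(log_p, dim=0) - math.log(S)).sum()
```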
Status update: running logistic regression against Edward's model (rough sketch of the Pyro side below).
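As a rough illustration only (not the actual benchmark code), this is what the "sample the weights directly" version of Bayesian logistic regression looks like in Pyro; the dimensions and priors are placeholders:

```python
# Bayesian logistic regression with weights sampled directly via pyro.sample.
import torch
import pyro
import pyro.distributions as dist

def logistic_model(x, y=None):
    p = x.shape[1]
    # standard-normal priors over the weights and bias
    w = pyro.sample("w", dist.Normal(torch.zeros(p), torch.ones(p)).to_event(1))
    b = pyro.sample("b", dist.Normal(0., 1.))
    logits = x @ w + b
    with pyro.plate("data", x.shape[0]):
        return pyro.sample("obs", dist.Bernoulli(logits=logits), obs=y)
```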
Update: now using random_module with an nn.Module to define the prior, rather than sampling the parameters directly. I have both versions (see the sketch below).
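A minimal sketch of the random_module version, assuming the pyro.random_module API; the module, prior choices, and noise scale below are illustrative, not the final example code:

```python
# Define the regression family as an nn.Module, then "lift" its parameters
# into random variables by placing priors on them with pyro.random_module.
import torch
import torch.nn as nn
import pyro
import pyro.distributions as dist

class RegressionModel(nn.Module):
    def __init__(self, p):
        super().__init__()
        self.linear = nn.Linear(p, 1)

    def forward(self, x):
        return self.linear(x)

regression_model = RegressionModel(p=1)

def model(x, y):
    # standard-normal priors over the module's weight and bias
    priors = {
        "linear.weight": dist.Normal(torch.zeros(1, 1), torch.ones(1, 1)).to_event(2),
        "linear.bias": dist.Normal(torch.zeros(1), torch.ones(1)).to_event(1),
    }
    # lift: returns a module whose parameters are samples from the priors
    lifted_module = pyro.random_module("regression", regression_model, priors)
    lifted_reg_model = lifted_module()
    with pyro.plate("data", x.shape[0]):
        prediction_mean = lifted_reg_model(x).squeeze(-1)
        pyro.sample("obs", dist.Normal(prediction_mean, 0.1), obs=y)
```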
We want to use some combination of Bayesian regression and Bayesian NNs (to be determined) as anchor models for the first Pyro release. These can be implemented in a way that shares a lot of work and is extensible. The approach is to define a parametrized family of classifiers as a PyTorch Module; usually one would do MLE on this family to train a classifier, but instead we will define a "lift" operation that upgrades the parameters to random variables, and then do VI (or other inference) for the posterior over the lifted parameters. See the extensive discussion of this approach in #40.
We should implement the lifting operation and then test it on Bayesian regression examples. Then we can try Bayesian NNs.
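As a sketch of that workflow, lifting plus VI could look roughly like the following, assuming a Pyro model like the lifted regression sketch in the comment above; the guide choice, optimizer settings, and data here are placeholders:

```python
# Fit the lifted model with stochastic variational inference.
import torch
import pyro
from pyro.infer import SVI, Trace_ELBO
from pyro.infer.autoguide import AutoDiagonalNormal
from pyro.optim import Adam

# `model` is a lifted regression model like the earlier random_module sketch
guide = AutoDiagonalNormal(model)
svi = SVI(model, guide, Adam({"lr": 0.01}), loss=Trace_ELBO())

x, y = torch.randn(100, 1), torch.randn(100)   # placeholder data
for step in range(2000):
    svi.step(x, y)
```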