Open SamDuffield opened 4 months ago
We should definitely add Stein variational gradient descent (paper, code)
SVGD requires a kernel specification. IMO we don't need to supply a suite of kernels ourselves (aside from maybe a default Gaussian kernel).
I think we should enforce a kernel signature like
eval = kernel(params1, params2, aux1, aux2, batch, **kernel_params)
where kernel_params are any kernel hyperparameters such as bandwidth.
The aux1 and aux2 arguments future-proof the signature against more sophisticated kernels that could, e.g., use info from the model call.
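As a rough illustration of the proposed signature, here is a minimal sketch of a default Gaussian kernel. It assumes params are flat sequences of floats (a real implementation would need to handle whatever parameter structure the library uses), and the names `gaussian_kernel` and `bandwidth` are hypothetical, not an existing API. The `aux1`, `aux2` and `batch` arguments are accepted but ignored.

```python
import math


def gaussian_kernel(params1, params2, aux1=None, aux2=None, batch=None, *, bandwidth=1.0):
    """Gaussian (RBF) kernel: exp(-||params1 - params2||^2 / bandwidth).

    aux1, aux2 and batch are unused here; they are only present so that
    this kernel matches the proposed common signature.
    """
    # Squared Euclidean distance between the two flat parameter vectors.
    sq_dist = sum((a - b) ** 2 for a, b in zip(params1, params2))
    return math.exp(-sq_dist / bandwidth)


# Hyperparameters are passed through as keyword arguments (**kernel_params).
kernel_params = {"bandwidth": 2.0}
value = gaussian_kernel([0.0, 1.0], [1.0, 1.0], None, None, None, **kernel_params)
```

Making `bandwidth` keyword-only keeps the positional part of the signature identical across kernels, so a caller can always write `kernel(params1, params2, aux1, aux2, batch, **kernel_params)`.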
Also we should think about how to support adaptive kernel_params updates like the median heuristic used in the SVGD [paper](https://proceedings.neurips.cc/paper/2016/file/b3ba8f1bee1238a2f37603d90b58898d-Paper.pdf).
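For reference, a minimal sketch of the median heuristic: the SVGD paper sets the bandwidth to h = med^2 / log n, where med is the median pairwise distance between the n current particles, for a kernel of the form exp(-||x - x'||^2 / h). The function name, the flat-list particle representation, the fallback value, and the choice to take the median over distinct pairs only are all assumptions made for illustration.

```python
import math
from statistics import median


def median_heuristic_bandwidth(particles, fallback=1.0):
    """Return h = med^2 / log(n) for a list of flat particle vectors.

    Falls back to `fallback` when the heuristic is undefined: fewer than
    two particles (log(1) == 0) or all particles identical (med == 0).
    """
    n = len(particles)
    if n < 2:
        return fallback
    # Euclidean distances over distinct pairs (i < j) only (an assumption).
    dists = [
        math.dist(particles[i], particles[j])
        for i in range(n)
        for j in range(i + 1, n)
    ]
    med = median(dists)
    if med == 0.0:
        return fallback
    return med ** 2 / math.log(n)


# Recompute the hyperparameters from the current particles before each update.
particles = [[0.0], [1.0], [3.0]]
kernel_params = {"bandwidth": median_heuristic_bandwidth(particles)}
```

One possible design is for the update step to accept an optional callable like this that maps the current particles to a fresh `kernel_params` dict, which would keep adaptive schemes separate from the kernel itself.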