Open · rzu512 opened 5 years ago

I run Hamiltonian Monte Carlo on 4 copies of my model for 10^5 steps on a GPU. Each copy of the model contains about 1000 parameters. The log-likelihood function contains `tf.scan`. The main (CPU) memory was quickly filled up. Can I just get the value of the parameters that give the largest log-likelihood instead of the whole trace of samples?
I have the same problem. I am fitting a Bayesian changepoint model, where the samples fill up my 32 GB of memory fairly quickly.
Hi, I am facing the same problem. Is this issue resolved in the new TF version?
You can use thinning to get fewer results out. Pass num_steps_between_results to sample_chain. Picking the most likely kind of defeats the purpose of MCMC, which is to get a set of "typical" samples.
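For example, a minimal thinning sketch (the toy standard-normal target and the step-size settings here are placeholders, not the model from this thread):

```python
import tensorflow as tf
import tensorflow_probability as tfp

# Toy target just to make the sketch self-contained; substitute the real
# model's log-likelihood here.
target_log_prob_fn = tfp.distributions.Normal(0., 1.).log_prob
initial_state = tf.zeros([])

kernel = tfp.mcmc.HamiltonianMonteCarlo(
    target_log_prob_fn=target_log_prob_fn,
    step_size=0.1,
    num_leapfrog_steps=3)

# num_steps_between_results=9 discards 9 steps between kept draws: the chain
# runs 10x longer, but only num_results samples are ever materialized.
samples, _ = tfp.mcmc.sample_chain(
    num_results=1000,
    num_steps_between_results=9,
    current_state=initial_state,
    kernel=kernel)
```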
Is there a way to offload the old samples to system memory, keeping GPU memory free, while running the chain?
There is the notion of a Reducer in tfp.experimental.mcmc (used w/ the WithReductions transition kernel). You could write a py_function that writes out, say, 10 underlying transition kernel samples to disk, then returns only the last one.
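An untested sketch of that idea, assuming the experimental `Reducer` interface (`initialize`/`one_step`); `DiskSpoolReducer`, the `/tmp` path, and a single-Tensor chain state are all illustrative assumptions:

```python
import numpy as np
import tensorflow as tf
import tensorflow_probability as tfp

class DiskSpoolReducer(tfp.experimental.mcmc.Reducer):
    """Spools every draw to host-side storage; reducer state is a step counter."""

    def initialize(self, initial_chain_state, initial_kernel_results=None):
        return tf.zeros([], dtype=tf.int32)

    def one_step(self, new_chain_state, current_reducer_state,
                 previous_kernel_results=None):
        def _spool(step, sample):
            # Eager, host-side code: for real workloads, append to an HDF5
            # file or np.memmap instead of writing one .npy per draw.
            np.save(f"/tmp/sample_{int(step):06d}.npy", np.asarray(sample))
            return np.int32(1)
        # Route py_function's output into the returned state so the write
        # op is not pruned from the graph.
        wrote = tf.py_function(
            _spool, [current_reducer_state, new_chain_state], tf.int32)
        return current_reducer_state + wrote

# `kernel` is whatever inner transition kernel you were already using.
spooled_kernel = tfp.experimental.mcmc.WithReductions(
    inner_kernel=kernel, reducer=DiskSpoolReducer())
```

Note that plain `sample_chain` would still materialize every state it returns; depending on your TFP version, `tfp.experimental.mcmc.sample_fold` can drive a reducer like this while keeping only the final state.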
Brian Patton | Software Engineer
Like @brianwa84 suggests, we tend to just chunk our MCMC, firing off a burst of say 1000 samples at a time and dumping to an HDF5 file. Something like
```python
import tensorflow_probability as tfp

# `current_state`, `kernel` and `hdf5_array` are set up elsewhere.
num_burst_samples = 1000
num_bursts = 50
final_results = None  # kernel results carried over between bursts
for i in range(num_bursts):
    samples, results, final_results = tfp.mcmc.sample_chain(
        num_burst_samples,
        current_state=current_state,
        kernel=kernel,
        previous_kernel_results=final_results,
        return_final_kernel_results=True)
    # Flush this burst to disk so samples never pile up in GPU memory.
    hdf5_array[(i * num_burst_samples):((i + 1) * num_burst_samples)] = samples
    # Resume the next burst from the last state of this one.
    current_state = [s[-1] for s in samples]
```
which works okay, since typically the HDF5 write outweighs the cost of a kernel startup. A better approach would be to write a function that calls `tfp.mcmc.sample_chain`, decorated with `@tf.function`, with or without `experimental_compile=True`. The key is to return the final kernel results so you can restart the chain for the next chunk from where it left off.
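A rough, untested sketch of that decorated variant, reusing the names from the snippet above (`experimental_compile` is spelled `jit_compile` in newer TF):

```python
import tensorflow as tf
import tensorflow_probability as tfp

# `kernel`, `current_state`, `num_burst_samples`, `num_bursts` and
# `hdf5_array` are the same objects as in the snippet above.

@tf.function(experimental_compile=True)  # or plain @tf.function
def run_burst(current_state, previous_kernel_results):
    # Traced once, then reused for every burst.
    return tfp.mcmc.sample_chain(
        num_results=num_burst_samples,
        current_state=current_state,
        kernel=kernel,
        previous_kernel_results=previous_kernel_results,
        return_final_kernel_results=True)

# Bootstrapping once keeps the traced input signature stable across bursts.
final_results = kernel.bootstrap_results(current_state)
for i in range(num_bursts):
    samples, results, final_results = run_burst(current_state, final_results)
    hdf5_array[(i * num_burst_samples):((i + 1) * num_burst_samples)] = samples
    current_state = [s[-1] for s in samples]
```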
Chris