greta-dev / greta

simple and scalable statistical modelling in R
https://greta-stats.org

[Feature Request] Save draws to disk straight from python/TF rather than serialize and pull into R #358

Status: Open. rexdouglass opened this issue 3 years ago.

rexdouglass commented 3 years ago

I have a very large model whose performance degraded nonlinearly as I added more parameters. After a great deal of debugging, I realized that the sheer size of the parameters and draws being serialized and pulled back into R accounted for most of the slowdown.

For large models, it would be useful to be able to dump the draws straight to disk and summarize them later (e.g. with sparklyr).

My short-term workaround is to set n_samples=0, fit the model, draw small batches of 100 samples at a time with extra_samples(), save each batch to disk, and load them all into Spark afterward.
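The workaround above, like the requested feature, boils down to writing draws to disk in fixed-size batches so that only one batch is ever held in memory, then summarizing the file later by streaming through it. The following is a minimal Python sketch under stated assumptions. `draw_batch()` is a hypothetical stand-in for one sampler call, playing the role of one extra_samples() batch. It is not greta's API and simply returns Gaussian noise. Plain CSV stands in for whatever format Spark would ultimately read.

```python
import csv
import os
import random
import tempfile


def draw_batch(n_draws, n_params, rng):
    """Hypothetical stand-in for one sampler call (not greta's API): Gaussian noise."""
    return [[rng.gauss(0.0, 1.0) for _ in range(n_params)] for _ in range(n_draws)]


def sample_to_disk(path, n_batches, batch_size, n_params, seed=0):
    """Append each batch to a CSV file so only one batch is in memory at a time."""
    rng = random.Random(seed)
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow([f"theta_{j}" for j in range(n_params)])  # header row
        for _ in range(n_batches):
            writer.writerows(draw_batch(batch_size, n_params, rng))
    return path


def streaming_means(path):
    """Summarize later by streaming rows, never loading every draw at once."""
    with open(path, newline="") as f:
        reader = csv.reader(f)
        header = next(reader)
        totals = [0.0] * len(header)
        count = 0
        for row in reader:
            for j, value in enumerate(row):
                totals[j] += float(value)
            count += 1
    return header, count, [t / count for t in totals]


if __name__ == "__main__":
    out = os.path.join(tempfile.mkdtemp(), "draws.csv")
    sample_to_disk(out, n_batches=10, batch_size=100, n_params=3)
    names, n, means = streaming_means(out)
    print(n, names)  # prints: 1000 ['theta_0', 'theta_1', 'theta_2']
```

Streaming means are only one example. Any summary that can be accumulated row by row, such as variances or histogram counts, fits the same pattern, and a tool like Spark does the equivalent at scale.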

Thanks for a great API; it greatly shortens the learning curve for TFP.

I should add that there may be other places in greta that don't scale to large models either. For example, m$dag$adjacency_matrix is a dense matrix, so its memory use scales quadratically with the number of nodes.
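To make the scaling point concrete, note that a dense adjacency matrix stores n × n entries however few edges the graph has, while a sparse representation grows only with the number of edges. The minimal Python sketch below uses a hypothetical chain-shaped DAG. It does not reproduce how greta actually builds m$dag$adjacency_matrix.

```python
from collections import defaultdict


def dense_adjacency(n_nodes, edges):
    """Dense n x n 0/1 matrix: storage is n_nodes**2 regardless of edge count."""
    matrix = [[0] * n_nodes for _ in range(n_nodes)]
    for parent, child in edges:
        matrix[parent][child] = 1
    return matrix


def sparse_adjacency(edges):
    """Sparse dict-of-sets: storage grows with the number of edges only."""
    children = defaultdict(set)
    for parent, child in edges:
        children[parent].add(child)
    return children


def stored_entries(n_nodes, edges):
    """Return (dense entry count, sparse entry count) for the same graph."""
    dense = dense_adjacency(n_nodes, edges)
    sparse = sparse_adjacency(edges)
    return sum(len(row) for row in dense), sum(len(c) for c in sparse.values())


# Hypothetical example: a chain DAG 0 -> 1 -> ... -> n-1 has only n - 1 edges.
for n in (10, 100, 1000):
    chain = [(i, i + 1) for i in range(n - 1)]
    print(n, stored_entries(n, chain))
# prints:
# 10 (100, 9)
# 100 (10000, 99)
# 1000 (1000000, 999)
```

In R, the analogous choice would be a sparse matrix class such as those in the Matrix package. Whether that fits greta's internals is an open question.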

njtierney commented 3 years ago

Thanks for your request - we will evaluate this when we are preparing the release for version 0.5.0 to work out how we might be able to implement your feature request :)