Open twj8CDC opened 1 year ago
Hi, just to follow-up I think I found a work around to this. I was able to convert to list and pickle the all_trees object from the RV. This object does not have any issues w/ multiprocessing/connection that I see with the full BARTRV or the Model when pickled and loaded. Then I can use the pmb.utilites._sample_posterior(...)
function to get draws from the tree structure, which as it appears is how the pdp utility function works.
all_trees = list(model_bikes.mu_.owner.op.all_trees)
with open('test_all_tree.pkl', mode='wb') as file:
cpkl.dump(all_trees, file)
Restart session and reload modules
with open('test_all_tree.pkl', mode='rb') as file:
all_tree2 = cpkl.load(file)
rng = np.random.default_rng()
xt_2 = pt.constant(xt)
smpl = pmbu._sample_posterior(all_tree2, xt_2, rng, size = 400, shape=1)
mu_smpl = smpl.mean(0)
ex_mu = pm.math.exp(mu_smpl).eval()
ex_mu appears to appropriately match the pm.sample_posterior_distribution(...)
from the trained model and with the updated covariate matrix.
For my actual project, I only care about the mu value output from the BARTRV, so I think that is appropriate method to predict mu with a new covariate matrix. If I needed the y value I believe it could be extended with a sample from whatever the final distribution is w/ the mu value inputed, as I have seen in other examples.
If anyone can confirm that this is an appropriate way to save/load/predict from the BARTRV or provide me with an alternative method that would be great!
Thanks!
Sorry for the late reply, I was very busy with other stuff, still are actually so I am just answering to tell you that the workaround seems fine to me. I will try next week to inspect this issue with more time and see if I can provide a better answer or a more general solution.
Great! Thank you!
Are there any update? I am using the library with my team and we are having a problem to load the model. We don't know how to reinsert the trees into the model, we tried with model.owner.op.all_tress and BARTRV.all_trees. The solutions of @twj8CDC is not applicable to us, because we are using other distribution on top of bart.
Thanks
I have a similar issue. If I want to generate partial dependence plots (or anything else that relies on the all_trees object), I get an error ValueError: high <= 0
because it isn't storing the size of all_trees. I can confirm this is the issue because I ran the test model from the PyMC BART Github repo in the same notebook without any trouble since the model object is generated in the same notebook (whereas I am loading an nc trace file in my case).
I am having the same issue as above, any solution would be appreciated as we are building on top of the bart model as well.
Hi, I have been trying to save, reload and generate new predictions with a model that includes a BARTRV.
I am able to save the trace as a pickle (net_cdf works too), and then instantiate a new model and get the posterior predictions on the training data, but when I try to add new data I get shape errors. The shape errors are odd since when I train the model I can update the model with new data for predictions without any issues. It is only when I use the newly instantiated model that I am unable to update the input data.
Below is a minimal example:
Pickle instead of netcdf, but this seems to work fine
Posterior predictions on updated data works with OG model with the OG idata and the saved and loaded idata
Restart the session to test the load from a clean slate and reload the data from above
Specify the new model. Only difference is variable names
load the saved idata
get posterior predictions on the training data
This works minus a slight difference in predicted values, possible due to a difference in random state? The post5 compares well to the post1 and post2 above.
get the poster predictions with new data
This fails with the following error
I can't figure out where this shape error arises from. The trained model specified in the top allows for updating of data without issues, so I am not sure if there is a general issue with the way the model is specified?
Is there a different process to saving and reloading a model with a BARTRV?
I also tried pickling the whole model, but that doesn't work because of the multiprocessing components in the BART object. I get a socket error when trying to reload the pickled object.
I have also posted this in the discourse, as I was not sure where it makes the most sense to discuss this issue.
Thanks!