py-why / dowhy

DoWhy is a Python library for causal inference that supports explicit modeling and testing of causal assumptions. DoWhy is based on a unified language for causal inference, combining causal graphical models and potential outcomes frameworks.
https://www.pywhy.org/dowhy
MIT License

Update/refine causal model #1012

Closed LucaGiamattei closed 12 months ago

LucaGiamattei commented 1 year ago

First of all, thanks a lot for the hard work. I was wondering whether there is a way to update or refine a causal model as new data becomes available. For instance, in gcm, assume we have the causal structure and data available: we can use the auto "assign_causal_mechanisms" and "fit" functions to get the fitted causal model. Now, assuming we have the previously fitted causal model and new data, do we have to pass all the data (old + new)? Or is there a way to fit just the new data to refine the old model?

drawlinson commented 1 year ago

@LucaGiamattei You can run inference on new data with models that support the do() interface (i.e. regression models). However, this does not perform a new fit. If you want to validate the model on new data, you can also use the do() function. If you want to retrain the model: in ML, incremental training is usually not as good as a complete retrain, so you're probably better off doing a full retrain. You can automate all your validation and refutation steps to check that the new model is as good as the previous one. Small to medium tabular datasets generally train in seconds, so this shouldn't be impractical ...
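
As a rough illustration of the "full retrain, then check it is as good as the previous one" workflow described above, here is a minimal standard-library sketch. It does not use DoWhy: the one-parameter linear model, the helper names (`fit_slope`, `mse`, `draw`) and the 5% tolerance are all assumptions made for the example.

```python
import random


def fit_slope(rows):
    """Least-squares slope through the origin for y = slope * x."""
    return sum(x * y for x, y in rows) / sum(x * x for x, _ in rows)


def mse(slope, rows):
    """Mean squared prediction error of the fitted slope on `rows`."""
    return sum((y - slope * x) ** 2 for x, y in rows) / len(rows)


def draw(n, rng):
    """Synthetic data (assumed for illustration): y = 2x + Gaussian noise."""
    rows = []
    for _ in range(n):
        x = rng.gauss(0, 1)
        rows.append((x, 2.0 * x + rng.gauss(0, 0.5)))
    return rows


rng = random.Random(1)
old_data = draw(1000, rng)
new_data = draw(1000, rng)
holdout = draw(1000, rng)  # kept aside, never used for fitting

old_model = fit_slope(old_data)
retrained = fit_slope(old_data + new_data)  # full retrain on old + new

# Automated check: accept the retrained model only if its holdout error
# is not clearly worse than the previous model's (tolerance is arbitrary).
tolerance = 0.05
accept = mse(retrained, holdout) <= mse(old_model, holdout) * (1 + tolerance)
print(accept)
```

In a real pipeline the same pattern would wrap whatever validation and refutation steps are already in use, with the acceptance criterion chosen to suit the application.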

bloebp commented 1 year ago

Hey, you can also re-fit the mechanism of a particular node once you have more/new data via: https://github.com/py-why/dowhy/blob/main/dowhy/gcm/fitting_sampling.py#L48 To set a new mechanism for a particular node, take a look at the example here: https://www.pywhy.org/dowhy/v0.10/user_guide/modeling_gcm/index.html (below the auto assignment example).

So, no need to re-fit the whole graph.
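
To show why re-fitting a single node is enough, here is a conceptual sketch in plain Python. It is not DoWhy's API: the chain X -> Y -> Z, the linear mechanisms and the helpers `fit_linear`, `sample_chain` and `fit_node` are hypothetical names invented for the illustration. The point is that each node's mechanism is estimated only from that node and its parents, so new data for one node leaves the other fitted mechanisms unchanged.

```python
import random


def fit_linear(xs, ys):
    """Least-squares fit of y = slope * x + intercept for a single parent."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def sample_chain(n, slope_y, slope_z, rng):
    """Draw n rows from the chain X -> Y -> Z with linear mechanisms."""
    rows = []
    for _ in range(n):
        x = rng.gauss(0, 1)
        y = slope_y * x + rng.gauss(0, 0.1)
        z = slope_z * y + rng.gauss(0, 0.1)
        rows.append({"X": x, "Y": y, "Z": z})
    return rows


def fit_node(mechanisms, node, parent, rows):
    """Re-fit only `node`'s mechanism from its parent; other entries are untouched."""
    mechanisms[node] = fit_linear([r[parent] for r in rows], [r[node] for r in rows])


rng = random.Random(0)
old_rows = sample_chain(500, slope_y=2.0, slope_z=-1.0, rng=rng)

# Initial fit of every non-root mechanism on the old data.
mechanisms = {}
fit_node(mechanisms, "Y", "X", old_rows)
fit_node(mechanisms, "Z", "Y", old_rows)
z_before = mechanisms["Z"]

# New data in which only the X -> Y relationship has changed.
new_rows = sample_chain(500, slope_y=3.0, slope_z=-1.0, rng=rng)
fit_node(mechanisms, "Y", "X", new_rows)  # re-fit just this node

print(mechanisms["Y"])              # slope now close to 3.0
print(mechanisms["Z"] is z_before)  # Z's mechanism was not re-fitted
```

Note that this sketch replaces Y's mechanism with one fitted on the new data only; whether to fit on the new data alone or on old and new combined depends on whether the underlying relationship is believed to have changed.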

@drawlinson I think the question was related to re-fitting causal mechanisms of a subset of nodes in a GCM.

github-actions[bot] commented 1 year ago

This issue is stale because it has been open for 14 days with no activity.

github-actions[bot] commented 12 months ago

This issue was closed because it has been inactive for 7 days since being marked as stale.