pymc-devs / pymc-examples

Examples of PyMC models, including a library of Jupyter notebooks.
https://www.pymc.io/projects/examples/en/latest/
MIT License

BART: Categorical example #663

Closed PabloGGaray closed 1 month ago

PabloGGaray commented 1 month ago

Closes https://github.com/pymc-devs/pymc-bart/issues/100


📚 Documentation preview 📚: https://pymc-examples--663.org.readthedocs.build/en/663/

review-notebook-app[bot] commented 1 month ago

View / edit / reply to this conversation on ReviewNB

aloctavodia commented on 2024-05-23T13:55:22Z ----------------------------------------------------------------

Use az.style.use("arviz-darkgrid")

Remove plt.rcParams["figure.dpi"] = 300


review-notebook-app[bot] commented 1 month ago

aloctavodia commented on 2024-05-23T13:55:23Z ----------------------------------------------------------------

The pdp plot, together with the Variable Importance plot, confirms that Tail is the covariable with the smallest effect on the predicted variable. In the Variable Importance plot, Tail is the last covariable to be added and does not improve the result; in the pdp plot, Tail has the flattest response.


review-notebook-app[bot] commented 1 month ago

aloctavodia commented on 2024-05-23T13:55:23Z ----------------------------------------------------------------

Add this to the next section and compare it with the PPC plot, or remove it.


review-notebook-app[bot] commented 1 month ago

aloctavodia commented on 2024-05-23T13:55:24Z ----------------------------------------------------------------

So far we have a very good result concerning the classification of the species based on the 5 covariables. However, if we want to select a subset of covariables for future classifications, it is not very clear which ones to select. The one thing that seems certain is that Tail could be eliminated. At the beginning, when we plotted the distribution of each covariable, we said that the most important variables for the classification could be Wing, Weight, and Culmen. Nevertheless, after running the model we saw that Hallux, Culmen, and Wing proved to be the most important ones.

Unfortunately, the partial dependence plots show a very wide dispersion, making the results look suspicious. One way to reduce this variability is to fit 3 independent trees. Below we will see how to do this and get a more accurate result.


review-notebook-app[bot] commented 1 month ago

aloctavodia commented on 2024-05-23T13:55:25Z ----------------------------------------------------------------

Fitting independent trees

The option to fit independent trees with pymc-bart is set with the parameter pmb.BART(..., separate_trees=True, ...). As we will see, for this example, using this option doesn't make a big difference in the predictions, but it helps us reduce the variability in the ppc and gives a small improvement in the in-sample comparison. If this option is used with bigger datasets, take into account that the model fits more slowly, so you obtain a better result at the expense of computational cost. The following code runs the same model and analysis as before, but fitting 3 independent trees. Compare the time to run this model with the previous one.


PabloGGaray commented on 2024-05-23T16:00:54Z ----------------------------------------------------------------

Is the "3" in "but fitting 3 independent trees." OK?

aloctavodia commented on 2024-05-23T16:06:16Z ----------------------------------------------------------------

Well, it is 3 independent "sums of trees". Better to remove the "3".