GLM poisson - Githubissues

OriolAbril commented 3 years ago

File: https://github.com/pymc-devs/pymc-examples/blob/main/examples/generalized_linear_models/GLM-poisson-regression.ipynb Reviewers:

The sections below may still be pending. If so, the issue is still available, it simply doesn't have specific guidance yet. Please refer to this overview of updates

Known changes needed

Changes listed in this section should all be done at some point in order to get this notebook to a "Best Practices" state. However, these are probably not enough! Make sure to thoroughly review the notebook and search for other updates.

General updates

Use numpy generator
:warning: code cells 15 and 19 are plain wrong, we are doing np.exp(np.mean()) instead of np.mean(np.exp()).

ArviZ related

code cell 15 (again) is computing the whole summary dataframe, when only a subset of the columns are needed. We should either use kind="stats" or customize summary, examples of both at: https://arviz-devs.github.io/arviz/api/generated/arviz.summary.html

Notes

Exotic dependencies

None

Computing requirements

Models sample in less than a minute

jessicakzhang commented 3 years ago

Hi! I'd like to try working on this.

OriolAbril commented 3 years ago

That would be great @jessicakzhang! I have added a couple of suggestions based on a quick look over the notebook, I'll review more carefully once you submit a PR.

Let us know if you have any doubt while working on this

chiral-carbon commented 3 years ago

Hi @jessicakzhang , are you still working on this issue? @OriolAbril would it be okay if I were to submit a PR for this issue, considering the fact that this issue has already been assigned?

OriolAbril commented 3 years ago

Hi, yes, as it has been more that two weeks with no activity, as indicated in the contributing guide, I'll assign the issue to you so you can submit a PR.

chiral-carbon commented 3 years ago

thanks a lot!

chiral-carbon commented 3 years ago

@OriolAbril had a doubt regarding updating np.exp(np.mean()) to np.mean(np.exp()) in cells 15 and 19. as far as I understood, az.summary returns certain statistics of which mean is one, and we are computing np.exp() for this summary dataframe. should this be changed to compute np.mean() of az.summary? I am unclear as to exactly where we should be applying np.exp() here.

also, while updating az.summary to use args rather than manually subsetting the dataframe, should we show all the default stats (mean, sd, hdi_3%, hdi_97%) for the variables or only a subset of mean, hdi_3%, hdi_97% as is being shown currently?

OriolAbril commented 3 years ago

I used mean as a placeholder, summary is acting as mean (as well as acting as hdi). Exponentiating should come first, then calling summary on the exponentiated data, not the other way around.

I think the default stats is good enough and it's simple, there is no need to overly complicate the notebook only to exclude sd from summary.

chiral-carbon commented 3 years ago

oh okay, thanks for clarifying. so when I do the exponentiation on inf_fish which is an InferenceData object, I should first convert it to a data frame and exponentiate that data frame, and only then create a summary for it. have I understood this correctly? but az.summary takes an InferenceData object, so what would be the best way to exponentiate inf_fish here?

OriolAbril commented 3 years ago

You should exponentiate the posterior samples, which are a group in inferencedata, in the form of an xarray dataset. It should look something like: az.summary(np.exp(idata.posterior), ...)

chiral-carbon commented 3 years ago

yes, that works, thanks a lot!

chiral-carbon commented 3 years ago

in cell 19, I think there's a typing error. the code in cells 15 and 19 are identical and call a summary for the same variable inf_fish, which was generated using the manual model. cell 19 should be displaying the summary for the model results created with glm.from_formula and which are stored in the variable inf_fish_alt. can you confirm this?

also, should the data in the markdown cell after cell 15 be altered to match the new mean and hdi that we see in the summary? the values are only very slightly off Screenshot from 2021-05-05 20-58-23

OriolAbril commented 3 years ago

in cell 19, I think there's a typing error. the code in cells 15 and 19 are identical and call a summary for the same variable inf_fish, which was generated using the manual model. cell 19 should be displaying the summary for the model results created with glm.from_formula and which are stored in the variable inf_fish_alt. can you confirm this?

yes, it is definitely a typo.

also, should the data in the markdown cell after cell 15 be altered to match the new mean and hdi that we see in the summary?

I would update it to avoid confusing readers

OriolAbril commented 3 years ago

Needs to be updated to use bambi instead of glm module

chiral-carbon commented 3 years ago

will be working on it @OriolAbril

drbenvincent commented 2 years ago

I'm about to update this to v4

pymc-devs / pymc-examples

GLM poisson #86

Known changes needed

General updates

ArviZ related

Notes

Exotic dependencies

Computing requirements