aolney closed 4 years ago
I added the theory part. This should now be done.
Great, thanks :) Apologies, I've been focusing on the ones that were closer, so I doubly appreciate you being proactive on this.
Reopening to help me track your latest changes
Some thoughts on https://github.com/memphis-iis/datawhys-content-notebooks/pull/49
The role of the histograms is to build a general habit of looking at the distribution of values for the predictors/features. It's always a good idea to do that so you notice anything unusual, such as outliers.
On Wed, Jun 17, 2020 at 1:05 AM Andrew M Olney notifications@github.com wrote:
Some thoughts on #49 https://github.com/memphis-iis/datawhys-content-notebooks/pull/49
- Updated to remove material already covered in previous days
- Not sure how to articulate purpose of histograms, since the analysis does not seem to adjust to them
- Would probably be nice to add odds ratio interpretation of the coefficients, but I'm out of gas for today
Totally agree; this is something I do routinely myself. But I wasn't sure how you wanted to frame it in this specific notebook, for both the histograms and the correlation matrix. Here are some thoughts:
Right now it seems more aligned with 1 than with the others. If we wanted to enhance their understanding of the effect of replacing missing values with the median, we could try before/after comparison plots. With regard to 2, we could use it as an opportunity to introduce transformations like log or sqrt. With regard to 3, we could add a discussion of which variables correlate strongly with the class label (do any? I don't recall) and with each other (none do).
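For the before/after idea, here is a minimal sketch of what such a comparison could compute. The column values, the use of `None` for missing entries, and the function names `impute_median` and `summarize` are all made up for illustration and are not taken from the notebook. It uses only the standard library; in the notebook itself the same comparison would more naturally be shown as two histograms side by side.

```python
import statistics

# Hypothetical feature column; None marks a missing entry.
# These values are illustrative only, not the notebook's data.
values = [1.0, 2.0, None, 4.0, 100.0, None, 3.0]


def impute_median(xs):
    """Replace missing entries (None) with the median of the observed values."""
    observed = [x for x in xs if x is not None]
    med = statistics.median(observed)
    return [med if x is None else x for x in xs]


def summarize(xs):
    """Return (mean, median, sample stdev) of the non-missing entries."""
    observed = [x for x in xs if x is not None]
    return (
        statistics.mean(observed),
        statistics.median(observed),
        statistics.stdev(observed),
    )


before = summarize(values)      # statistics on observed values only
imputed = impute_median(values)  # missing entries filled with the median
after = summarize(imputed)       # statistics after imputation

print("before (mean, median, stdev):", before)
print("after  (mean, median, stdev):", after)
```

In this toy column the median is unchanged by the imputation, while the mean moves toward the median and the spread shrinks. That is the kind of effect a before/after plot could make visible.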
Closing for now, to clear the board, with https://github.com/memphis-iis/datawhys-content-notebooks/commit/749c6dd447f8fc177067de5c3469676cf7adacc8
We can continue the discussion even though it's closed :smile:
One thing I added to the PM notebook is interpreting the coefficients.
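One common way to interpret logistic regression coefficients is the odds ratio reading: exponentiating a coefficient gives the multiplicative change in the odds for a one-unit increase in that predictor, with the other predictors held fixed. A minimal sketch follows. The feature names and coefficient values are hypothetical, not the PM notebook's, and `odds_ratios` is an illustrative helper rather than a library function.

```python
import math

# Hypothetical fitted coefficients on the log-odds scale.
# These names and values are assumptions for illustration only.
coefficients = {"feature_a": 0.7, "feature_b": -0.2}


def odds_ratios(coefs):
    """Exponentiate each log-odds coefficient to get its odds ratio."""
    return {name: math.exp(b) for name, b in coefs.items()}


ratios = odds_ratios(coefficients)

for name, ratio in ratios.items():
    # An odds ratio above 1 raises the odds; below 1 lowers them.
    pct = (ratio - 1) * 100
    print(f"{name}: odds ratio {ratio:.2f} ({pct:+.0f}% change in odds per unit increase)")
```

A positive coefficient gives an odds ratio above 1 and a negative one gives a ratio below 1, so the sign reads the same way on both scales.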
See the spreadsheet for details
Ideas/prereqs: Binomial distribution (brief mention of distributions in general), Regression vs classification, confusion matrix, accuracy, precision/recall
Direct link https://jupyter.olney.ai/hub/user-redirect/git-pull?repo=https%3A%2F%2Fgithub.com%2Fmemphis-iis%2Fdatawhys-content-notebooks&subPath=Logistic-regression.ipynb&app=lab