Closed NathanielF closed 10 months ago
Check out this pull request on
See visual diffs & provide feedback on Jupyter Notebooks.
Powered by ReviewNB
View / edit / reply to this conversation on ReviewNB
ricardoV94 commented on 2023-10-08T23:16:38Z ----------------------------------------------------------------
Any reason not to use pm.Censored
for the likelihood ?
NathanielF commented on 2023-10-09T07:54:47Z ----------------------------------------------------------------
In the case of the CoxPH regression the "Poisson trick" is a classic of the literature and it works to give me the results i was expecting. It's also consistent with the approach already documented in Austin's notebook.
More generally in the case of the AFT models below. I tried using the pm.Censored with the Weibull regression, but it gave me garbage results and was allot slower than using the Potential. And again in the case of the Weibull AFT my current parameterisation gives the expected results.
But maybe i'm missing something, was there a pattern of usage you had in mind w.r.t. to using censored liklihoods for survival?
In the case of the CoxPH regression the "Poisson trick" is a classic of the literature and it works to give me the results i was expecting. It's also consistent with the approach already documented in Austin's notebook.
More generally in the case of the AFT models below. I tried using the pm.Censored with the Weibull regression, but it gave me garbage results and was allot slower than using the Potential. And again in the case of the Weibull AFT my current parameterisation gives the expected results.
But maybe i'm missing something, was there a pattern of usage you had in mind w.r.t. to using censored liklihoods for survival?
View entire conversation on ReviewNB
But maybe i'm missing something, was there a pattern of usage you had in mind w.r.t. to using censored likelihoods for survival?
From a first glance it sounded like you just wrote your own Censored likelihood by hand. Since we implemented it in PyMC we've been moving examples towards using the Censored factory instead because that fits much more with the PyMC vibe. Similarly, nobody is writing Potentials for Mixture likelihoods either, because we have pm.Mixture.
In the case of the CoxPH regression the "Poisson trick" is a classic of the literature
I am not familiar with the Poisson trick. Google didn't elucidate it quickly for me.
More generally in the case of the AFT models below. I tried using the pm.Censored with the Weibull regression, but it gave me garbage results and was allot slower than using the Potential.
I was talking about this one yes, not the Poisson trick. Sounds like a bug or a difference in the parametrization of the PyMC-defined Weibull and what you're comparing against. Using pm.Censored could be slower but shouldn't give different results. Logp wise it should be equivalent to what you did with the Potential.
Can you show me an example of how to use the censored distribution for AFT models. This is what i tried....
I am not familiar with the Poisson trick. Google didn't elucidate it quickly for me.
See maybe here: https://cran.r-project.org/web/packages/survival/vignettes/approximate.pdf Also, in a textbook i have at home...
In that example you didn't pass observed.
When you have mixed censored and not censored you have to pass an array to upper
with +inf
(or anything above y) for the uncensored obs and y for the censored ones. Something like
pm.Censored("obs", dist, lower=None, upper=np.where(cens, y, np.inf), observed=y)
Ah... i did miss observed... but now i just get a crash out:
Similar crashes for different upper bounds
Maybe try upper=np.where(cens, y, y+1)
. The last value has to be larger than the observations when they are not censored. np.inf could be leading to numerical issues in the logcdf
So that does fit:
But gives predictions out of line with Lifelines and my prior Weibull fit. Top table here is the new fit predictions. Bottom is the lifelines predictions and my Potential based fit agreed with Lifelines here
Your upper
is still weird. Do you have a constant censoring at y=20
? If that's the case you should just be able to set upper=20
upper
is the point of censoring. You are not allowed to observe any value beyond upper. If it matches exactly with upper
it means that observation was censored. If it is below, it was not censored. upper=np.where(cens, y+1, 20)
is odd. It say for the cases where cens
is True, the censoring point is y+1 (which means it won't be treated as censored in the logp, because it's always higher than the observed value), otherwise it's 20.
Ugh, sorry. Don't know where my head is this morning!
The max y in the data set is 12. But it crashes out under this parameterisation:
Just as a sanity check do you get a -inf logp at the starting point if you just use a vanilla Weibull (without censoring). I wonder if the PyMC parametrization is just different than what you were working with.
You mean, just run the observed like this:
Seems to run fine
No, run it with all observations, censored and non-censored as if they were all uncensored
Seems to run just fine with all observations:
Interesting, that suggests a precision issue with the CDF (or a bug in the CDF). Can you open a GitHub issue with a minimal reproducible example?
@ricardoV94 added the issue here: https://github.com/pymc-devs/pytensor/issues/471
Marking this as ready for review. I think i solved the issue translating between the Logistic model fit and the log-logistic survival curve i wanted to estimate. Still have a few additional touches, such as adding the references and tidying a conclusion but i thought i'd open it up for review now.
As stated in the issue the broad idea of this MR is to demonstrate with a more detailed example how to actually build and use the survival regression models in PyMC. In particular i wanted to show how we can add hierarchical components to model individual heterogeneity in the survival functions using the so paradigm of "frailty" models.
Would be good to get some feedback on the content of this MR at this point. Any interest: @ricardoV94 , @OriolAbril or @AustinRochford ?
View / edit / reply to this conversation on ReviewNB
drbenvincent commented on 2023-11-15T12:51:29Z ----------------------------------------------------------------
First paragraph: Does a good job of conveying the generality of the approach, but can we make this (particularly the part about state transitions) a bit more concrete? For example, we're just talking about a 2 discrete states state-space, right? Perhaps with a list of the broad range of situations/examples where survival models are appropriate.
Full stop needed at the end of the last sentence.
NathanielF commented on 2023-11-16T21:59:42Z ----------------------------------------------------------------
Done
View / edit / reply to this conversation on ReviewNB
drbenvincent commented on 2023-11-15T12:51:30Z ----------------------------------------------------------------
Minor formatting issue (missing space) before the citation.
Minor grammatical issue "ensure efficiency through the maintenance of suitably staffed company"
Could potentially add links to censored notebooks, or to those with the "censored" tag.
View / edit / reply to this conversation on ReviewNB
drbenvincent commented on 2023-11-15T12:51:31Z ----------------------------------------------------------------
Line #1. retention_df = pd.read_csv("../data/time_to_attrition.csv")
Can you take a look at the style guide in terms of data loading. See here https://www.pymc.io/projects/docs/en/latest/contributing/jupyter_style.html#reading-from-file
Could optionally clarify what the drop_first kwarg does in terms of representing the data and what statistical problem it solves.
Adjusted!
View / edit / reply to this conversation on ReviewNB
drbenvincent commented on 2023-11-15T12:51:32Z ----------------------------------------------------------------
Line #1. from lifelines import KaplanMeierFitter
Can you take a look at the style guide for extra dependencies https://www.pymc.io/projects/docs/en/latest/contributing/jupyter_style.html#extra-dependencies
Done
View / edit / reply to this conversation on ReviewNB
drbenvincent commented on 2023-11-15T12:51:33Z ----------------------------------------------------------------
Line #21. fig, axs = plt.subplots(2, 1, figsize=(20, 13))
Suggest that this figure have more of a portrait than landscape aspect ratio, but do what you think best.
Changed this
View / edit / reply to this conversation on ReviewNB
drbenvincent commented on 2023-11-15T12:51:34Z ----------------------------------------------------------------
Good - but maybe be more explicit? e.g. Survival seems to decrease as a sentiment decreases
View / edit / reply to this conversation on ReviewNB
drbenvincent commented on 2023-11-15T12:51:35Z ----------------------------------------------------------------
minor grammatical issue "In the context of a failure modelling"
For the second equation, could clarify that the X's are different predictors?
Can you clarify a bit more what "predictor variables are set at their baseline/reference levels" means?
drbenvincent commented on 2023-11-28T15:08:57Z ----------------------------------------------------------------
"In the context of a failure modelling" -> "In the context of failure modelling"
View / edit / reply to this conversation on ReviewNB
drbenvincent commented on 2023-11-15T12:51:36Z ----------------------------------------------------------------
Can you explain what these months are relative to? Zero corresponds to an employee starting at the company?
NathanielF commented on 2023-11-16T22:02:14Z ----------------------------------------------------------------
Done
View / edit / reply to this conversation on ReviewNB
drbenvincent commented on 2023-11-15T12:51:37Z ----------------------------------------------------------------
Possibly very obvious, but you could add a quick legend to clarify that columns are months and rows are individuals
NathanielF commented on 2023-11-16T22:02:38Z ----------------------------------------------------------------
Adjusted and added note
View / edit / reply to this conversation on ReviewNB
drbenvincent commented on 2023-11-15T12:51:38Z ----------------------------------------------------------------
Just to confirm? This is a binary matrix where 0 represents no risk because the employee has already quit? Could be worth elaborating slightly on that point
View / edit / reply to this conversation on ReviewNB
drbenvincent commented on 2023-11-15T12:51:38Z ----------------------------------------------------------------
repeated word "our our model"
typo "ussually"
Stylistic, but in general I'd avoid this "CoxPH" shorthand and just write it out in full.
Second paragraph here seems really crucial for eh reader to understand. I'd be tempted to elaborate more on this, possibly adding a few lines of math?
NathanielF commented on 2023-11-16T22:03:35Z ----------------------------------------------------------------
Adjusted and clarified. Used pseudo math with formula syntax as it seemed clearest.
View / edit / reply to this conversation on ReviewNB
drbenvincent commented on 2023-11-15T12:51:40Z ----------------------------------------------------------------
Improve the graphviz further:
Add individuals vector to the coords
Add individuals * press coords to X_data
Add the intervals and individuals coords to the 3 nodes in the bottom plate
NathanielF commented on 2023-11-16T22:04:09Z ----------------------------------------------------------------
Added all coordination and dims
View / edit / reply to this conversation on ReviewNB
drbenvincent commented on 2023-11-15T12:51:41Z ----------------------------------------------------------------
I'd be tempted to also add a compare plot https://python.arviz.org/en/stable/examples/plot_compare.html, but up to you
NathanielF commented on 2023-11-16T22:38:01Z ----------------------------------------------------------------
Added
View / edit / reply to this conversation on ReviewNB
drbenvincent commented on 2023-11-15T12:51:42Z ----------------------------------------------------------------
This with the paragraph above is nice. I know it's encoded in the pymc model specification, but it could be useful for beginners to survival models to also have the model maths written out in LaTeX here.
View / edit / reply to this conversation on ReviewNB
drbenvincent commented on 2023-11-15T12:51:43Z ----------------------------------------------------------------
Typo "occuptation"
"This assumption" refers to the multiplicative effect?
NathanielF commented on 2023-11-16T22:04:32Z ----------------------------------------------------------------
Clarified
View / edit / reply to this conversation on ReviewNB
drbenvincent commented on 2023-11-15T12:51:43Z ----------------------------------------------------------------
Possibly add smooth=False
kwarg to plot_hdi. I'm not a fan of it, I find it often misleading
NathanielF commented on 2023-11-16T22:37:34Z ----------------------------------------------------------------
Adjusted
View / edit / reply to this conversation on ReviewNB
drbenvincent commented on 2023-11-15T12:51:44Z ----------------------------------------------------------------
typo: occurence
NathanielF commented on 2023-11-16T22:36:38Z ----------------------------------------------------------------
Fixed
View / edit / reply to this conversation on ReviewNB
drbenvincent commented on 2023-11-15T12:51:45Z ----------------------------------------------------------------
"jumpy nature of THE base hazard function"
NathanielF commented on 2023-11-16T22:06:20Z ----------------------------------------------------------------
Adjusted
View / edit / reply to this conversation on ReviewNB
drbenvincent commented on 2023-11-15T12:51:46Z ----------------------------------------------------------------
What parameter estimate?
NathanielF commented on 2023-11-16T22:06:36Z ----------------------------------------------------------------
Clarified
View / edit / reply to this conversation on ReviewNB
drbenvincent commented on 2023-11-15T12:51:47Z ----------------------------------------------------------------
Can we colour code the hdi to match the colour for the line?
I'd be tempted to make the fig_size smaller so that the text is proportionally larger. I'd say that as a general comment for many of the other plots as well.
NathanielF commented on 2023-11-16T22:07:00Z ----------------------------------------------------------------
Changed this. Good call!
View / edit / reply to this conversation on ReviewNB
drbenvincent commented on 2023-11-15T12:51:47Z ----------------------------------------------------------------
Minor grammatical issue "but how we can use it to predict"
NathanielF commented on 2023-11-16T22:07:15Z ----------------------------------------------------------------
Changed
View / edit / reply to this conversation on ReviewNB
drbenvincent commented on 2023-11-15T12:51:48Z ----------------------------------------------------------------
Typo: occurence
NathanielF commented on 2023-11-16T22:36:50Z ----------------------------------------------------------------
Fixed
View / edit / reply to this conversation on ReviewNB
drbenvincent commented on 2023-11-15T12:51:49Z ----------------------------------------------------------------
You could be able to use \Big( or \BIG( to make the brackets in the first equation more kosher
Second equation. Should this start with this? \log(T_i) =
"c is an indicator function for whether the observation is censored" meaning that it's 1 for individuals still alive?
NathanielF commented on 2023-11-16T22:07:47Z ----------------------------------------------------------------
Adjusted and clarified
View / edit / reply to this conversation on ReviewNB
drbenvincent commented on 2023-11-15T12:51:50Z ----------------------------------------------------------------
In an ideal world we'd be able to use inline expressions. Can we do that? See https://mystmd.org/guide/quickstart-jupyter-lab-myst#inline-expressions
NathanielF commented on 2023-11-16T21:17:55Z ----------------------------------------------------------------
Tried this. Doesn't seem to work
NathanielF commented on 2023-11-16T22:39:57Z ----------------------------------------------------------------
Replaced with print statement
drbenvincent commented on 2023-11-28T15:09:57Z ----------------------------------------------------------------
Thanks
View / edit / reply to this conversation on ReviewNB
drbenvincent commented on 2023-11-15T12:51:51Z ----------------------------------------------------------------
Ideally add in all the coords and dims
NathanielF commented on 2023-11-16T22:08:05Z ----------------------------------------------------------------
Added all coords
View / edit / reply to this conversation on ReviewNB
drbenvincent commented on 2023-11-15T12:51:52Z ----------------------------------------------------------------
same comment about smooth=False for the plot_hdi
NathanielF commented on 2023-11-16T22:37:09Z ----------------------------------------------------------------
Adjusted
View / edit / reply to this conversation on ReviewNB
drbenvincent commented on 2023-11-15T12:51:53Z ----------------------------------------------------------------
Excellent notebook. Very thorough. Pretty much a book chapter!
NathanielF commented on 2023-11-16T22:39:33Z ----------------------------------------------------------------
Indeed!
Frailty Models - Hierarchical Survival Models
Related to this issue: https://github.com/pymc-devs/pymc-examples/issues/579
Leaving as Draft for now.
Helpful links
:books: Documentation preview :books:: https://pymc-examples--580.org.readthedocs.build/en/580/