Frailty survival model - Githubissues

NathanielF commented 11 months ago

In the case of the CoxPH regression the "Poisson trick" is a classic of the literature and it works to give me the results i was expecting. It's also consistent with the approach already documented in Austin's notebook.

More generally in the case of the AFT models below. I tried using the pm.Censored with the Weibull regression, but it gave me garbage results and was allot slower than using the Potential. And again in the case of the Weibull AFT my current parameterisation gives the expected results.

But maybe i'm missing something, was there a pattern of usage you had in mind w.r.t. to using censored liklihoods for survival?

View entire conversation on ReviewNB

ricardoV94 commented 11 months ago

But maybe i'm missing something, was there a pattern of usage you had in mind w.r.t. to using censored likelihoods for survival?

From a first glance it sounded like you just wrote your own Censored likelihood by hand. Since we implemented it in PyMC we've been moving examples towards using the Censored factory instead because that fits much more with the PyMC vibe. Similarly, nobody is writing Potentials for Mixture likelihoods either, because we have pm.Mixture.

In the case of the CoxPH regression the "Poisson trick" is a classic of the literature

I am not familiar with the Poisson trick. Google didn't elucidate it quickly for me.

ricardoV94 commented 11 months ago

More generally in the case of the AFT models below. I tried using the pm.Censored with the Weibull regression, but it gave me garbage results and was allot slower than using the Potential.

I was talking about this one yes, not the Poisson trick. Sounds like a bug or a difference in the parametrization of the PyMC-defined Weibull and what you're comparing against. Using pm.Censored could be slower but shouldn't give different results. Logp wise it should be equivalent to what you did with the Potential.

NathanielF commented 11 months ago

Can you show me an example of how to use the censored distribution for AFT models. This is what i tried....

NathanielF commented 11 months ago

I am not familiar with the Poisson trick. Google didn't elucidate it quickly for me.

See maybe here: https://cran.r-project.org/web/packages/survival/vignettes/approximate.pdf Also, in a textbook i have at home...

ricardoV94 commented 11 months ago

In that example you didn't pass observed.

When you have mixed censored and not censored you have to pass an array to upper with +inf (or anything above y) for the uncensored obs and y for the censored ones. Something like

pm.Censored("obs", dist, lower=None, upper=np.where(cens, y, np.inf), observed=y)

NathanielF commented 11 months ago

Ah... i did miss observed... but now i just get a crash out:

Similar crashes for different upper bounds

ricardoV94 commented 11 months ago

Maybe try upper=np.where(cens, y, y+1). The last value has to be larger than the observations when they are not censored. np.inf could be leading to numerical issues in the logcdf

NathanielF commented 11 months ago

So that does fit:

But gives predictions out of line with Lifelines and my prior Weibull fit. Top table here is the new fit predictions. Bottom is the lifelines predictions and my Potential based fit agreed with Lifelines here

ricardoV94 commented 11 months ago

Your upper is still weird. Do you have a constant censoring at y=20? If that's the case you should just be able to set upper=20

upper is the point of censoring. You are not allowed to observe any value beyond upper. If it matches exactly with upper it means that observation was censored. If it is below, it was not censored. upper=np.where(cens, y+1, 20) is odd. It say for the cases where cens is True, the censoring point is y+1 (which means it won't be treated as censored in the logp, because it's always higher than the observed value), otherwise it's 20.

NathanielF commented 11 months ago

Ugh, sorry. Don't know where my head is this morning!

The max y in the data set is 12. But it crashes out under this parameterisation:

ricardoV94 commented 11 months ago

Just as a sanity check do you get a -inf logp at the starting point if you just use a vanilla Weibull (without censoring). I wonder if the PyMC parametrization is just different than what you were working with.

NathanielF commented 11 months ago

You mean, just run the observed like this:

Seems to run fine

ricardoV94 commented 11 months ago

No, run it with all observations, censored and non-censored as if they were all uncensored

NathanielF commented 11 months ago

Seems to run just fine with all observations:

ricardoV94 commented 11 months ago

Interesting, that suggests a precision issue with the CDF (or a bug in the CDF). Can you open a GitHub issue with a minimal reproducible example?

NathanielF commented 11 months ago

@ricardoV94 added the issue here: https://github.com/pymc-devs/pytensor/issues/471

NathanielF commented 11 months ago

Marking this as ready for review. I think i solved the issue translating between the Logistic model fit and the log-logistic survival curve i wanted to estimate. Still have a few additional touches, such as adding the references and tidying a conclusion but i thought i'd open it up for review now.

As stated in the issue the broad idea of this MR is to demonstrate with a more detailed example how to actually build and use the survival regression models in PyMC. In particular i wanted to show how we can add hierarchical components to model individual heterogeneity in the survival functions using the so paradigm of "frailty" models.

NathanielF commented 11 months ago

Would be good to get some feedback on the content of this MR at this point. Any interest: @ricardoV94 , @OriolAbril or @AustinRochford ?

review-notebook-app[bot] commented 10 months ago

View / edit / reply to this conversation on ReviewNB

drbenvincent commented on 2023-11-15T12:51:29Z ----------------------------------------------------------------

First paragraph: Does a good job of conveying the generality of the approach, but can we make this (particularly the part about state transitions) a bit more concrete? For example, we're just talking about a 2 discrete states state-space, right? Perhaps with a list of the broad range of situations/examples where survival models are appropriate.

Full stop needed at the end of the last sentence.

NathanielF commented on 2023-11-16T21:59:42Z ----------------------------------------------------------------

Done

review-notebook-app[bot] commented 10 months ago

View / edit / reply to this conversation on ReviewNB

drbenvincent commented on 2023-11-15T12:51:30Z ----------------------------------------------------------------

Minor formatting issue (missing space) before the citation.

Minor grammatical issue "ensure efficiency through the maintenance of suitably staffed company"

Could potentially add links to censored notebooks, or to those with the "censored" tag.

review-notebook-app[bot] commented 10 months ago

View / edit / reply to this conversation on ReviewNB

drbenvincent commented on 2023-11-15T12:51:31Z ----------------------------------------------------------------

Line #1.    retention_df = pd.read_csv("../data/time_to_attrition.csv")

Can you take a look at the style guide in terms of data loading. See here https://www.pymc.io/projects/docs/en/latest/contributing/jupyter_style.html#reading-from-file

Could optionally clarify what the drop_first kwarg does in terms of representing the data and what statistical problem it solves.

_NathanielF commented on 2023-11-16T22:00:02Z_ ----------------------------------------------------------------

Adjusted!

review-notebook-app[bot] commented 10 months ago

View / edit / reply to this conversation on ReviewNB

drbenvincent commented on 2023-11-15T12:51:32Z ----------------------------------------------------------------

Line #1.    from lifelines import KaplanMeierFitter

Can you take a look at the style guide for extra dependencies https://www.pymc.io/projects/docs/en/latest/contributing/jupyter_style.html#extra-dependencies

_NathanielF commented on 2023-11-16T22:01:06Z_ ----------------------------------------------------------------

Done

review-notebook-app[bot] commented 10 months ago

View / edit / reply to this conversation on ReviewNB

drbenvincent commented on 2023-11-15T12:51:33Z ----------------------------------------------------------------

Line #21.    fig, axs = plt.subplots(2, 1, figsize=(20, 13))

Suggest that this figure have more of a portrait than landscape aspect ratio, but do what you think best.

_NathanielF commented on 2023-11-16T22:01:23Z_ ----------------------------------------------------------------

Changed this

review-notebook-app[bot] commented 10 months ago

View / edit / reply to this conversation on ReviewNB

drbenvincent commented on 2023-11-15T12:51:34Z ----------------------------------------------------------------

Good - but maybe be more explicit? e.g. Survival seems to decrease as a sentiment decreases

review-notebook-app[bot] commented 10 months ago

View / edit / reply to this conversation on ReviewNB

drbenvincent commented on 2023-11-15T12:51:35Z ----------------------------------------------------------------

minor grammatical issue "In the context of a failure modelling"

For the second equation, could clarify that the X's are different predictors?

Can you clarify a bit more what "predictor variables are set at their baseline/reference levels" means?

drbenvincent commented on 2023-11-28T15:08:57Z ----------------------------------------------------------------

"In the context of a failure modelling" -> "In the context of failure modelling"

review-notebook-app[bot] commented 10 months ago

View / edit / reply to this conversation on ReviewNB

drbenvincent commented on 2023-11-15T12:51:36Z ----------------------------------------------------------------

Can you explain what these months are relative to? Zero corresponds to an employee starting at the company?

NathanielF commented on 2023-11-16T22:02:14Z ----------------------------------------------------------------

Done

review-notebook-app[bot] commented 10 months ago

View / edit / reply to this conversation on ReviewNB

drbenvincent commented on 2023-11-15T12:51:37Z ----------------------------------------------------------------

Possibly very obvious, but you could add a quick legend to clarify that columns are months and rows are individuals

NathanielF commented on 2023-11-16T22:02:38Z ----------------------------------------------------------------

Adjusted and added note

review-notebook-app[bot] commented 10 months ago

View / edit / reply to this conversation on ReviewNB

drbenvincent commented on 2023-11-15T12:51:38Z ----------------------------------------------------------------

Just to confirm? This is a binary matrix where 0 represents no risk because the employee has already quit? Could be worth elaborating slightly on that point

review-notebook-app[bot] commented 10 months ago

View / edit / reply to this conversation on ReviewNB

drbenvincent commented on 2023-11-15T12:51:38Z ----------------------------------------------------------------

repeated word "our our model"

typo "ussually"

Stylistic, but in general I'd avoid this "CoxPH" shorthand and just write it out in full.

Second paragraph here seems really crucial for eh reader to understand. I'd be tempted to elaborate more on this, possibly adding a few lines of math?

NathanielF commented on 2023-11-16T22:03:35Z ----------------------------------------------------------------

Adjusted and clarified. Used pseudo math with formula syntax as it seemed clearest.

review-notebook-app[bot] commented 10 months ago

View / edit / reply to this conversation on ReviewNB

drbenvincent commented on 2023-11-15T12:51:40Z ----------------------------------------------------------------

Improve the graphviz further:

Add individuals vector to the coords

Add individuals * press coords to X_data

Add the intervals and individuals coords to the 3 nodes in the bottom plate

NathanielF commented on 2023-11-16T22:04:09Z ----------------------------------------------------------------

Added all coordination and dims

review-notebook-app[bot] commented 10 months ago

View / edit / reply to this conversation on ReviewNB

drbenvincent commented on 2023-11-15T12:51:41Z ----------------------------------------------------------------

I'd be tempted to also add a compare plot https://python.arviz.org/en/stable/examples/plot_compare.html, but up to you

NathanielF commented on 2023-11-16T22:38:01Z ----------------------------------------------------------------

Added

review-notebook-app[bot] commented 10 months ago

View / edit / reply to this conversation on ReviewNB

drbenvincent commented on 2023-11-15T12:51:42Z ----------------------------------------------------------------

This with the paragraph above is nice. I know it's encoded in the pymc model specification, but it could be useful for beginners to survival models to also have the model maths written out in LaTeX here.

review-notebook-app[bot] commented 10 months ago

View / edit / reply to this conversation on ReviewNB

drbenvincent commented on 2023-11-15T12:51:43Z ----------------------------------------------------------------

Typo "occuptation"

"This assumption" refers to the multiplicative effect?

NathanielF commented on 2023-11-16T22:04:32Z ----------------------------------------------------------------

Clarified

review-notebook-app[bot] commented 10 months ago

View / edit / reply to this conversation on ReviewNB

drbenvincent commented on 2023-11-15T12:51:43Z ----------------------------------------------------------------

Possibly add smooth=False kwarg to plot_hdi. I'm not a fan of it, I find it often misleading

NathanielF commented on 2023-11-16T22:37:34Z ----------------------------------------------------------------

Adjusted

review-notebook-app[bot] commented 10 months ago

View / edit / reply to this conversation on ReviewNB

drbenvincent commented on 2023-11-15T12:51:44Z ----------------------------------------------------------------

typo: occurence

NathanielF commented on 2023-11-16T22:36:38Z ----------------------------------------------------------------

Fixed

review-notebook-app[bot] commented 10 months ago

View / edit / reply to this conversation on ReviewNB

drbenvincent commented on 2023-11-15T12:51:45Z ----------------------------------------------------------------

"jumpy nature of THE base hazard function"

NathanielF commented on 2023-11-16T22:06:20Z ----------------------------------------------------------------

Adjusted

review-notebook-app[bot] commented 10 months ago

View / edit / reply to this conversation on ReviewNB

drbenvincent commented on 2023-11-15T12:51:46Z ----------------------------------------------------------------

What parameter estimate?

NathanielF commented on 2023-11-16T22:06:36Z ----------------------------------------------------------------

Clarified

review-notebook-app[bot] commented 10 months ago

View / edit / reply to this conversation on ReviewNB

drbenvincent commented on 2023-11-15T12:51:47Z ----------------------------------------------------------------

Can we colour code the hdi to match the colour for the line?

I'd be tempted to make the fig_size smaller so that the text is proportionally larger. I'd say that as a general comment for many of the other plots as well.

NathanielF commented on 2023-11-16T22:07:00Z ----------------------------------------------------------------

Changed this. Good call!

review-notebook-app[bot] commented 10 months ago

View / edit / reply to this conversation on ReviewNB

drbenvincent commented on 2023-11-15T12:51:47Z ----------------------------------------------------------------

Minor grammatical issue "but how we can use it to predict"

NathanielF commented on 2023-11-16T22:07:15Z ----------------------------------------------------------------

Changed

review-notebook-app[bot] commented 10 months ago

View / edit / reply to this conversation on ReviewNB

drbenvincent commented on 2023-11-15T12:51:48Z ----------------------------------------------------------------

Typo: occurence

NathanielF commented on 2023-11-16T22:36:50Z ----------------------------------------------------------------

Fixed

review-notebook-app[bot] commented 10 months ago

View / edit / reply to this conversation on ReviewNB

drbenvincent commented on 2023-11-15T12:51:49Z ----------------------------------------------------------------

You could be able to use \Big( or \BIG( to make the brackets in the first equation more kosher

Second equation. Should this start with this? \log(T_i) =

"c is an indicator function for whether the observation is censored" meaning that it's 1 for individuals still alive?

NathanielF commented on 2023-11-16T22:07:47Z ----------------------------------------------------------------

Adjusted and clarified

review-notebook-app[bot] commented 10 months ago

View / edit / reply to this conversation on ReviewNB

drbenvincent commented on 2023-11-15T12:51:50Z ----------------------------------------------------------------

In an ideal world we'd be able to use inline expressions. Can we do that? See https://mystmd.org/guide/quickstart-jupyter-lab-myst#inline-expressions

NathanielF commented on 2023-11-16T21:17:55Z ----------------------------------------------------------------

Tried this. Doesn't seem to work

NathanielF commented on 2023-11-16T22:39:57Z ----------------------------------------------------------------

Replaced with print statement

drbenvincent commented on 2023-11-28T15:09:57Z ----------------------------------------------------------------

Thanks

review-notebook-app[bot] commented 10 months ago

View / edit / reply to this conversation on ReviewNB

drbenvincent commented on 2023-11-15T12:51:51Z ----------------------------------------------------------------

Ideally add in all the coords and dims

NathanielF commented on 2023-11-16T22:08:05Z ----------------------------------------------------------------

Added all coords

review-notebook-app[bot] commented 10 months ago

View / edit / reply to this conversation on ReviewNB

drbenvincent commented on 2023-11-15T12:51:52Z ----------------------------------------------------------------

same comment about smooth=False for the plot_hdi

NathanielF commented on 2023-11-16T22:37:09Z ----------------------------------------------------------------

Adjusted

review-notebook-app[bot] commented 10 months ago

View / edit / reply to this conversation on ReviewNB

drbenvincent commented on 2023-11-15T12:51:53Z ----------------------------------------------------------------

Excellent notebook. Very thorough. Pretty much a book chapter!

NathanielF commented on 2023-11-16T22:39:33Z ----------------------------------------------------------------

Indeed!

NathanielF commented 10 months ago

Tried this. Doesn't seem to work

View entire conversation on ReviewNB

NathanielF commented 10 months ago

Done

View entire conversation on ReviewNB

pymc-devs / pymc-examples

Frailty survival model #580

Frailty Models - Hierarchical Survival Models

Helpful links