This is a resubmission of PR #224. It adds basic support to plot_survival for plotting survival by the values of a string (categorical) variable. See related issue #217.
As before, if you pass a boolean value, survival curves are plotted for the True/False groups; if you pass a float value, they are plotted for the groups above/below a threshold (which defaults to the median).
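The existing float/boolean behavior can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `split_by_threshold` is a hypothetical helper, and treating values equal to the threshold as "below" is an assumption.

```python
from statistics import median


def split_by_threshold(values, threshold=None):
    """Return one boolean per value: True means the 'above' group.

    Hypothetical helper for illustration only.
    """
    # Boolean inputs already define the two groups, so use them as-is.
    if all(isinstance(v, bool) for v in values):
        return list(values)
    # For numeric inputs, default the threshold to the median.
    if threshold is None:
        threshold = median(values)
    # Assumption: ties with the threshold fall in the 'below' group.
    return [v > threshold for v in values]


scores = [0.2, 1.5, 3.1, 0.9, 2.4]
above = split_by_threshold(scores)
```

Passing an explicit `threshold` overrides the median default in this sketch.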
This PR adds support for plotting by a string variable, after casting it to a category.
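For readers unfamiliar with the cast, here is a small pandas sketch of turning a string column into a category. The column name `stage` and its values are made up for illustration; the PR may perform the cast differently.

```python
import pandas as pd

# Hypothetical data: one string label per sample.
df = pd.DataFrame({"stage": ["II", "I", "III", "I", "II"]})

# Cast the string column to a categorical dtype.
df["stage"] = df["stage"].astype("category")

# The distinct groups; pandas sorts inferred categories lexically.
groups = list(df["stage"].cat.categories)
```

Each entry in `groups` would then correspond to one group to plot.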
As with the existing float/boolean behavior, if the variable has two groups, survival curves are plotted for both groups and a log-rank test result is reported.
However, if the variable contains data for more than two groups, a CoxPH model is fit and more than two survival curves are plotted.
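The choice between the two analyses described above can be sketched as a simple dispatch on the number of distinct groups. `choose_analysis` is a hypothetical name, and raising an error for fewer than two groups is an assumption; the PR does not say how that case is handled.

```python
def choose_analysis(groups):
    """Pick the statistical summary based on how many distinct groups exist.

    Hypothetical helper for illustration only.
    """
    n_groups = len(set(groups))
    if n_groups < 2:
        # Assumption: a single group leaves nothing to compare.
        raise ValueError("need at least two groups to compare survival")
    if n_groups == 2:
        # Two groups: report a log-rank test alongside the two curves.
        return "logrank"
    # More than two groups: fit a CoxPH model and plot one curve per group.
    return "coxph"


two_group = choose_analysis([True, False, True])
multi_group = choose_analysis(["I", "II", "III", "I"])
```

In a lifelines-based implementation, these branches might map to `lifelines.statistics.logrank_test` and `lifelines.CoxPHFitter`, though whether this PR uses those calls is an assumption.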
Other scenarios in #217 are not covered here (e.g. passing in more than one threshold value), and I can imagine further extensions being useful (e.g. fitting a CoxPH model to a float value but plotting by the median). For now, though, this covers the majority of use cases.
Coverage decreased (-1.01%) to 55.933% when pulling e974a9b47bf02539fe749977af0fec26b0c3f17c on feature-plot-by-category-2 into 31545c4b03d9a1edd9cc543ac1f863a140e88cd8 on master.
Coverage decreased (-1.06%) to 55.888% when pulling 30fcfb5fcb01973a49c97594ae07d882a1bbff16 on feature-plot-by-category-2 into 31545c4b03d9a1edd9cc543ac1f863a140e88cd8 on master.
Coverage decreased (-1.3%) to 55.666% when pulling 8f77dc6f751e29de220f9a046376a43181bc85f0 on feature-plot-by-category-2 into 31545c4b03d9a1edd9cc543ac1f863a140e88cd8 on master.