Closed KellenBrosnahan closed 3 years ago
Would you like to submit a PR to help fix it?
Hello @terrytangyuan,
Yes, I would like to do that. However, I'm unfamiliar with the process, and I'm having some trouble. I was able to clone the repo and create a local branch, but when I tried to push the branch with changes (git push origin autoplotSurvfitMultipleStratification), I received the following error message:
ERROR: Permission to sinhrks/ggfortify.git denied to KellenBrosnahan.
fatal: Could not read from remote repository.
Please make sure you have the correct access rights
and the repository exists.
Is this the way I'm supposed to be doing things, or am I completely lost? Thanks, Kellen
PS. The contributing guidelines mentioned unit tests, but I'm not sure what those are or how to do them. Could you explain them please?
Perhaps this might help: https://www.dataschool.io/how-to-contribute-on-github/
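On the unit test question: a unit test is a small script that calls a piece of code with known input and checks that the output is what you expect. Below is a minimal sketch, assuming the testthat package is installed. The helper strip_strata_names is hypothetical and only stands in for the stratum-name clean-up discussed in this issue; it is not a function in ggfortify.

```r
library(testthat)

# Hypothetical helper (not part of ggfortify): drops each "name=" prefix
# from a stratum label while keeping the comma between values.
strip_strata_names <- function(x) {
  gsub("[^,]*=", "", x)
}

test_that("stratum labels stay unique with two stratifying variables", {
  labels <- strip_strata_names(c("var1=0,var2=0", "var1=1,var2=0"))
  expect_equal(labels, c("0,0", "1,0"))
  expect_equal(anyDuplicated(labels), 0L)
})
```

In a package, a test like this would normally live in a file under tests/testthat/ and run together with the rest of the test suite.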
System information
Problem
When using autoplot on a survfit object with multiple stratifying variables, the function returns the following error:
I think this is a bug because ?autoplot.survfit does not list any restrictions on the number of stratifying variables. See the Cause section for why I think it's a bug in ggfortify.
Reprex
Cause
I've done some digging, and I believe that I've found the source of the bug. Here's my understanding.
When autoplot.survfit calls fortify.survfit, lines 33-34 (line numbers from getS3method("fortify", "survfit")) read:
The default naming of strata by the survfit.formula function for two stratification variables is "var1=value1,var2=value2". This gsub call uses the pattern ".*=". Because .* is greedy, the pattern matches everything up to and including the last equals sign, so the whole substring "var1=value1,var2=" is replaced with the empty string. This causes the entire stratum name to be reduced to "value2". For multiple stratification, this causes the stratum names to overlap, e.g. "var1=0,var2=0" and "var1=1,var2=0" would both be reduced to "0". This leads to duplicate values in groupIDs, which means it is not a valid levels argument to factor.
Solution
If I understand the cause correctly, the issue is that the gsub takes the pattern ".*=", which matches everything up to the last equals sign. A potential solution would be to change the pattern so that a match cannot contain commas. I believe that the R regular expression syntax for that is "[^,]*=". This would cause multiply stratified names to become comma-separated, e.g. "var1=value1,var2=value2" would reduce to "value1,value2".
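To compare the two patterns side by side, here is a small base R sketch. The stratum names are made-up examples in the "var1=value1,var2=value2" form described above, not output from a real survfit object, and the variable names are only for illustration.

```r
# Made-up stratum names in the form described above
strata_names <- c("var1=0,var2=0", "var1=1,var2=0", "var1=0,var2=1")

# Current pattern: strips everything up to the last "=" in each name
old_ids <- gsub(".*=", "", strata_names)
print(old_ids)                # "0" "0" "1"
print(anyDuplicated(old_ids)) # non-zero, so the labels collide

# Duplicated labels are not usable as factor levels; depending on the
# R version this is an error or a warning, so both are caught here.
old_problem <- tryCatch(
  { factor(old_ids, levels = old_ids); NULL },
  error   = function(e) conditionMessage(e),
  warning = function(w) conditionMessage(w)
)
print(old_problem)

# Proposed pattern: a match may not cross a comma, so each "name="
# prefix is removed separately and the values stay comma-separated
new_ids <- gsub("[^,]*=", "", strata_names)
print(new_ids)                # "0,0" "1,0" "0,1"

# Unique labels work as factor levels
new_factor <- factor(new_ids, levels = new_ids)
print(levels(new_factor))
```

With the proposed pattern every combination of values keeps its own label, so the levels argument no longer contains duplicates.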