Closed therneau closed 4 months ago
I thought about this some more, and I now think updating a tdc() is impossible to do reliably. Assume that the tmerge data set has multiple tdc variables, and that these were not necessarily changed on the same dates; this is often true for my data sets. As a result we have four intervals (0,10), (10,20), (20,30) and (30,40) and tdc variable x with values of 1, 2, 2, 3. You now do an new tdc() to add an x value of 4 at time 15. The new data will have intervals of (0,10), (10,15), (15,20), (20,30) and (30,40). The problem is that you don't know, at this point, whether the x=2 value for time (20,30) was an actual measurement at time 20, or simply a carry over from a measurement at time 10. If the former, the new data should have values of 1, 2, 4, 2, 3 for the 5 intervals; if the latter 1,2,4,4,3 is correct. That is, you don't know the duration of the new value. The rule in tmerge is currently that tdc replaces, it does not attempt to augment.
I've thought about this some more, and now believe that there is no valid was to update a tdc() variable. Here is the problem.
To do this the routine would either need a full transactional history, or would need to mark each individual tdc variable, at each time, as 'real' or 'repeat'. This discussion needs to be added to the tdc vignette.
This issue was raised by Sam Buttery via email (which I normally prefer), but I'm moving it here in order to perhaps get some comments and opinions on the best answer. Starting with his email
First, thank you (and your colleagues) for the very well done survival package. I have an observation that might demonstrate a bug and a simple example that, I hope, shows the different ways a particular task is treated in versions 3.2-7, 3.2-11, and 3.3-1.
In my example I’d like to use tmerge() to create a “long” data set from these observations: (df <- data.frame(id = 1:3, x = c(100,100,100), thing = c(1.2,1.2,1.2), time = c(0,0,0), thing2= c(2.1,NA,2.3), time2 = c(2,NA,2), thing3 = c(NA,NA,3.5), time3 = c(NA,NA,3), end = c(4,4,4))) id x thing time thing2 time2 thing3 time3 end 1 1 100 1.2 0 2.1 2 NA NA 4 2 2 100 1.2 0 NA NA NA NA 4 3 3 100 1.2 0 2.3 2 3.5 3 4
I use these commands:
( newdf <- tmerge(df[,1:2], df, id = id, tstop = end)) ( newdf <- tmerge(newdf, df, id = id, thing = tdc( time, thing) ) ) ( newdf <- tmerge(newdf, df, id = id, thing = tdc( time2, thing2) ) ) # see below ( newdf <- tmerge(newdf, df, id = id, thing = tdc( time3, thing3) ) )
Under the current version, this second-to-last line produces an error that says: Error in tmerge(newdf, df, id = id, thing = tdc(time2, thing2)) : tdc update does not match prior variable type: thing # from version 3.3-1
Now, at your github site, https://rdrr.io/github/therneau/survival/src/R/tmerge.R , the code for tmerge() includes these lines: "# Next line removed 11/2021, I can't recall why was added
if (!is.null(yinc)) newvar <- NULL #a tdc can't be updated, other than 0/1"
If I reinstate that if() (I call the updated version tmergx() ), I can run that second to last line, producing this: ( newdf <- tmergex(newdf, df, id = id, thing = tdc( time2, thing2) ) )# 3.2-11 does this id x tstart tstop thing 1 1 100 0 2 NA 2 1 100 2 4 2.1 3 2 100 0 4 NA 4 3 100 0 2 NA 5 3 100 2 4 2.3
So at least it works, but it’s not what I expect. I expect “thing” for the first row to be 1.2, for example. Survival 3.2-7, that same command, using the tmerge from that version, produces this output, which I think is what I want. Do you agree?
( newdf <- tmerge(newdf, df, id = id, thing = tdc( time2, thing2) ) )# v. 3.2-7. id x tstart tstop thing 1 1 100 0 2 1.2 2 1 100 2 4 2.1 3 2 100 0 4 1.2 4 3 100 0 2 1.2 5 3 100 2 4 2.3
I know you’re busy, but I would love it if you could look into this. I hope this question isn’t a trivial on Thanks, Sam Buttrey
I wrote the tmerge command under the assumption that each new tdc(), cumtdc(), even(), cumevent() would create a new variable. The notion that they could update the same variable has come up a few times, and always leads me down a rabbit hole. Here are some of the issues. I'll use "zed" as the name of the target variable of interest.
a. fold the new values into the old. A major problem is what to do if the data type of the addition != data type of the old. I simply have not had the energy to tackle this, since there are so many special cases. Both 0/1 is easy. What about 0/1 + logical, numeric + factor, two factors with different level sets (okay if new levels are a superset of old?). Dates are painful since ifelse() loses the class, coding for them has been like walking on eggshells. Perhaps an aggressive "sorry, can't do that" error message for unanticipated combinations --- anything you can think of will eventually come up. b. What about tied times. New one wins? What if the new value is NA and the old is not?
c. How should a cumtdc be updated? d. Same issues for event and cumevent()
All in all, it's enough coding work that my response, multiple times, has been "I'll paper it over for now and come back later to think it through". But the 'come back later' never seems to materialize.
So. What do others think that the routine should do? For the data sets I work with the "update a tdc" use case arises very rarely, so I don't have strong priors opinions. And, once comments are in, any volunteers to give the updates a try?
Any change, or even the status quo, deserves a nice long addition to the test directory, checking out a large number of combinations, with comments on what we expect. Then also add this discussion to the vignette and help files.
Churchill said “Democracy is the worst form of government, except for all those other forms that have been tried from time to time.” There are many times I feel exactly that way about tmerge -- using it drives me batty sometimes. But I haven't yet found the better solution to building these data sets.
Terry T.