openjournals / joss-reviews

Reviews for the Journal of Open Source Software
Creative Commons Zero v1.0 Universal

[REVIEW]: surtvep: An R package for estimating time-varying effects #5688

Closed editorialbot closed 2 months ago

editorialbot commented 1 year ago

Submitting author: @LingfengLuo0510 (Lingfeng Luo)
Repository: https://github.com/UM-KevinHe/surtvep
Branch with paper.md (empty if default branch): JOSS
Version: v1.0.0
Editor: @osorensen
Reviewers: @adibender, @turgeonmaxime
Archive: 10.5281/zenodo.12575049

Status

Status badge code:

HTML: <a href="https://joss.theoj.org/papers/cfccdf9c4b2e69546eafb629fc48dacb"><img src="https://joss.theoj.org/papers/cfccdf9c4b2e69546eafb629fc48dacb/status.svg"></a>
Markdown: [![status](https://joss.theoj.org/papers/cfccdf9c4b2e69546eafb629fc48dacb/status.svg)](https://joss.theoj.org/papers/cfccdf9c4b2e69546eafb629fc48dacb)

Reviewers and authors:

Please avoid lengthy details of difficulties in the review thread. Instead, please create a new issue in the target repository and link to those issues (especially acceptance-blockers) by leaving comments in the review thread below. (For completists: if the target issue tracker is also on GitHub, linking the review thread in the issue or vice versa will create corresponding breadcrumb trails in the link target.)

Reviewer instructions & questions

@adibender & @turgeonmaxime, your review will be checklist based. Each of you will have a separate checklist that you should update when carrying out your review. First of all you need to run this command in a separate comment to create the checklist:

@editorialbot generate my checklist

The reviewer guidelines are available here: https://joss.readthedocs.io/en/latest/reviewer_guidelines.html. Any questions/concerns please let @osorensen know.

Please start on your review when you are able, and be sure to complete your review in the next six weeks, at the very latest.

Checklists

📝 Checklist for @turgeonmaxime

📝 Checklist for @adibender

adibender commented 5 months ago

@LingfengLuo0510 thanks for the updates. Here are some additional notes. There are still some points I haven't checked, because of the issues below, mostly concerning the paper and documentation. Those concerned points are:

The main improvement for the paper, I think, would be to make the scope and potential limitations of the package clearer:

To be clear, I am not asking you to run additional models or change the implementation, but the scope, limitations, overview of existing methods and performance claims should be adapted/verified in the paper. @osorensen has to decide regarding the reproducibility aspect of the real data example.

Some minor comments

Edit

Here is the code I mentioned: Note, the Aalen model produces cumulative (time-varying) coefficients, but these are equivalent to f(t)*x type hazards models as shown in Figure 5 here: https://arxiv.org/pdf/1806.01042.pdf?

# This is the example from the paper
library(surtvep)
library(ggplot2)
library(timereg)
library(pammtools)

data("ExampleData")
# data prep
z <- ExampleData$z
time <- ExampleData$time
event <- ExampleData$event
# non-penalized model
fit.tv <- coxtv(z = z, event = event, time = time)
# penalized model
fit.penalize <- coxtp(z = z, event = event, time = time)
fit.ic <- IC(fit.penalize)

# Visualization
plot(fit.tv, ylim = c(-3,10), parm = "X1") + geom_hline(yintercept = 1, col = 2)
plot(fit.ic$model.mAIC, ylim = c(-3,10), parm = "X2")

# illustration with PAMMs

# data prep
df <- data.frame(time = time, event = event)
df <- cbind.data.frame(df, z)
colnames(df) <- c("time", "event", "z1", "z2")
# data trafo
ped <- as_ped(Surv(time, event) ~ ., cut = seq(0, 3, length.out = 100), data = df)
# model fit
pam <- pamm(ped_status ~ s(tend) +  s(tend, by = z1) + s(tend, by = z2),
  data = ped, engine = "bam", method = "fREML", discrete = TRUE)
summary(pam, 1)

# recovers true effects
layout(matrix(1:2, nrow = 1))
plot(pam, select = 2)
abline(h = 1, lty = 2, col = 2)
plot(pam, select = 3)
curve(sin(3 *pi * x / 4), 0, 3, add = TRUE, col = 2, lty = 2)

# Runtime comparisons
# - surtvep
# - pamm
# - aalen
t_surtvep <- system.time({
  fit.penalize <- coxtp(z = z, event = event, time = time)
  fit.ic <- IC(fit.penalize)
})

t_pam_large <- system.time({
  ped <- as_ped(Surv(time, event) ~ ., data = df)
  pam <- pamm(
    ped_status ~ s(tend) +  s(tend, by = z1) + s(tend, by = z2),
    data = ped, engine = "bam", method = "fREML", discrete = TRUE)
})

t_pam_small <- system.time({
  ped <- as_ped(Surv(time, event) ~ ., cut = seq(0, 3, length.out = 100), data = df)
  pam <- pamm(
    ped_status ~ s(tend) +  s(tend, by = z1) + s(tend, by = z2),
    data = ped, engine = "bam", method = "fREML", discrete = TRUE)
})

t_aalen <- system.time({
  aal <- aalen(Surv(time, event) ~ z1 + z2, data = df, n.sim = 100)
})

rbind(t_surtvep, t_pam_large, t_pam_small, t_aalen)
t_pam_large[3]/t_surtvep[3]
t_surtvep[3]/t_pam_small[3]

osorensen commented 5 months ago

Thank you @adibender. I really appreciate your efforts reviewing this package. @LingfengLuo0510, please address all the points raised by @adibender, and reach out here if you have any further questions.

editorialbot commented 5 months ago

:wave: @adibender, please update us on how your review is going (this is an automated reminder).

osorensen commented 4 months ago

@LingfengLuo0510, could you please let us know if you intend to address the remaining issues pointed out by @adibender, and give us an approximate timeline?

LingfengLuo0510 commented 4 months ago

@osorensen We are working to address these issues. We anticipate an update within the next 10 days. Thank you!

editorialbot commented 4 months ago

:point_right::page_facing_up: Download article proof :page_facing_up: View article proof on GitHub :page_facing_up: :point_left:

editorialbot commented 4 months ago

:point_right::page_facing_up: Download article proof :page_facing_up: View article proof on GitHub :page_facing_up: :point_left:

LingfengLuo0510 commented 4 months ago

@adibender

  1. you mention \beta(t) in the paper, but I think the types of models that can be fit should be made more clear, i.e. varying coefficient models of the form f(t)x, i.e. (potentially) non-linear in time, but linear in x, in contrast for example to f(x)t or f(t,x).

Thank you for your suggestion. We have added a sentence at line 70 clarifying the type of models considered in this work.
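
For reference, the distinction raised in this point can be written out explicitly. The following is only a sketch in generic notation; the symbols $\lambda_0$, $\beta_j$, $f$ and $p$ are notational assumptions and are not copied from the paper.

```latex
% Varying-coefficient Cox model: possibly non-linear in time t,
% but linear in each covariate x_j (the "f(t) x" form)
\lambda(t \mid x) = \lambda_0(t)\,\exp\Big\{ \sum_{j=1}^{p} \beta_j(t)\, x_j \Big\}

% In contrast, an effect that is non-linear in x but linear in t (the "f(x) t" form)
\lambda(t \mid x) = \lambda_0(t)\,\exp\{ f(x)\, t \}

% or a general bivariate effect surface (the "f(t, x)" form)
\lambda(t \mid x) = \lambda_0(t)\,\exp\{ f(t, x) \}
```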

  1. Implementation-wise, users cannot choose which type of effect to fit, i.e. one can only specify covariates and they will automatically be assumed to have f(t)x, but one cannot choose some x to be beta x or interactions between different Xs, non-linear effects f(x), etc., which is fine, but should be briefly made explicit

We have added a sentence at line 129 clarifying this limitation of the current implementation.

  1. Line 44: "compared to existing computational packages..." Here are some performance claims, but the paper does not explicitly mention which packages were compared, and I couldn't find the comparisons in the referenced papers either (see also my attached R code where I compare surtvep to 2 approaches, my own PAMMs and Aalen's additive hazards model from the timereg package, where for the simple case they appear to be faster)

Thank you for bringing this up and for sharing your comparison code. We've clarified at line 45 that our performance comparisons were specifically against other packages for Cox proportional hazards models.

  1. It's also not clear from the paper in which situations the package applies, n>p or n<<p?

We have added relevant discussion at line 130 clarifying that the current implementation is for low-dimensional settings (where the number of covariates is much smaller than the sample size).

  1. The Data Example is not reproducible as it only links to the Homepage, but not clear which data exactly was downloaded, how it was preprocessed, etc.

The code for model fitting and plotting can be found in the Repository-joss branch. The name of the file is "GenerateRealDataPlot.R". Access to data can be requested at https://seer.cancer.gov/data/access.html. We have added a sentence at line 139 clarifying the data request. We also include the publicly available SUPPORT data example on the tutorial website.

  1. It's not clear for which survival tasks it is applicable, only to right-censored data or also left-/interval censored, competing risks, multi-state models, etc.

We have added a sentence at line 131 clarifying this limitation of the current implementation.

  1. L59: "Breslow ... number of times" probably should be "number of ties"?

Thank you for pointing it out. We changed the “times” to “ties” at Line 61.

  1. L95: mentions some hypothesis testing capabilities, etc. (add references to the papers where these have been established).

We have added the references giving the details of testing at line 129.

  1. L104: Link in 104 leads to main page, therein is a section to "Detailed tutorial" but link goes back to landing page

Thank you for pointing it out. We have updated the link.

  1. for the figure here: https://um-kevinhe.github.io/surtvep/articles/surtvep.html#quick-start better to use coord_cartesian(ylim = c(y1,y2)) rather than ylim(c(y1,y2))

Thank you for pointing it out. We will update this plot function in future releases of R-CRAN.
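
The difference behind this suggestion can be illustrated with a small sketch. The toy data and the object names (`df`, `p_scale`, `p_zoom`) are assumptions made for illustration and are not taken from the surtvep tutorial: `ylim()` sets scale limits, so values outside the range are treated as missing, whereas `coord_cartesian()` only zooms the visible region.

```r
library(ggplot2)

# Toy curve (illustrative only) with some values above the displayed range
df <- data.frame(t = seq(0, 3, length.out = 50))
df$beta <- 4 * sin(3 * pi * df$t / 4)

p <- ggplot(df, aes(t, beta)) + geom_line()

# ylim() sets scale limits: values outside [-3, 3] become NA before drawing
p_scale <- p + ylim(-3, 3)

# coord_cartesian() only zooms the view: the underlying data stay untouched
p_zoom <- p + coord_cartesian(ylim = c(-3, 3))

# Count how many y values each approach has turned into NA
n_dropped_scale <- sum(is.na(ggplot_build(p_scale)$data[[1]]$y))
n_dropped_zoom  <- sum(is.na(ggplot_build(p_zoom)$data[[1]]$y))
```

With the scale-limit version the line is interrupted where the curve leaves the range; with the zoomed version the line is simply clipped at the panel border.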

LingfengLuo0510 commented 3 months ago

@adibender @osorensen May I ask if I have addressed the issues? Thanks a lot!

osorensen commented 3 months ago

Thanks for asking @LingfengLuo0510. @adibender, could you please have a look?

osorensen commented 3 months ago

👋 @adibender, could you please have a look at the last revisions made by @LingfengLuo0510, described in this post?

adibender commented 3 months ago

Hi yes, sorry, will do this week

adibender commented 3 months ago

@osorensen @LingfengLuo0510 my comments have been addressed sufficiently. Just one more minor thing: In Line 37: "With the rising need for modeling time-varying effects, researchers have developed methods to handle the data (Gray, 1992, 1994; Hastie & Tibshirani, 1993; Zucker & Karr, 1990)." Rather than "the data" it should say "such data"? And even that would be a bit imprecise, because the data is standard, it's the effects that change, so maybe reformulate a bit. Other than that, good to go and I don't need to see another revision.

Sorry again for the delay.

osorensen commented 3 months ago

Thanks a lot @adibender, both for your quick response and for your very good review. It is much appreciated!

@adibender, could you please also update your checklist.

osorensen commented 3 months ago

@LingfengLuo0510, can you please fix the last suggestion by @adibender, and notify me here when done?

LingfengLuo0510 commented 3 months ago

@editorialbot generate pdf

editorialbot commented 3 months ago

:point_right::page_facing_up: Download article proof :page_facing_up: View article proof on GitHub :page_facing_up: :point_left:

LingfengLuo0510 commented 3 months ago

@adibender Thank you for your suggestion. We have modified the sentence at line 37 to make it more clear. The modified sentence is "With the rising need for modeling time-varying effects, researchers have developed methods to handle the complex and dynamic nature of such data."

@osorensen I have updated the above sentence. Thanks a lot!

osorensen commented 3 months ago

@editorialbot check references

osorensen commented 3 months ago

@editorialbot generate pdf

editorialbot commented 3 months ago

Checking the BibTeX entries failed with the following error:

Failed to parse BibTeX on value "title" (NAME) [#<BibTeX::Bibliography data=[8]>, "@", #<BibTeX::Entry >, "%"]

editorialbot commented 3 months ago

:point_right::page_facing_up: Download article proof :page_facing_up: View article proof on GitHub :page_facing_up: :point_left:

turgeonmaxime commented 3 months ago

Checking the BibTeX entries failed with the following error:

Failed to parse BibTeX on value "title" (NAME) [#<BibTeX::Bibliography data=[8]>, "@", #<BibTeX::Entry >, "%"]

@LingfengLuo0510 I think the bib entry that leads to a parsing issue is this one here: https://github.com/UM-KevinHe/surtvep/blob/0ae36ca6d7a58aaaeb7902378438ea58611653ba/JOSS/paper.bib#L99-L107

Markup for comments in bibtex files is not standardized as far as I know. If you replace @article by %article, or simply remove the @ sign, the bot should be able to compile your manuscript.
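
As a rough illustration of the second workaround (removing the @ sign from commented-out lines), here is a minimal R sketch. The helper name `neutralize_commented_entries` and the sample entries are hypothetical; they are not the contents of the actual paper.bib.

```r
# Hypothetical helper: drop "@" from lines that are commented out with "%",
# so a strict BibTeX parser no longer sees the start of an entry there
neutralize_commented_entries <- function(lines) {
  is_comment <- grepl("^\\s*%", lines)
  lines[is_comment] <- gsub("@", "", lines[is_comment], fixed = TRUE)
  lines
}

# Made-up example input: one commented-out entry, one live entry
bib <- c(
  "% @article{old_entry,",
  "%   title = {An entry that was commented out},",
  "% }",
  "@article{kept_entry,",
  "  title = {An entry that is still in use},",
  "}"
)

cleaned <- neutralize_commented_entries(bib)
```

To apply this to a file, one could pass `readLines()` output through the helper and write the result back with `writeLines()`; the file path would depend on the repository layout.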

osorensen commented 3 months ago

Thanks @turgeonmaxime. Great if you could fix this @LingfengLuo0510

osorensen commented 3 months ago

@LingfengLuo0510, I have now read through the manuscript once again, and think it's very well written. Below are some minor issues. Could you please address them, in addition to the BibTeX issue mentioned above, and report here when done?

LingfengLuo0510 commented 3 months ago

@editorialbot check references

LingfengLuo0510 commented 3 months ago

@editorialbot generate pdf

editorialbot commented 3 months ago

Reference check summary (note 'MISSING' DOIs are suggestions that need verification):

OK DOIs

- None

MISSING DOIs

- No DOI given, and none found for title: Fidgit: An ungodly union of GitHub and Figshare
- 10.1038/bjc.2015.174 may be a valid DOI for title: Time-varying effect and long-term survival analysi...
- 10.1002/cncr.33174 may be a valid DOI for title: Time-varying survival effects for squamous cell ca...
- 10.1214/aos/1176347503 may be a valid DOI for title: Nonparametric survival analysis with time-dependen...
- No DOI given, and none found for title: Varying-coefficient models
- 10.2307/2290630 may be a valid DOI for title: Flexible methods for analyzing survival data using...
- No DOI given, and none found for title: Using time dependent covariates and time dependent...
- No DOI given, and none found for title: A Package for Survival Analysis in R
- 10.1007/s10985-021-09544-2 may be a valid DOI for title: Scalable proximal methods for cause-specific hazar...
- No DOI given, and none found for title: Generalized Additive Models: An Introduction with ...
- 10.1007/s11222-016-9666-x may be a valid DOI for title: P-splines with derivative based penalties and tens...
- No DOI given, and none found for title: Surveillance, Epidemiology, and End Results (SEER)...
- 10.1007/978-1-4612-0919-5_38 may be a valid DOI for title: Information theory and an extension of the maximum...
- 10.1214/ss/1038425655 may be a valid DOI for title: Flexible smoothing with B-splines and penalties
- No DOI given, and none found for title: Distribution of an information statistic and the c...
- 10.1016/j.cmpb.2005.11.006 may be a valid DOI for title: A fast routine for fitting Cox models with time va...
- 10.1177/09622802231181471 may be a valid DOI for title: Using information criteria to Select Smoothing Par...
- 10.2307/2532779 may be a valid DOI for title: Spline-based tests in survival analysis
- 10.2307/2529620 may be a valid DOI for title: Covariance analysis of censored survival data
- 10.1080/10618600.2016.1237364 may be a valid DOI for title: Modeling time-varying effects with large-scale sur...
- 10.1111/biom.13473 may be a valid DOI for title: Stratified Cox models with time-varying effects fo...

INVALID DOIs

- None

editorialbot commented 3 months ago

:point_right::page_facing_up: Download article proof :page_facing_up: View article proof on GitHub :page_facing_up: :point_left:

editorialbot commented 3 months ago

:point_right::page_facing_up: Download article proof :page_facing_up: View article proof on GitHub :page_facing_up: :point_left:

LingfengLuo0510 commented 3 months ago

@editorialbot check references

editorialbot commented 3 months ago

Reference check summary (note 'MISSING' DOIs are suggestions that need verification):

OK DOIs

- 10.1051/0004-6361/201322068 is OK
- 10.1051/0004-6361/201322068 is OK
- 10.1002/cncr.33174 is OK
- 10.1214/aos/1176347503 is OK
- 10.1111/j.2517-6161.1993.tb01939.x is OK
- 10.2307/2290630 is OK
- 10.1007/s10985-021-09544-2 is OK
- 10.1007/s11222-016-9666-x is OK
- 10.1007/978-1-4612-0919-5_38 is OK
- 10.1214/ss/1038425655 is OK
- 10.1016/j.cmpb.2005.11.006 is OK
- 10.1177/09622802231181471 is OK
- 10.2307/2532779 is OK
- 10.2307/2529620 is OK
- 10.1080/10618600.2016.1237364 is OK
- 10.1111/biom.13473 is OK

MISSING DOIs

- No DOI given, and none found for title: Fidgit: An ungodly union of GitHub and Figshare
- No DOI given, and none found for title: Using time dependent covariates and time dependent...
- No DOI given, and none found for title: A Package for Survival Analysis in R
- No DOI given, and none found for title: Generalized Additive Models: An Introduction with ...
- No DOI given, and none found for title: Surveillance, Epidemiology, and End Results (SEER)...
- No DOI given, and none found for title: Distribution of an information statistic and the c...

INVALID DOIs

- None

LingfengLuo0510 commented 3 months ago

@editorialbot check references

editorialbot commented 3 months ago

Reference check summary (note 'MISSING' DOIs are suggestions that need verification):

OK DOIs

- 10.1051/0004-6361/201322068 is OK
- 10.1002/cncr.33174 is OK
- 10.1214/aos/1176347503 is OK
- 10.1111/j.2517-6161.1993.tb01939.x is OK
- 10.2307/2290630 is OK
- 10.1007/s10985-021-09544-2 is OK
- 10.1007/s11222-016-9666-x is OK
- 10.1007/978-1-4612-0919-5_38 is OK
- 10.1214/ss/1038425655 is OK
- 10.1016/j.cmpb.2005.11.006 is OK
- 10.1177/09622802231181471 is OK
- 10.2307/2532779 is OK
- 10.2307/2529620 is OK
- 10.1080/10618600.2016.1237364 is OK
- 10.1111/biom.13473 is OK

MISSING DOIs

- No DOI given, and none found for title: Using time dependent covariates and time dependent...
- No DOI given, and none found for title: A Package for Survival Analysis in R
- No DOI given, and none found for title: Generalized Additive Models: An Introduction with ...
- No DOI given, and none found for title: Surveillance, Epidemiology, and End Results (SEER)...
- No DOI given, and none found for title: Distribution of an information statistic and the c...

INVALID DOIs

- None

LingfengLuo0510 commented 3 months ago

@osorensen

  1. "I cannot see that Figure 1 or Figure 2 are mentioned anywhere in the text. Please add one sentence pointing the reader to each of them. For example "Figure 1 shows a flow chart for functions in the surtvep package.""

We have added sentences directing the reader to Figure 1 (line 62) and Figure 2 (line 121) within the relevant sections of the text.

  1. "I suggest adding a reference to Eilers and Marx the first time you mention P-splines. Here is the original reference: https://doi.org/10.1214/ss/1038425655."

We have included the suggested reference to Eilers and Marx (1996) at the first mention of P-splines (line 18).

  1. "On line 103, rewrite "covariates effects" to "covariate effects"."

We have corrected the phrasing to "covariate effects" (line 103).

We have updated the .bib file as best as possible. Some references, such as those to R packages, do not have DOIs as they are not traditional journal articles. Is this ok?

Thanks a lot!

osorensen commented 3 months ago

@editorialbot generate pdf

osorensen commented 3 months ago

@editorialbot check references

editorialbot commented 3 months ago

Reference check summary (note 'MISSING' DOIs are suggestions that need verification):

OK DOIs

- 10.1051/0004-6361/201322068 is OK
- 10.1002/cncr.33174 is OK
- 10.1214/aos/1176347503 is OK
- 10.1111/j.2517-6161.1993.tb01939.x is OK
- 10.2307/2290630 is OK
- 10.1007/s10985-021-09544-2 is OK
- 10.1007/s11222-016-9666-x is OK
- 10.1007/978-1-4612-0919-5_38 is OK
- 10.1214/ss/1038425655 is OK
- 10.1016/j.cmpb.2005.11.006 is OK
- 10.1177/09622802231181471 is OK
- 10.2307/2532779 is OK
- 10.2307/2529620 is OK
- 10.1080/10618600.2016.1237364 is OK
- 10.1111/biom.13473 is OK

MISSING DOIs

- No DOI given, and none found for title: Using time dependent covariates and time dependent...
- No DOI given, and none found for title: A Package for Survival Analysis in R
- No DOI given, and none found for title: Generalized Additive Models: An Introduction with ...
- No DOI given, and none found for title: Surveillance, Epidemiology, and End Results (SEER)...
- No DOI given, and none found for title: Distribution of an information statistic and the c...

INVALID DOIs

- None

editorialbot commented 3 months ago

:point_right::page_facing_up: Download article proof :page_facing_up: View article proof on GitHub :page_facing_up: :point_left:

osorensen commented 3 months ago

Thanks @LingfengLuo0510.

At this point could you:

I can then move forward with recommending acceptance of the submission.

LingfengLuo0510 commented 2 months ago

@osorensen

  1. Thanks a lot for the instructions, I have followed your steps to create this on Zenodo. The doi of the archived version is: 10.5281/zenodo.12575049 The URL is: https://doi.org/10.5281/zenodo.12575049

  2. I have listed the authors there too. However, I could not find where to add the corresponding author. The author list is the same as in my PDF version.

Just let me know if everything looks good. Thanks again!

osorensen commented 2 months ago

@editorialbot set 10.5281/zenodo.12575049 as archive

editorialbot commented 2 months ago

Done! archive is now 10.5281/zenodo.12575049

osorensen commented 2 months ago

@editorialbot set v1.0.0 as version

editorialbot commented 2 months ago

Done! version is now v1.0.0

osorensen commented 2 months ago

@editorialbot recommend-accept

editorialbot commented 2 months ago

Attempting dry run of processing paper acceptance...

editorialbot commented 2 months ago

Reference check summary (note 'MISSING' DOIs are suggestions that need verification):

OK DOIs

- 10.1051/0004-6361/201322068 is OK
- 10.1002/cncr.33174 is OK
- 10.1214/aos/1176347503 is OK
- 10.1111/j.2517-6161.1993.tb01939.x is OK
- 10.2307/2290630 is OK
- 10.1007/s10985-021-09544-2 is OK
- 10.1007/s11222-016-9666-x is OK
- 10.1007/978-1-4612-0919-5_38 is OK
- 10.1214/ss/1038425655 is OK
- 10.1016/j.cmpb.2005.11.006 is OK
- 10.1177/09622802231181471 is OK
- 10.2307/2532779 is OK
- 10.2307/2529620 is OK
- 10.1080/10618600.2016.1237364 is OK
- 10.1111/biom.13473 is OK

MISSING DOIs

- No DOI given, and none found for title: Using time dependent covariates and time dependent...
- No DOI given, and none found for title: A Package for Survival Analysis in R
- No DOI given, and none found for title: Generalized Additive Models: An Introduction with ...
- No DOI given, and none found for title: Surveillance, Epidemiology, and End Results (SEER)...
- No DOI given, and none found for title: Distribution of an information statistic and the c...

INVALID DOIs

- None

editorialbot commented 2 months ago

:wave: @openjournals/dsais-eics, this paper is ready to be accepted and published.

Check final proof :point_right::page_facing_up: Download article

If the paper PDF and the deposit XML files look good in https://github.com/openjournals/joss-papers/pull/5548, then you can now move forward with accepting the submission by compiling again with the command @editorialbot accept

crvernon commented 2 months ago

🔍 checking out the following: