som-shahlab / survlearners

Metalearners for Survival Data
https://som-shahlab.github.io/survlearners
13 stars 1 forks source link

Add basic tests to testthat\ #22

Open CrystalXuR opened 2 years ago

CrystalXuR commented 2 years ago

@erikcs I am done with cleaning and editing all the codes in the R folder, should I add and test them to the testthat\ before hand over to you and @nignatiadis? if so, what is the workflow for that process?

erikcs commented 2 years ago

Let's wait a bit before adding tests and:

1) Can you see last point in #6 and make sure to use survival:: + Imports: survival

2) I would want all the estimator files in R/ do having a runnable @example https://github.com/som-shahlab/survmetabench/blob/master/r-package/survlearners/R/surv_xl_lasso.R#L11

that means that when opening a new PR, R runs all the code examples and we know it's not broken

that means you need to go through all files and:

replace \dontrun with \donttest which will make R CMD check execute all those examples

Can you do this in separate PRs?

CrystalXuR commented 2 years ago

yes, I will do these two things in two separate PRs.

CrystalXuR commented 2 years ago

@erikcs Please let me know if you have other things want me to check, otherwise, I think you and @nignatiadis can start the review

erikcs commented 2 years ago

There is some more cleanup to do:

1) Remove MASS and mvtnorm from Imports if they are no longer being used for dgps, that closes #6

2) All the old R-learner files have signature (x, w, y), that is a recipe for bugs when GRF uses (Covariates, Outcome, Treatment). They should all be updated to use signature (X, Y, W) with uppercase letters for consistency. that closes #3

3) The rest of the surv_ estimators has signature (data, data.test), that is not the signature a user should face, it should be (X, Y, W, etc). Making convenience entry functions for simulations using (data, data.test) is something you can do outside the r-package in a separate file for simulations. CSF.R should not be part of the R/ package There can be an experiments\ folder in the main repo which stiches together and sets up the simulations (HPC.R) by doing library survlearners, see GRF for an example

4) The predict.* signatures should be predict(object, newdata = NULL): NOT "newx" (follow predict.lm which use newdata)

5) Aesthetics: a) All the files in R/ should have consistent signature. "variable.names" with punctuation and function_name with underscore. Everything besides these use ., cen_fit is not consistent. Arguments that are similar to grf and passed on etc should also be named consistently, for example "sample.weights" instead of "weights", "Kaplan-Meier" instead of "KM". This includes everything in the R code too, y_fit is not consistent with S.hat, nor varName, use consistent . for variable.names. b) go over every file to make sure coding style is consistent, use of white space is consistent, use of assignemnt operator <- is consistent. this file https://github.com/som-shahlab/survmetabench/blob/master/r-package/survlearners/R/utils.R is a mess which mixes double double tabs, =/<- , sometimes have space between =, sometimes not, sometimes have space between if(.., sometimes not, some files use single ' some files use ", needs to be fixed and make consistent througout. Look at GRF for example

6) Final package requirement: I will turn on pedantic mode in R CMD check here: https://github.com/som-shahlab/survmetabench/pull/10 which means you will have to make R CMD checks gives zero warnings, which means for example there can be no mistakes in @ docstrings. Ideally it will make NOTEs in R CMD check illegal too,

7) Final before Nikos/me goes over all the code logic in estimators: a) make sure no more major cleanup needed b) meet and make sure everyone agrees what the expectation of surv_* is. Should the non-survival Rlearners be made internal

CrystalXuR commented 2 years ago

Thank you Erik for the well-explained list, I will go through them and let you know when I am done!

erikcs commented 2 years ago

Here are some suggestions for ~ invariance tests to put in for all estimators:

  1. Flipping treatment indicator W should give tau(x) ~= -tau(x)_flipped
  2. Shifting all outcomes Y by a constant Y_new = Y + constant should give the same tau(x) with t0_new = t0 + constant

edit: added in #135