openpharma / crmPack

Object-Oriented Implementation of CRM Designs
https://openpharma.github.io/crmPack/
20 stars 10 forks source link

Inconsistent slot names #407

Open PuzzledFace opened 2 years ago

PuzzledFace commented 2 years ago

Some classes use slot names that are inconsistent within the same hierarchy. For example, slots size, sizes and cohortSize are all used to denote the size of a cohort in subclasses of CohortSize. This is annoying.

The following function allows a quick-and-dirty comparision of slot names to be made.

compare_slot_names <- function(cls) {
  sub_classes <- getClass(cls)@subclasses
  lapply(
    names(sub_classes),
    function(x) {
      tibble(Class=x, Slot=slotNames(x), Present=TRUE)
    }
  ) %>% 
  bind_rows() %>% 
  pivot_wider(
    names_from=Class,
    values_from=Present,
    values_fill=FALSE
  ) %>% 
  arrange(Slot)
}

For example ...

> compare_slot_names("CohortSize")
# A tibble: 6 × 7
  Slot           CohortSizeRange CohortSizeDLT CohortSizeConst CohortSizeParts CohortSizeMax CohortSizeMin
  <chr>          <lgl>           <lgl>         <lgl>           <lgl>           <lgl>         <lgl>        
1 cohortSize     TRUE            TRUE          FALSE           FALSE           FALSE         FALSE        
2 cohortSizeList FALSE           FALSE         FALSE           FALSE           TRUE          TRUE         
3 DLTintervals   FALSE           TRUE          FALSE           FALSE           FALSE         FALSE        
4 intervals      TRUE            FALSE         FALSE           FALSE           FALSE         FALSE        
5 size           FALSE           FALSE         TRUE            FALSE           FALSE         FALSE        
6 sizes          FALSE           FALSE         FALSE           TRUE            FALSE         FALSE

... illustrating the point just made as well as showing the use of both intervals and DLTintervals.

The outputs from compare_slot_names("Data") and compare_slot_names("Design") appear uninteresting.

compare_slot_names("GeneralModel") %>% select(1:5) %>% print(n=nrow(.)) gives

# A tibble: 37 × 5
   Slot        Model LogisticLogNormal ProbitLogNormal LogisticLogNormalSub
   <chr>       <lgl> <lgl>             <lgl>           <lgl>               
 1 comp1       FALSE FALSE             FALSE           FALSE               
 2 comp2       FALSE FALSE             FALSE           FALSE               
 3 components  FALSE FALSE             FALSE           FALSE               
 4 cov         FALSE TRUE              FALSE           TRUE                
 5 datamodel   TRUE  TRUE              TRUE            TRUE                
 6 datanames   TRUE  TRUE              TRUE            TRUE                
 7 delta1      FALSE FALSE             FALSE           FALSE               
 8 dose        TRUE  TRUE              TRUE            TRUE                
 9 E0          FALSE FALSE             FALSE           FALSE               
10 ED50        FALSE FALSE             FALSE           FALSE               
11 Emax        FALSE FALSE             FALSE           FALSE               
12 init        TRUE  TRUE              TRUE            TRUE                
13 logNormal   FALSE FALSE             FALSE           FALSE               
14 mean        FALSE TRUE              FALSE           TRUE                
15 mode        FALSE FALSE             FALSE           FALSE               
16 modelspecs  TRUE  TRUE              TRUE            TRUE                
17 mu          FALSE FALSE             TRUE            FALSE               
18 prec        FALSE FALSE             FALSE           FALSE               
19 priormodel  TRUE  TRUE              TRUE            TRUE                
20 prob        TRUE  TRUE              TRUE            TRUE                
21 refDose     FALSE TRUE              TRUE            TRUE                
22 refDoseBeta FALSE FALSE             FALSE           FALSE               
23 refDoseEmax FALSE FALSE             FALSE           FALSE               
24 rho         FALSE FALSE             FALSE           FALSE               
25 sample      TRUE  TRUE              TRUE            TRUE                
26 shareWeight FALSE FALSE             FALSE           FALSE               
27 Sigma       FALSE FALSE             TRUE            FALSE               
28 sigma2betaW FALSE FALSE             FALSE           FALSE               
29 sigma2W     FALSE FALSE             FALSE           FALSE               
30 theta       FALSE FALSE             FALSE           FALSE               
31 useFixed    FALSE FALSE             FALSE           FALSE               
32 useLogDose  FALSE FALSE             TRUE            FALSE               
33 useRW1      FALSE FALSE             FALSE           FALSE               
34 weightpar   FALSE FALSE             FALSE           FALSE               
35 weights     FALSE FALSE             FALSE           FALSE               
36 xmax        FALSE FALSE             FALSE           FALSE               
37 xmin        FALSE FALSE             FALSE           FALSE 

Suggesting that comp1 and comp2 (separate slots) are inconsistent with components (a list) and the use of weightpar, weights and shareWeights could be rationalised. As perhaps could mu and mean (and possibly theta).

Also Sigma versus sigma2beta and sigma2W is annoying, both because of the inconsistent case and the absence of the 2.

compare_slot_names("Increments") gives

# A tibble: 7 × 6
  Slot           IncrementsRelative IncrementsNumDoseLevels IncrementsRelativeDLT IncrementMin IncrementsRelativeParts
  <chr>          <lgl>              <lgl>                   <lgl>                 <lgl>        <lgl>                  
1 cleanStart     FALSE              FALSE                   FALSE                 FALSE        TRUE                   
2 DLTintervals   FALSE              FALSE                   TRUE                  FALSE        FALSE                  
3 dltStart       FALSE              FALSE                   FALSE                 FALSE        TRUE                   
4 increments     TRUE               FALSE                   TRUE                  FALSE        TRUE                   
5 IncrementsList FALSE              FALSE                   FALSE                 TRUE         FALSE                  
6 intervals      TRUE               FALSE                   FALSE                 FALSE        TRUE                   
7 maxLevels      FALSE              TRUE                    FALSE                 FALSE        FALSE 

DLTintervals vs intervals is a candidate for improvement (if the distinction is important, why not DLTincrements as well?) and is the upper case I in IncrementsList appropriate (even though it's a list of Increments objects)?

compare_slot_names("NextBest") and compare_slot_names("Stopping") appear uninteresting.

To do:

PuzzledFace commented 1 year ago

I suggest renaming slots so that the most general term is used as the name for all related slots. So, for example, the DLTintervals (or dlt_intervals) slot in CohortSizeDLT should be renamed intervals. Where a class contains a "list of subclasses", then the slot should be a simple plural of the corresponding simple slot name - so cohortSizeList would become sizes.

Subclasses of CohortSize

Class              Current slot name                  Proposed new slot name
CohortSizeRange    cohortSize (cohort_size)           size
CohortSizeDLT      cohortSize (cohort_size)           size
CohortSizeMax      cohortSizeList (cohort_size_list)  sizes
CohortSizeMin      cohortSizeList (cohort_size_list)  sizes
CohortSizeDLT      DLTintervals (dlt_intervals)       intervals

No other changes to slot names.

Subclasses of GeneralModel

Class                       Current slot name            Proposed new slot name
LogisticNormalMixture       comp1                        components
                            comp2                        components
                            weightpar                    weights
LogisticLogNormalMixture    shareWeight                  weights
ProbitLogNormal             mu                           mean
                            Sigma                        sigma2

No other changes to slot names. [What about the theta slot in LogisticKadaneBetaGamma?

Subclasses of Increments

Class                   Current slot name    Proposed new slot name
IncrementsRelativeDLT   DLTIntervals         intervals
IncrementMin            IncrementsList       increments [*]

No other changes to slot names.

[*]: This does introduce one area of potential confusion: the increments slot in (eg) IncrementsRelative is a vector of reals, whereas in IncrementMin it will be a list of objects.

Implementation

These are potentially breaking changes. Therefore, users should be given the opportunity to change their code. I propose

For example

.IncrementsRelativeDLT <- setClass(
  Class = "IncrementsRelativeDLT",
  slots = representation(
    intervals = "integer",
    dlt_intervals = "integer",
    increments = "numeric"
  ),
  prototype = prototype(
    intervals = c(0L, 1L),
    dlt_intervals = c(0L, 1L),
    increments = c(2, 1)
  ),
  contains = "Increments",
  validity = v_increments_relative_dlt
)

setGeneric("dlt_intervals", function(x) standardGeneric("dlt_intervals"))
setGeneric("dlt_intervals<-", function(x, value) standardGeneric("dlt_intervals<-"))

setMethod("dlt_intervals", "IncrementsRelativeDLT", function(x) {
  .Deprecated(
    "intervals", 
    "crmPack", 
    "Please use slot `intervals` instead.  This slot may be removed in a future version of crmPack"
  )
  x@intervals
})
setMethod("dlt_intervals<-", "IncrementsRelativeDLT", function(x, value) {
  .Deprecated(
    "intervals", 
    "crmPack",
    "Please use slot `intervals` instead.  This slot may be removed in a future version of crmPack"
  )
  x@intervals <- value
  x
})

To minimise the possibility of conflicts with other development work, I suggest creating issues for each superclass, and making the changes superclass by superclass.

danielinteractive commented 1 year ago

Thanks @PuzzledFace , in the light of prioritizing next CRAN release, and since this anyway breaks almost everything I think... I wonder if we can make this simpler? Without all the deprecation warnings etc.? I guess this would simplify this task a lot

PuzzledFace commented 1 year ago

Thanks @PuzzledFace , in the light of prioritizing next CRAN release, and since this anyway breaks almost everything I think... I wonder if we can make this simpler? Without all the deprecation warnings etc.? I guess this would simplify this task a lot

It could be made simpler by just renaming the existing slots. This would greatly reduce the amount of work required. However, this would be a breaking change and I don't think it's a good idea. The who point of the slot "duplication" and deprecation warnings is to make sure that it is not a breaking change: users of the existing slots can use their existing code, but get a warning saying they should update at their convenience.

I firmly believe we should stick with the approach I outlined above.

danielinteractive commented 1 year ago

Thanks @PuzzledFace, generally I agree - however in this case the whole package breaks anyway due to the many changes we did. Therefore I would say in this special case we don't need the deprecation?