[bug fix] avoid duplicate creation of rrr/sss variables for models with multiple components of the same period

RWParsons commented 1 week ago

Motivated by @MilaSMayor https://github.com/ropensci/GLMMcosinor/issues/27

When there is a model with multiple components with the same period (i.e in the main formula), there are multiple rrr/sss variables created and when the model is fit, it is rank deficient.

reprex:


test_data <- structure(list(row = c(
  4790, 18351, 8537, 20294, 9658, 4846,
  20603, 5223, 2067, 9061, 20751, 13979, 5870, 12182, 967, 2546,
  3189, 3220, 13284, 1002, 13459, 12946, 2946, 3601, 13539, 10268,
  13002, 17665, 7646, 5958, 10857, 20276, 16712, 1694, 216, 12010,
  10686, 12147, 15580, 20189, 9939, 9503, 101, 10948, 5775, 7025,
  8109, 15669, 9830, 4564, 425, 20181, 16576, 15469, 2783, 10341,
  16893, 20882, 18252, 20306, 203, 11359, 1225, 16813, 9084, 17500,
  3147, 298, 5264, 5396, 565, 5883, 15172, 533, 11996, 12597, 11396,
  10913, 4000, 5098, 455, 1614, 13075, 14176, 10470, 10993, 16228,
  430, 15060, 17038, 9419, 16489, 5773, 8590, 640, 19471, 18603,
  12875, 15504, 57
), new_ID = c(
  12, 36, 21, 41, 22, 12, 41, 13,
  5, 22, 41, 30, 16, 26, 3, 6, 7, 7, 28, 3, 28, 28, 7, 9, 28, 23,
  28, 35, 19, 16, 24, 41, 34, 5, 2, 26, 24, 26, 32, 40, 23, 22,
  1, 25, 16, 18, 20, 32, 23, 12, 2, 40, 33, 32, 7, 24, 34, 41,
  36, 41, 2, 25, 3, 34, 22, 35, 7, 2, 13, 13, 2, 16, 31, 2, 26,
  27, 25, 24, 10, 12, 2, 4, 28, 30, 24, 25, 33, 2, 31, 34, 22,
  33, 16, 21, 2, 38, 36, 27, 32, 1
), P1 = c(
  1, 0, 1, 0, 0, 0, 1,
  0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 1,
  0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0,
  0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0,
  0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0,
  0, 0, 1, 0, 0, 1, 1, 0, 0
), hour = c(
  9, 5, 4, 4, 22, 17, 1, 19,
  4, 20, 5, 15, 16, 16, 17, 18, 0, 10, 9, 4, 16, 6, 21, 3, 0, 14,
  14, 1, 10, 9, 2, 10, 6, 13, 10, 12, 15, 5, 17, 19, 21, 11, 20,
  23, 17, 22, 10, 19, 8, 23, 3, 11, 14, 1, 6, 17, 22, 23, 13, 16,
  21, 18, 11, 11, 19, 19, 5, 20, 13, 4, 23, 5, 11, 15, 22, 13,
  8, 11, 4, 14, 9, 16, 16, 20, 3, 9, 0, 8, 19, 23, 19, 23, 15,
  22, 2, 0, 17, 0, 13, 3
), P2 = c(
  0, 0, 1, 0, 0, 0, 0, 0, 0, 0,
  0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0,
  0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0,
  0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 1,
  1, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 1,
  1, 0, 0, 0, 0, 0
), original_data = c(
  81.1505152399503, 25.290871802584,
  105.638906673811, 37.6492569611346, 75.2487861724966, 65.6833160668856,
  9.26783774456766, 37.5826334335925, 132.174068026666, 57.505395243653,
  13.5776684159436, 34.1192965220782, 40.4604603007995, 55.9040534073833,
  49.6843789034094, 47.5811345810795, 105.434826734125, 149.723681235355,
  128.755907667873, 84.8629469676206, 101.093905272917, 79.7384167936468,
  109.690049660256, 47.0894715665344, 134.577092010154, 119.630895698006,
  86.3222531011461, 111.629047174657, 57.0934238631868, 69.6968295542609,
  21.2675152551682, 21.4871946212554, 202.0797786341, 159.959221228979,
  104.101782114698, 85.2241982528729, 55.7058656015178, 105.785198530154,
  135.107579179321, 36.5356093367404, 106.377426214008, 89.5726564511067,
  84.7104602516007, 39.5980040753834, 24.9267309607656, 135.335264866478,
  123.325858027693, 133.09574834453, 45.911723466512, 50.527642877713,
  132.443340274967, 86.2973786263588, 102.03972044404, 82.341153412036,
  119.671509809086, 92.6547343881397, 111.477879383805, 18.8626183172769,
  25.4766095900354, 24.9683664722985, 125.682732431718, 40.7463158296111,
  65.4705467057566, 106.495240867716, 114.097125665045, 66.3119368814677,
  103.421642414608, 133.292151951722, 75.4031906956303, 42.7837715366637,
  169.545418885058, 36.3449779611802, 89.3729830186157, 111.991893048835,
  87.5508149602019, 78.5903719775091, 58.4495859588738, 41.1225123752,
  22.5018080277647, 64.7797948532742, 154.513873104168, 51.1921505575342,
  129.457825425795, 31.1606685505752, 181.918933013, 33.0354243947179,
  47.6738355795763, 110.168010508212, 70.7558731891458, 278.711410318731,
  93.2632025310014, 44.1055460201902, 81.2456787545665, 144.904199653632,
  77.0182663030833, 63.2390713860117, 31.8291538681657, 116.205553374686,
  175.390477503616, 60.5171426584574
)), row.names = c(NA, -100L), class = c("tbl_df", "tbl", "data.frame"))

cglmm(
  original_data ~ P2 * P1 +
    amp_acro(time_col = "hour", n_components = 1, group = c("P2"), period = c(24)) +
    amp_acro(time_col = "hour", n_components = 1, group = "P1", period = 24) +
    (1 | new_ID),
  REML = TRUE,
  data = send |> sample_n(100)
)

cglmm(
  original_data ~ P2 * P1 +
    amp_acro(time_col = "hour", n_components = 2, group = c("P2", "P1"), period = c(24, 24)) +
    (1 | new_ID),
  REML = TRUE,
  data = send |> sample_n(100)
)

Both of these should work but neither do. They work if you change the period of one of the components to be 23 but this shouldn't be necessary.

They can also be fixed by changing the formula (start of data_processor()) to use main_rrr1 and mainsss1 for both interaction terms rather instead of having one of the groups interact with a main[rrr/sss]2.

MilaSMayor commented 1 week ago

Many thanks Rex. But I continue missing something. If I want to run this model: Y = P1*P2 + (1|new_id) Applying cosinor mixed model, I would expect all the terms in the formula as you can see below. Y ~ P1 + P2 + P1*P2 + P1*rrr + P1*sss + P2*rrr + P2*sss + P1*P2*rrr + P1*P2*sss + (1|new_id)

As well, both P1 and P2 are factors. Many thanks again.

Best,

Mila

RWParsons commented 6 days ago

Ahh yeah I see. I've made a model here which includes all groupings by creating a new variable as the combination of P1 and P2.

d <- read.delim("TestSend.txt") |> as_tibble()

cnames <- names(d) |>
  str_split("\\.") |>
  unlist()

send <- d |>
  separate_wider_delim(cols = 1, delim = " ", names = c("row", cnames), too_many = "drop") |>
  mutate(
    across(everything(), as.numeric),
    p1_2 = paste0(P1, P2)
  )

m <- cglmm(
  original_data ~ P2 * P1 +
    amp_acro(time_col = "hour", n_components = 1, group = c("p1_2"), period = c(24)) +
    (1 | new_ID),
  REML = TRUE,
  data = send
)

milas-model2.zip

This should give you all the combinations that you want but the parameters of the model are a bit more annoying to interpret than if they were done as an interaction. You should be able to do this manually using the results for the different groupings in summary (summary(m)).

ropensci / GLMMcosinor

[bug fix] avoid duplicate creation of rrr/sss variables for models with multiple components of the same period #29