nm_join with .superset = TRUE is not sorted properly

barrettk commented 2 years ago

Previously, nm_join merged datasets via the join function specified below:

  if (.superset) {
    join_fun <- right_join
  } else {
    join_fun <- left_join
  }

When .superset = TRUE however, this would cause the order of the .join_col to change during that call (.d <- join_fun(tab, .d, by = .join_col)) if the same values were not present in both datasets.

i.e.

> dplyr::n_distinct(original_data$NUM)
[1] 799
> dplyr::n_distinct(simulated_data$NUM)
[1] 779

The proposed changes will shift .superset functionality to solely depend on left_join, which will ensure that the order does not change when merging:

  if (.superset) {
    join_fun <- function(x, y, ...) left_join(y, x, ...)
  } else {
    join_fun <- left_join
  }

closes https://github.com/metrumresearchgroup/bbr/issues/490

barrettk commented 2 years ago

Its passing in drone, but not locally for me. Specifically model_summary() is having issues in test-print:

Here is the specific error:

> tmp_mod <- file.path(MODEL_DIR, 1) %>%
+     read_model()

> tmp_mod %>% model_summary()
Dataset: ../../../../extdata/acop.csv                                                                                                                                                                  

Records: 779     Observations: 741   Subjects: 39

Objective Function Value (final est. method): 2583.311

Estimation Method(s):

– First Order Conditional Estimation with Interaction 

**Heuristic Problem(s) Detected:**

– hessian_reset 

 Error in split_l[[.i]] : subscript out of bounds

Test Failures

> devtools::test(filter = "print")
ℹ Loading bbr
ℹ Testing bbr
✔ | F W S  OK | Context
✖ | 7      23 | testing print methods for bbi objects [3.6s]                                                                                                        
────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
Error (test-print.R:104:5): print.bbi_nonmem_summary works basic FOCE model [BBR-PRNT-003]
Error in `split_l[[.i]]`: subscript out of bounds
Backtrace:
 1. utils::capture.output(print(.s))
      at test-print.R:104:4
 4. bbr::print.bbi_nonmem_summary(.s)
 5. purrr::map_chr(param_str, highlight_cell, .i = 5, .threshold = 30)
      at bbr/R/print.R:241:4
 6. bbr .f(.x[[i]], ...)
      at purrr/R/map.R:197:2

Error (test-print.R:113:5): print.bbi_nonmem_summary works mixture model [BBR-PRNT-003]
Error in `split_l[[.i]]`: subscript out of bounds
Backtrace:
  1. testthat::expect_warning(...)
       at test-print.R:113:4
  9. bbr::print.bbi_nonmem_summary(.s)
 10. purrr::map_chr(param_str, highlight_cell, .i = 5, .threshold = 30)
       at bbr/R/print.R:241:4
 11. bbr .f(.x[[i]], ...)
       at purrr/R/map.R:197:2

Error (test-print.R:133:5): print.bbi_nonmem_summary works SAEM-IMP model [BBR-PRNT-003]
Error in `split_l[[.i]]`: subscript out of bounds
Backtrace:
 1. utils::capture.output(print(.s))
      at test-print.R:133:4
 4. bbr::print.bbi_nonmem_summary(.s)
 5. purrr::map_chr(param_str, highlight_cell, .i = 5, .threshold = 30)
      at bbr/R/print.R:241:4
 6. bbr .f(.x[[i]], ...)
      at purrr/R/map.R:197:2

Error (test-print.R:143:5): print.bbi_nonmem_summary works IOV model [BBR-PRNT-003]
Error in `split_l[[.i]]`: subscript out of bounds
Backtrace:
 1. utils::capture.output(print(.s))
      at test-print.R:143:4
 4. bbr::print.bbi_nonmem_summary(.s)
 5. purrr::map_chr(param_str, highlight_cell, .i = 5, .threshold = 30)
      at bbr/R/print.R:241:4
 6. bbr .f(.x[[i]], ...)
      at purrr/R/map.R:197:2

Error (test-print.R:153:5): print.bbi_nonmem_summary .fixed=TRUE [BBR-PRNT-003]
Error in `split_l[[.i]]`: subscript out of bounds
Backtrace:
 1. utils::capture.output(print(.s, .fixed = TRUE))
      at test-print.R:153:4
 4. bbr::print.bbi_nonmem_summary(.s, .fixed = TRUE)
 5. purrr::map_chr(param_str, highlight_cell, .i = 5, .threshold = 30)
      at bbr/R/print.R:241:4
 6. bbr .f(.x[[i]], ...)
      at purrr/R/map.R:197:2

Error (test-print.R:171:5): print.bbi_nonmem_summary .nrow argument [BBR-PRNT-003]
Error in `split_l[[.i]]`: subscript out of bounds
Backtrace:
 1. utils::capture.output(print(.s, .nrow = 15, .fixed = TRUE))
      at test-print.R:171:4
 4. bbr::print.bbi_nonmem_summary(.s, .nrow = 15, .fixed = TRUE)
 5. purrr::map_chr(param_str, highlight_cell, .i = 5, .threshold = 30)
      at bbr/R/print.R:241:4
 6. bbr .f(.x[[i]], ...)
      at purrr/R/map.R:197:2

Error (test-print.R:183:5): print.bbi_nonmem_summary .off_diag=TRUE [BBR-PRNT-003]
Error in `split_l[[.i]]`: subscript out of bounds
Backtrace:
 1. utils::capture.output(print(.s, .off_diag = TRUE))
      at test-print.R:183:4
 4. bbr::print.bbi_nonmem_summary(.s, .off_diag = TRUE)
 5. purrr::map_chr(param_str, highlight_cell, .i = 5, .threshold = 30)
      at bbr/R/print.R:241:4
 6. bbr .f(.x[[i]], ...)
      at purrr/R/map.R:197:2
────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────

══ Results ═════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════
Duration: 3.8 s

[ FAIL 7 | WARN 0 | SKIP 0 | PASS 23 ]

barrettk commented 2 years ago

I've narrowed down the issue to the highlighted section. odd.

> class(tmp) <- c(as.character(glue("bbi_{.model_type}_summary")), BBI_PARENT_CLASS, class(tmp))
> tmp
Dataset: ../../../../extdata/acop.csv

Records: 779     Observations: 741   Subjects: 39

Objective Function Value (final est. method): 2583.311

Estimation Method(s):

– First Order Conditional Estimation with Interaction 

**Heuristic Problem(s) Detected:**

– hessian_reset 

 Error in split_l[[.i]] : subscript out of bounds

I think the issue is likely in print.bbi_nonmem_summary, though im still perplexed as to why it would pass in drone (although I didnt touch those functions in this PR).

EDIT:

The issue happens on line 241 of print.R:

    # add color for shrinkage
> param_str <- map_chr(param_str, highlight_cell, .i = 5, .threshold = 30)
Error in split_l[[.i]] : subscript out of bounds

I think its because there are only 4 columns?

Though IMO that wouldn't make sense given that we intentionally only select 4 columns on line 231:

  param_df <- param_df %>%
    select(.data$parameter_names, .data$estimate, .data$stderr, .data$shrinkage) %>%
    mutate_if(is.numeric, sig, .digits = .digits)

Dont have much experience with creating new classes. Ultimately I feel like the issue lies in some package/function I have loaded thats superseding the imports, but have tried restarting things in multiple ways. Any thoughts @kyleam ?

barrettk commented 2 years ago

Im not sure what's going on, but have come to the conclusion that nothing is actually wrong with any of the functions (figured, but wanted to do my due diligence).

Running the tests in the script via the button on the top works.

Its only while running devtools::test() or devtools::test(filter = "print") that I get issues.

I think there's something funky going on while running the tests, but that's likely out of scope for this PR

kyleam commented 2 years ago

Its only while running devtools::test() or devtools::test(filter = "print") that I get issues.

I haven't looked closely at the code you're pointing to, but just a note that all tests pass on my end (with this PR's 7a10594 checked out).

I think there's something funky going on while running the tests, but that's likely out of scope for this PR

Agreed. Let's keep an eye out for it, but it seems very unlikely to be related to this PR.

barrettk commented 2 years ago

@kyleam sounds good. Let me know when the PR is ready to be approved

kylebaron commented 2 years ago

@barrettk This is the cause of the problem:

if (.superset) {
    join_fun <- right_join
  } else {
    join_fun <- left_join
  }

We already covered this ground with redataset().

You always need to left join; when .superset is TRUE, put the data on the left; otherwise put the table output on the left.

I think it should look like this:

https://ghe.metrumrg.com/software-incubator/field-of-dreams/blob/master/redataset/redataset.R#L54-L58

kylebaron commented 2 years ago

@kyleam - agree on https://github.com/metrumresearchgroup/bbr/pull/508#pullrequestreview-1023946733; I think forcing in an arrange step could only confuse things and IMO would rather just take the data as-is.

barrettk commented 2 years ago

@kyleam I saw that, but didnt think we needed a test (at that time) and figured Kyle had alluded to enough of the issue (ill write up a summary for the description of the PR though). If we got rid of the arrange call I can see a reason to add a test (and I see why it makes sense to remove it now), but id like to hop on a short call to make sure we're on the same page. Will ping you later today or tomorrow

metrumresearchgroup / bbr

nm_join with .superset = TRUE is not sorted properly #508

Test Failures