metrumresearchgroup / bbr

R interface for model and project management
https://metrumresearchgroup.github.io/bbr/

Check that .join_col isn't getting truncated by NONMEM #531

Closed kylebaron closed 2 years ago

kylebaron commented 2 years ago

Summary

By default, NONMEM formats $TABLE output to 5 significant digits, like this: 2.7602E+00. If a data set has more than 99,999 rows, the row number column (NUM) will get truncated. The only workaround is to ask $TABLE for a wider format.

In the case that we have 5 output digits and those long row numbers get truncated, the join won't work. Can we assess this and put in some logic to head this off?
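The collision is easy to reproduce outside NONMEM. Here is a minimal sketch in R, assuming the default 5-digit format behaves like C-style `%.4E` formatting; the row numbers are made up for illustration.

```r
# Hypothetical row numbers on either side of the 99,999 boundary
num <- c(99999, 100001, 100002)

# Format with 5 significant digits (assumed to mimic the default $TABLE output)
written <- sprintf("%.4E", num)
print(written)

# Reading the text back gives the values a join would actually see
read_back <- as.numeric(written)
print(read_back)

# A non-zero result means at least two rows now share a row number
print(anyDuplicated(read_back))
```

In this sketch the last two row numbers are both written as 1.0000E+05, so they can no longer be told apart after the table is read back in.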

kylebaron commented 2 years ago

This is what can happen when you take the default NONMEM $TABLE formatting but have row numbers in the 100,000s:

> data <- nm_join(here("model/pk/789"))
Reading data file: analysis99.csv                              0s
  rows: 4360
  cols: 34

Reading 789.tab
  rows: 4292
  cols: 16

Reading 789par.tab
  rows: 4292
  cols: 9

tab adds 15 new cols                                                                               

par.tab adds 0 new cols
  dropping 8 duplicate cols: CL, V2, Q, V3, KA, ETA1, ETA2, ETA3

final join stats:
  rows: 42738
  cols: 49

This is a well-known gotcha ... for example when you have long subject IDs.

I think we can say that the number of rows in the joined output should equal the number of rows in the non-FIRSTONLY tables for a non-superset join, or the number of rows in the data for a superset join. This might be a reasonable sanity check.
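A rough sketch of that row-count check, in base R. The helper `check_join_rows()` and its arguments are hypothetical, not part of bbr; the counts passed in the usage lines are the ones from the console output earlier in this thread.

```r
# Hypothetical helper (not a bbr function): compare the joined row count
# with the count expected for the join type.
check_join_rows <- function(n_joined, n_data, n_tab, superset = FALSE) {
  # Superset joins should keep every data row; otherwise expect the table rows
  expected <- if (superset) n_data else n_tab
  if (n_joined != expected) {
    stop(
      sprintf(
        "join returned %d rows but %d were expected",
        as.integer(n_joined), as.integer(expected)
      ),
      call. = FALSE
    )
  }
  invisible(TRUE)
}

# Counts from the run above: 4360 data rows, 4292 table rows, 42738 joined rows
msg <- tryCatch(
  check_join_rows(n_joined = 42738, n_data = 4360, n_tab = 4292),
  error = function(e) conditionMessage(e)
)
print(msg)
```

A check like this only flags the mismatch. It says nothing about the cause, which is why the duplicate check on the join column may give a more useful message.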

kylebaron commented 2 years ago

Another solution would be to check for duplicate row numbers in the table files.

Also, we could allow multiple join columns (originally in the spec for redataset), but I'd be less inclined to do this and would rather just get a wider format from NONMEM.

seth127 commented 2 years ago

Interesting. So, on the nm_join() side, is the idea that we would error/warn if we see something like duplicate row numbers? It seems like (if we put in a check like that) at the point that it catches something, we can't really repair it or anything, right? We just have to tell the user to go back and fix it on the NONMEM side.

kylebaron commented 2 years ago

Right ... all we can do is raise a flag saying this didn't work.

seth127 commented 2 years ago

Ok, and I think we actually want to error, right? Because there shouldn't be any (correct) situation where the join column has duplicate values.

So that would be the requirement we add: "nm_join() informatively errors if .join_col has duplicate values in any of the input tables."

If that sounds right, it should be pretty easy to implement and get out quickly.
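One way the proposed requirement could look, as a sketch only. `check_join_col()` is a hypothetical helper written for this illustration and is not bbr's actual implementation; the table contents and file label are made up.

```r
# Hypothetical helper (not bbr's implementation): error informatively when
# the join column of an input table contains duplicated values.
check_join_col <- function(tab, join_col = "NUM", file = "<table>") {
  if (!join_col %in% names(tab)) {
    stop(sprintf("%s: join column '%s' not found", file, join_col), call. = FALSE)
  }
  vals <- tab[[join_col]]
  dups <- unique(vals[duplicated(vals)])
  if (length(dups) > 0) {
    stop(
      sprintf(
        "%s: join column '%s' has %d duplicated value(s), e.g. %s. Consider a wider FORMAT in $TABLE.",
        file, join_col, length(dups), paste(head(dups, 3), collapse = ", ")
      ),
      call. = FALSE
    )
  }
  invisible(tab)
}

# Made-up table where two row numbers collapsed to the same value
tab <- data.frame(NUM = c(99999, 1e5, 1e5), IPRED = c(1.2, 3.4, 5.6))
msg <- tryCatch(
  check_join_col(tab, "NUM", file = "example.tab"),
  error = function(e) conditionMessage(e)
)
print(msg)
```

As noted above, the check cannot repair anything. It can only point the user back to the NONMEM side.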

andersone1 commented 2 months ago

FORMAT=s1PE12.5

$TABLE NUM ID TIME IPRED NPDE CWRES NOPRINT ONEHEADER RANMETHOD=P FORMAT=s1PE12.5 FILE=diagnostic.tab
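To see what the wider format buys, here is a small R sketch. It assumes that `s1PE12.5` behaves like C-style `%.5E` formatting (one digit before the decimal point and five after), and the row numbers are made up for illustration.

```r
# Hypothetical row numbers around the 100,000 and 1,000,000 boundaries
num <- c(100001, 100002, 999999, 1000001, 1000002)

# Six significant digits (assumed to mimic FORMAT=s1PE12.5)
written <- sprintf("%.5E", num)
print(written)

# Position of the first duplicated row number after reading back (0 if none)
print(anyDuplicated(as.numeric(written)))
```

Under that assumption, row numbers up to 999,999 survive the round trip, while larger ones collide again and would need an even wider format.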