metrumresearchgroup / bbr

R interface for model and project management
https://metrumresearchgroup.github.io/bbr/
Other
23 stars 2 forks source link

update_run_num and nm_join problematic case studies #558

Closed janelleh closed 1 year ago

janelleh commented 1 year ago

Hi all, I just wanted to give some feedback on some snags I ran into when using these functions on a project and suggest some possible solutions you could implement.

1) update_run_number : When I used this on a Bayes run, the .chn and .msf got passed into a toupper somewhere in the function, and my control stream was replaced with .CHN and .MSF. this is problematic because I had other functions set up to read in .chn, and the uppercase threw that off. I had to manually go into every control stream and change .CHN and .MSF to .chn and .msf. Note: The toupper does NOT effect the .tab, par.tab, etc., so there is something inconsistent about just the .chn and .msf being changed, but not the others.

Here are some pieces from a sort of working example:

> bbi_version() 
[1] "3.2.1"

modTEST <- new_model(file.path(MODEL_DIR, 999999)) 

# here is what is in model 999999:

$EST METHOD=CHAIN FILE=999999.chn NSAMPLE=4 ISAMPLE=0 SEED=10 CTYPE=0 IACCEPT=0.2 DF=10 DFS=0
;$EST METHOD=NUTS SEED=1 NBURN=1000 NITER=2000 PRINT=1 MSFO=./999999.msf RANMETHOD=P PARAFPRINT=10000 BAYES_PHI_STORE=1

;$TABLE ID NUM AUCss BASE KIN KOUT IMAX AUC50 PLAC ETAS(1:LAST) PRED IPRED EPRED EWRES NPDE NOPRINT ONEHEADER FILE=./999999.tab RANMETHOD=P 

copy_model_from(modTEST, "000000", .overwrite = T) %>% update_run_number()

# And here is what it gets changed to via update_run_number:

$EST METHOD=CHAIN FILE=000000.CHN NSAMPLE=4 ISAMPLE=0 SEED=10 CTYPE=0 IACCEPT=0.2 DF=10 DFS=0
;$EST METHOD=NUTS SEED=1 NBURN=1000 NITER=2000 PRINT=1 MSFO=./000000.MSF RANMETHOD=P PARAFPRINT=10000 BAYES_PHI_STORE=1

;$TABLE ID NUM AUCss BASE KIN KOUT IMAX AUC50 PLAC ETAS(1:LAST) PRED IPRED EPRED EWRES NPDE NOPRINT ONEHEADER FILE=./000000.tab RANMETHOD=P 

I think the most important thing here is to be consistent -- the chn, msf, tab and par.tab etc. should be treated the same IMO, otherwise makes for problems post-processing when you have to guess which file is uppercase and which is lowercase.

2) nm_join : I also had some issues here that I assume were caused by toupper hiding somehwere in the function. I ran nm_join, grabbed that output, and fed it into my mrgsolve model. The mrgsolve model did not work correctly because one of the parameters got changed to uppercase. I had "AUCss" tabled out from the Nonmem run, then I had "AUCss" as an input parameter to my mrgsolve model. However because nm_join turned it into "AUCSS", the parameter wasn't fed into my mrgsolve model causing problems. Maybe I should have used all caps to follow best coding practices and probably will from here on out, but why is the toupper implemented in nm_join? Any good reason to keep it there? Was it so that the input data set and the Nonmem tabled out data would all be in the same case and therefore be able to join? If so that totally makes sense to me, but it was a little dangerous in this case here. Food for thought!

Thanks :)

callistosp commented 1 year ago

For sub-issue 1, you should use the bbr function update_model_id instead of update_run_number, that was a helper function not included with previous versions of the package

janelleh commented 1 year ago

Thanks @callistosp!! You can ignore the first one!

For the second, I think it does kind of make sense to make everything uppsercase as a safety mechanism to ensure that the $DATA and the .tab/par.tab are comparable and bind-able. But I can't figure out why its happening to AUCss b/c its not defined as a .join_col. https://github.com/metrumresearchgroup/bbr/blob/7943fe38e113d3308ea684e76f7fac1bcfbc47e0/R/nm-join.R#L113

> datatmp <- data.table::fread(here("data/derived/bii0807f-pd.csv"))
> names(datatmp)
 [1] "C"        "ID"       "TIME"     "CMT"      "MDV"      "DV"       "AUCss"    "DOSE"     "BLQ"      "HBAB"     "VISNUM"  
[12] "REG"      "BLAGE"    "WT"       "BLWT"     "HGT"      "BLHGT"    "BMI"      "CNTRY"    "BLBMI"    "CONMED"   "BSA"     
[23] "SEX"      "RACE"     "ETHN"     "SMOK"     "BLAST"    "BLALT"    "BLBILI"   "SCR"      "CRUMOL"   "BLSCR"    "BLCRUMOL"
[34] "DEMO"     "PROJ"     "DAY"      "SS"       "POP"      "EGFR"     "BLEGFR"   "CT58"     "CT59"     "CT60"     "INS"     
[45] "MET"      "BLMET"    "UNSCHDPK" "NOPK"     "NODOSE"   "SEQ"      "HITAD"    "FRSTHR"   "METONLY"  "SID1"     "NUM"     
[56] "COUNTRY"  "DATETIME" "DATE"     "TIMO"     "TIDE"     "DAT"      "PTNO"     "USUBJID"  "VISIT"    "EXTRT"    "REGC"    
> chk_dat <- nm_join(here("model/nonmem/pd/1036/1036_1"))
Reading data file: bii0807f-pd.csv                                                                                                                                     
  rows: 410
  cols: 66

Reading 1036.tab
  rows: 376
  cols: 20

X1036.tab adds 17 new cols                                                                                                                                             
  dropping 2 duplicate cols: ID, AUCSS

final join stats:
  rows: 376
  cols: 83
> names(chk_dat)
 [1] "C"        "ID"       "TIME"     "CMT"      "MDV"      "DV.DATA"  "AUCSS"    "DOSE"     "BLQ"      "HBAB"     "VISNUM"  
[12] "REG"      "BLAGE"    "WT"       "BLWT"     "HGT"      "BLHGT"    "BMI"      "CNTRY"    "BLBMI"    "CONMED"   "BSA"     
[23] "SEX"      "RACE"     "ETHN"     "SMOK"     "BLAST"    "BLALT"    "BLBILI"   "SCR"      "CRUMOL"   "BLSCR"    "BLCRUMOL"
[34] "DEMO"     "PROJ"     "DAY"      "SS"       "POP"      "EGFR"     "BLEGFR"   "CT58"     "CT59"     "CT60"     "INS"     
[45] "MET"      "BLMET"    "UNSCHDPK" "NOPK"     "NODOSE"   "SEQ"      "HITAD"    "FRSTHR"   "METONLY"  "SID1"     "NUM"     
[56] "COUNTRY"  "DATETIME" "DATE"     "TIMO"     "TIDE"     "DAT"      "PTNO"     "USUBJID"  "VISIT"    "EXTRT"    "REGC"    
[67] "BASE"     "KIN"      "KOUT"     "IMAX"     "AUC50"    "PLAC"     "ETA1"     "ETA2"     "ETA3"     "IPRED"    "EPRED"   
[78] "EWRES"    "NPDE"     "DV"       "PRED"     "RES"      "WRES"  

AUCss when reading in the csv, AUCSS when using nm_join

And this is the table statement:

$TABLE ID NUM AUCss BASE KIN KOUT IMAX AUC50 PLAC ETAS(1:LAST) PRED IPRED EPRED EWRES NPDE NOPRINT ONEHEADER FILE=./1036.tab RANMETHOD=P

When I read the table in as-is, it is still lowercase:

> tabtmp <- data.table::fread(here("model/nonmem/pd/1036/1036_1/1036.tab"))
> names(tabtmp)
 [1] "ID"    "NUM"   "AUCss" "BASE"  "KIN"   "KOUT"  "IMAX"  "AUC50" "PLAC"  "ETA1"  "ETA2"  "ETA3"  "IPRED" "EPRED" "EWRES" "NPDE" 
[17] "DV"    "PRED"  "RES"   "WRES"

So it must be happening via nm_join.

seth127 commented 1 year ago

Thanks for reporting @janelleh

So, first off, is update_model_id() working the way you expect it to? I think it should be, from looking at the implementation code here. Also, we have a test that looks like it's testing this exact case.

For the nm_join() issue, yes it does look like we're upper-casing all the columns as they come in. nm_join() uses nm_tables() under-the-hood, which in turn uses nm_file() and the implementation for that appears to upper-case everything here.

In terms of why we do that... I'm not sure. I can't remember the justification for it, and I could certainly entertain changing that behavior in a future release. @kylebaron do you have any thoughts on this?

kylebaron commented 1 year ago

I didn't realize these functions were changing the names of the data sets. Definitely agree the case should be left alone (everywhere) unless there was some good motivation for changing case.

janelleh commented 1 year ago

@seth127 turns out update_model_id() is not a bbr function, it was something I was sourcing from an old example project functions script. The function in bbr is update_run_id and does work as expected. You can ignore Issue # 1!

Issue # 2 - thanks for digging into nm_join. As I said I def understand if the idea was to make everything all caps so that the joining of the $DATA and .tab files will always have column names in the same format, but as a user I would agree with @kylebaron and want it to be left as is. If the user makes a mistake with mismatched cases that is on them, but the function changing it up is surprising and could be problematic like in my use case here.

seth127 commented 1 year ago

Ok, that makes sense. Like I said, I don't remember any specific reason for upper-casing the names. If there is consensus here then I'm happy to put this in the hopper for the next release.