metrumresearchgroup / bbr

R interface for model and project management
https://metrumresearchgroup.github.io/bbr/
Other
22 stars 2 forks source link

Feature request: make `get_data_path` also read in the path from the ctl #651

Closed barrettk closed 4 months ago

barrettk commented 5 months ago

As seen in the helper docs, get_data_path:

only works on models that have been successfully run because the path to the data is constructed during model submission.

I propose an amendment to this function, though I am not proposing we replace the existing functionality with the one below (more on this in a second).

We can get the data path from the ctl file directly via nmrec:

get_data_path_from_ctl <- function(.mod){
  mod_path <- get_model_path(.mod)
  ctl <- nmrec::read_ctl(mod_path)
  data_rec <- nmrec::select_records(ctl, "data")[[1]]
  data_path <- nmrec::get_record_option(data_rec, "filename")$value

  data_path_norm <- fs::path_norm(file.path(mod_path, data_path))

  if(!fs::file_exists(data_path_norm)){
    stop(glue("Could not find data at {data_path_norm}"))
  }

  return(data_path_norm)
}

I propose we attempt to fetch both data paths. i.e. we extract the data path from the ctl file, and if the model has been run, we also parse it from the bbr model object. We could compare the two, and warn the user if there is a mismatch (i.e. if the model file has been moved since it was run). We could have an argument for which one to return (with a default), to ensure a single string is usually returned.

Im not fully on board with this suggestion, but I think this could have a use case. The main thing I would change, is that we provide a method for not erroring out if the model has not been executed. If the model hasn't been executed, we could return the path defined in the control stream file. cc @seth127

seth127 commented 5 months ago

I like this idea, and I've been mulling over something similar since filing https://github.com/metrumresearchgroup/bbr/issues/595 last year. Basically, it seems like we should have an easy way to get the path to the data (and possibly load the data) for a model that hasn't been run yet. nmrec provides an obvious solution for this.

In terms of checking it against what's in bbi_config.json (for an already run model), that probably requires a bit more discussion. It might be "scope creep" for a function that's just supposed to return a file path, but if there's a compelling use case or reason, I'm open to it.

Thanks for filing this.

seth127 commented 4 months ago

Specification

Capturing some offline discussion of the changes we want to make to get_data_path()

kyleam commented 4 months ago

[ Some remarks from a quick read of the comment above ]

Add .check_exists argument [...] (should default to TRUE)

Based on a quick test, that's a change in behavior from the current code, so FALSE seems like the safer/non-breaking default. Perhaps still not a strong enough reason to go that way in this case, but I think it's worth explicit mention.

Add documentation to ?get_data_path to describe .ctl vs .mod discrepancy

Nice, glad to see this documented.

  • Pull data path from model file instead of bbi_config.json

I don't see it mentioned in any of the points below, but this handling should account for the fact that the file name in the control stream may be quoted.

Also, I'm not sure, but I think this is proposing to always look at the data path specified in the control stream, even if bbi_config.json is there for us to grab from. In my view it'd make sense to stick with the existing tried-and-true approach as the preferred method. I don't see the reason for the redundant processing and checks if the existing method returns something.

We may want to make this an optional argument... consider and discuss further...

My initial reaction is that guarding this new behavior behind an option (if not putting it in its own function) is the safe way to go.