Closed barrettk closed 4 months ago
I like this idea, and I've been mulling over something similar since filing https://github.com/metrumresearchgroup/bbr/issues/595 last year. Basically, it seems like we should have an easy way to get the path to the data (and possibly load the data) for a model that hasn't been run yet. nmrec provides an obvious solution for this.
In terms of checking it against what's in bbi_config.json (for an already run model), that probably requires a bit more discussion. It might be "scope creep" for a function that's just supposed to return a file path, but if there's a compelling use case or reason, I'm open to it.
Thanks for filing this.
Capturing some offline discussion of the changes we want to make to get_data_path()
get_data_path()
should return absolute path to the input data file (as it currently does)
get_path_from_object()
helpers, though this could be a separate PR.check_exists
argument, similar to other get_path_from_object()
helpers (should default to TRUE
).ctl
vs. .mod
discrepancy?get_data_path
to describe .ctl
vs .mod
discrepancybbi_config.json
(addresses https://github.com/metrumresearchgroup/bbr/issues/595)
bbi_config.json
bbi_config.json
does exist, check the path and warn if they don't matchcheck_up_to_date()
which needs to pull the data path and MD5 from the bbi_config.json
to compare..._[ Some remarks from a quick read of the comment above ]
Add
.check_exists
argument [...] (should default toTRUE
)
Based on a quick test, that's a change in behavior from the current code, so FALSE
seems like the safer/non-breaking default. Perhaps still not a strong enough reason to go that way in this case, but I think it's worth explicit mention.
Add documentation to
?get_data_path
to describe.ctl
vs.mod
discrepancy
Nice, glad to see this documented.
- Pull data path from model file instead of
bbi_config.json
I don't see it mentioned in any of the points below, but this handling should account for the fact that the file name in the control stream may be quoted.
Also, I'm not sure, but I think this is proposing to always look at the data path specified in the control stream, even if bbi_config.json
is there for us to grab from. In my view it'd make sense to stick with the existing tried-and-true approach as the preferred method. I don't see the reason for the redundant processing and checks if the existing method returns something.
We may want to make this an optional argument... consider and discuss further...
My initial reaction is that guarding this new behavior behind an option (if not putting it in its own function) is the safe way to go.
As seen in the helper docs,
get_data_path
:I propose an amendment to this function, though I am not proposing we replace the existing functionality with the one below (more on this in a second).
We can get the data path from the
ctl
file directly vianmrec
:I propose we attempt to fetch both data paths. i.e. we extract the data path from the
ctl
file, and if the model has been run, we also parse it from thebbr
model object. We could compare the two, and warn the user if there is a mismatch (i.e. if the model file has been moved since it was run). We could have an argument for which one to return (with a default), to ensure a single string is usually returned.Im not fully on board with this suggestion, but I think this could have a use case. The main thing I would change, is that we provide a method for not erroring out if the model has not been executed. If the model hasn't been executed, we could return the path defined in the control stream file. cc @seth127