Closed LucieContamin closed 7 months ago
Thanks for the report @LucieContamin ! This is a bit of a tricky issue related to exporting std_colnames
in c9446aa1bc37f0e358efaf800faefcaaa9ef77e8 & 717bb2673d0c34f309aec41f0c1927f80af04661.
I've added a quick fix that I'm not satisfied with for now in 73979b7 but will add some additional context and possible more longer term solution proposals in this issue shortly.
So this issue is related to something I find quite unsatisfying at the moment about moving package data from internal to exported.
Initially std_colnames
was an internal object stored in sysdata.rds
which was always available to internal functions. Once it was exported however in response to #88, to be available, even internally, the package needs to be loaded or the object itself using data(std_colnames)
. Both these options are undesirable.
The options available are all somewhat inelegant and no consensus on what the proper way to handle this seems to exist:
std_colnames
in sysdata.rds
(current quick fix). Easy to implement but the duplication is extremely unsatisfactory and brittle.hubUtils::std_colnames
in hubUtils
code to access the object. Goes against all conventions of using objects from the same namespace in other objects. I think it might even trigger a note in R CMD CHECK
.data(std_colnames)
within hubUtils functions or onLoad... Nope!An alternative and more elegant option in my view, which could also form the basis of a localisation framework is discussed in #92 and involves using an exported function to load the internal data object. (See #92 for details).
If we do proceed with this, it is probably best if we have resolved first https://github.com/Infectious-Disease-Modeling-Hubs/schemas/issues/60
@elray1 & @nickreich any thoughts on the above, especially the suggestion to move forward with #92?
I don't fully understand why it is not desirable to load the std_colnames
object using a command like:
data(std_colnames, package="hubUtils")
Why is it worse to load this data directly via data()
than by another call to hubUtils::load_colnames()
or something like that?
The proposal in #92 sounds good to me, and since it seems to be what we might want eventually for language localization anyways, I don't see a reason not to go for it.
That said, I may not be understanding the problems with the current solution, but it doesn't seem so bad to me. Is the main reason for "brittleness" the possibility that the duplicate data objects might get out of sync? It seems like this could be easily avoided by creating/maintaining them in the same place?
First, I should probably clarify the proposal a bit.
std_colnames_
will be included as internal data so will always be available internally to functions, even when the package is not loaded. We then also have an exported function, std_colnames()
which exposes the internal object to users. This function can have arguments, for additional functionality described below.
The reason to avoid using data()
within a package is because the solution is quite hacky. Overall exported data are designed to be available as auxiliary data (mainly for demos) on demand, when the user explicitly asks for them to be loaded through data()
. Using data()
to load exported data into the same package's internal environment is quite a convoluted way of achieving what internal data are designed for.
onLoad
command to do call data()
. We also need to specify the environment it will be loaded in as the default behaviour of data()
is to load into the global environment which we do not want.There's some useful discussion of these points in this post: https://stackoverflow.com/questions/24354850/load-data-object-when-package-is-loaded
In summary, it's over engineering to achieve what internal data are designed to.
Using a function to expose an internal object to users is in my view more elegant because there's no hackiness involved and we can also introduce arguments to enable easy subsetting of the object (which is shown in #92 and will likely be a very common use case). Finally, it already forms the basis of a proposal for localisation functionality so being a solution to multiple problems also makes it more elegant in my view.
Sorry just seen your comment @elray1 !
Yes the brittleness refers to the fact that they may become out of synch. The solution you propose would definitely help and was what I was considering if we stick to this quick fix.
The other thing I don't like is that it's adding two objects to the total size of the package. Sure they are tiny but I don't love the duplication or precedent it sets.
Thanks for the more detailed explanation. I am on board with spending the time to do this the "right way" and solving #92 in the process.
Great! I won't prioritise this for this/next week as want to keep going with validations but will get it done in the next couple of weeks.
Should give us some time to make a decision on this too: Infectious-Disease-Modeling-Hubs/schemas#60
The function
hubUtils::as_model_out_tbl()
seems to require to have the completehubUtils
package load in the environment to work.For example, with
df
being a tibble of model output data:Should we change that behavior to have the function working without having to load the complete package?