Open gogonzo opened 8 months ago
Questions:

1. Do we want to add `library` calls for all packages that are attached at the end of the module, or for all packages that are loaded? I would go with the former.
2. `teal` is most likely to be called before `teal_data` or `teal_data_module` is created, and therefore before there is anything to do within it. I take it `library(teal)` should be added at the top of the `@code` slot?

> Do we want to add `library` calls for all packages that are attached at the end of the module, or for all packages that are loaded? I would go with the former.
I think for all attached. Even though there are `::` calls in the module's code, that does not guarantee the packages are installed for the end user who asked for the reproducible code. We could also extend the list of returned libraries with calls that install packages for you, but I think we are considering renv and `.lock` files?
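For context, the loaded/attached distinction is visible in plain base R: `search()` lists only *attached* packages, while `loadedNamespaces()` also includes packages loaded merely for `::` access. A minimal illustration (base R only; in a fresh session `tools` starts out neither loaded nor attached):

```r
# Attached packages sit on the search path, so their exports are callable
# directly; loaded-but-not-attached packages are reachable only via `::`.
loadNamespace("tools")                       # load without attaching

"tools" %in% loadedNamespaces()              # TRUE: namespace is loaded
"package:tools" %in% search()                # FALSE in a fresh session

library(tools)                               # now attach it
"package:tools" %in% search()                # TRUE: on the search path
```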
Perhaps we should still see what namespaces are loaded and raise a warning/error if they are not installed in the environment where the analysis is being reproduced?
Won't such a warning be presented on app start? If someone creates the code in one environment but then releases it somewhere else, they will see that they cannot run the app, or one of the modules will fail.
> `teal` is most likely to be called before `teal_data` or `teal_data_module` is created, and therefore before there is anything to do within it. I take it `library(teal)` should be added at the top of the `@code` slot?

Is `teal` needed to reproduce the analysis? For a regular analysis, I think `dplyr` would be sufficient. If any module is attached, it will be returned (as in the first point), and since the module depends on `teal`, `teal` will also be loaded the moment you run the code returned for reproducibility.
> > Do we want to add `library` calls for all packages that are attached at the end of the module, or for all packages that are loaded? I would go with the former.
>
> I think for all attached. Even though there are `::` calls in the module's code, that does not guarantee the packages are installed for the end user who asked for the reproducible code. We could also extend the list of returned libraries with calls that install packages for you, but I think we are considering renv and `.lock` files?
I don't think auto-installing is a good idea.
> > Perhaps we should still see what namespaces are loaded and raise a warning/error if they are not installed in the environment where the analysis is being reproduced?
>
> Won't such a warning be presented on app start? If someone creates the code in one environment but then releases it somewhere else, they will see that they cannot run the app, or one of the modules will fail.
The reproducible code will not be run in an app, it will be taken from the report and run wherever.
> > `teal` is most likely to be called before `teal_data` or `teal_data_module` is created, and therefore before there is anything to do within it. I take it `library(teal)` should be added at the top of the `@code` slot?
>
> Is `teal` needed to reproduce the analysis? For a regular analysis, I think `dplyr` would be sufficient. If any module is attached, it will be returned (as in the first point), and since the module depends on `teal`, `teal` will also be loaded the moment you run the code returned for reproducibility.
Good point.
OK, so the decision is either to just leave a warning that some packages are missing, or to allow them to be installed; but in both cases we include code that checks whether the needed packages are installed.
I'm fine with emitting a note on which packages are missing, together with a message that can be copy-pasted to install them. How does that sound?
I think it's reasonable.
On the main topic: I don't see a reason why we shouldn't allow specifying libraries in a `teal_data() |> within()` block, as data processing sometimes requires extra packages. And I don't think we should require the app developer, or anybody else, to understand which libraries are used in `teal` internals. To sum up, I think we should allow `library` calls in the `teal_data() |> within()` block and also return libraries attached during teal app module execution.
We do allow that, in fact we recommend it. We modified ALL examples to show that just this month.
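A minimal sketch of that recommended pattern, assuming the current `{teal.data}` API (`teal_data()`, `within()`, `get_code()`) and using the package's example `rADSL` dataset purely as a placeholder:

```r
library(teal.data)

data <- teal_data() |>
  within({
    library(dplyr)                  # attached inside within(), so the call
                                    # is recorded in the data's code
    ADSL <- rADSL |> filter(SAFFL == "Y")
  })

cat(get_code(data))                 # reproducible code includes library(dplyr)
```

Because the `library(dplyr)` call is part of the evaluated code, it is returned by `get_code()` and does not need to be re-derived from the session state.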
Alrighty, so what's left in here to decide (and potentially implement)?
- we allow a user to specify libraries in `within` on the preprocessing
- teal appends libraries with `get_rcode_libraries`

If these both hold true, there will be duplicates. `get_rcode_libraries` should be modified or abandoned altogether.
I am for putting the responsibility for `library` calls on the app dev.
The possibility of modules modifying the search path should be addressed.
I still believe we should check for loaded namespaces and warn if they are not installed. Does anyone have any thoughts on this?
Currently there is little possibility of duplicate `library(...)` calls due to https://github.com/insightsengineering/teal.data/issues/220, but once that is resolved I'm on the fence about the solution.

On one hand, I'd like to leave this responsibility with the app developer; it's just a matter of moving the `library` call around. However, we would need to change the current inheritance to prevent inheriting `parent.env(.GlobalEnv)`, right?
On the other hand, we could also remove duplicate `library` calls that exist in both `get_datasets_code` and `get_rcode_libraries` (removing from the first, as those will be the subsequent calls). This would require some regex magic, but it would be doable.
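The "regex magic" could be fairly modest. A hypothetical base-R helper (the function name is made up for illustration) that keeps only the first occurrence of each `library()` line in generated code:

```r
# Drop repeated library(...) lines from a vector of code lines, keeping
# the first occurrence of each call; non-library lines are left untouched.
dedupe_library_calls <- function(code_lines) {
  is_lib <- grepl("^\\s*library\\(", code_lines)
  code_lines[!(is_lib & duplicated(code_lines))]
}

code <- c("library(dplyr)", "x <- 1", "library(dplyr)", "library(ggplot2)")
dedupe_library_calls(code)
# "library(dplyr)" "x <- 1" "library(ggplot2)"
```

This is line-based, so it would not catch duplicates with different spacing or `library("pkg")` quoting; a robust version would parse the calls instead.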
> I still believe we should check for loaded namespaces and warn if they are not installed. Does anyone have any thoughts on this?

Agreed, and on it! Otherwise it may end up being frustrating to the user.
The duplicates are caused by `get_rcode_libraries`. The function will be removed, and attaching packages will be up to the app dev. The problem is that `get_rcode_libraries` also adds `library` calls for loaded packages, and that has to be handled differently.

EDIT: If the app dev decides to attach a package more than once, that's on them.
> we can also remove duplicate `library` calls that exist in both `get_datasets_code` and `get_rcode_libraries` (removing from the first, as those will be the subsequent calls)
I would say the opposite: keep the former, remove the latter. The purpose is to recreate the search path and packages that are on the path are not attached again.
So the end goal here is to stop using `get_rcode_libraries` altogether? That would remove those pesky {teal}-family includes that won't be relevant for reproducibility.
> I would say the opposite: keep the former, remove the latter. The purpose is to recreate the search path, and packages that are on the path are not attached again.
Sure thing. I was considering keeping the first occurrence while trying to group all libraries together, but filtering out from there will be fine.
> So the end goal here is to stop using `get_rcode_libraries` altogether? That would remove those pesky {teal}-family includes that won't be relevant for reproducibility.
I believe so. As far as I know (this was before my time), `get_rcode_libraries` was introduced to create a semblance of a reproducible environment because, unlike with `within`, `library` calls were not explicitly included in the preprocessing code.
TODO: create a snippet, added before the library calls, that verifies whether all libraries are installed on the system executing the code. If some are missing, produce a string that can be copy-pasted to install the missing libraries.
This is very much doable, but we will end up with `{teal}.*` and such in the "required" packages to have installed. By "and such" I mean a bunch of weird dependencies not related to the reproducible code (such as `fontawesome`).
AFAIK it's not possible to get the namespaces attached/loaded by a given environment. One way would be to track differences in `loadedNamespaces()` across `eval_code`/`within` calls, but as shown below it has caveats.

Q: Do you know of any way of doing this?
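The diffing approach could look roughly like this (a base-R sketch with a made-up function name; the real `eval_code`/`within` would need more care):

```r
# Evaluate an expression and report which namespaces it newly loaded,
# by diffing loadedNamespaces() before and after. Caveat: packages already
# loaded by earlier code (or by the session itself) are invisible to the diff.
eval_tracking_namespaces <- function(expr, env = new.env()) {
  before <- loadedNamespaces()
  eval(expr, envir = env)
  list(env = env, new_namespaces = setdiff(loadedNamespaces(), before))
}

res <- eval_tracking_namespaces(quote(x <- 1))
res$new_namespaces   # character(0): plain assignment loads nothing
```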
> TODO: create a snippet, added before the library calls, that verifies whether all libraries are installed on the system executing the code. If some are missing, produce a string that can be copy-pasted to install the missing libraries.
This subject was broached in a different issue. We do not want to automate installation. The usual `Error in library(pkg) : there is no package called 'pkg'` is quite enough IMO.
No no, there's no installation automation. If you look closely at the screenshot it's `installed.packages`, so the snippet checks whether the packages are installed and outputs an error message.
At the end is the snippet that I was using. Alternatively, we could use a `library` call for each one, but calling `library` for each item in `loadedNamespaces()` will generate at least 27 lines (with {ggplot2} it grows to 42, and so forth).
```r
# ...
missing_code <- substitute(
  missing <- Filter(function(.x) !.x %in% installed.packages()[, "Package"], pkgs),
  list(pkgs = sort(
    Filter(
      function(.x) .x %in% installed.packages(priority = NA_character_)[, "Package"],
      loadedNamespaces()
    )
  ))
)
warning_code <- quote(
  if (length(missing)) {
    stop(paste(
      "Some of the libraries needed to reproduce the results are not installed:",
      paste0(" Please use: install.packages(c(", paste0("'", missing, "'", collapse = ", "), "))"),
      sep = "\n"
    ))
  }
)
# ...
```
```
Error in eval(warning_code) :
  Some of the libraries needed to reproduce the results are not installed:
   Please use: install.packages(c('pkg1', 'pkg2'))
```
I know the issue has been abandoned for some time, but maybe work here can also bring a solution to https://github.com/insightsengineering/teal/issues/593.
In the preprocessing, the app developer can add `library(pkg)` calls in the `eval_code`/`within` call. Thanks to this, the code in `qenv`/`teal_data` will contain `library` calls and so will be more reproducible. However, when sending data to the module, another list of libraries is appended. We need to decide who should be responsible for listing the libraries used in the code:

- the app developer, or
- `teal`, which can add all the libraries according to what is loaded/attached in a session.

We need to find a robust way to manage this.