insightsengineering / teal.data

Data model for teal applications
https://insightsengineering.github.io/teal.data/

[Question]: Should we have a separate vignette or at least a GitHub issue explaining implementation of `get_code_dependency` #278

Open m7pr opened 7 months ago

m7pr commented 7 months ago

get_code_dependency() aims to perform static analysis of the code and to extract only the code needed to reproduce a specific object (passed via the datanames = parameter).

It uses the output of utils::getParseData() and is based on the dependency structure that this function exposes. It then reshapes that output into a graph of relations (the code_graph() function), which is used at the end to restore the code for one or more objects (the graph_parser() function).
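To make the idea concrete, here is a minimal sketch of that flow. It is not the teal.data implementation: `symbols_in()` and `lines_for()` are hypothetical helpers invented for illustration, and the sketch assumes every call is a simple `x <- ...` assignment, so the first symbol in a call is the object being created.

```r
code <- c("a <- 1", "b <- a + 2", "c <- 3")

# Hypothetical helper: symbols mentioned in one call, taken from the
# token table returned by utils::getParseData().
symbols_in <- function(call_text) {
  pd <- utils::getParseData(parse(text = call_text, keep.source = TRUE))
  pd$text[pd$token == "SYMBOL"]
}

# One entry per call. Assumption: the first symbol is the assigned object
# and the remaining symbols are the objects it depends on.
graph <- lapply(code, symbols_in)

# Hypothetical helper: walk the calls backwards and keep those that create
# `name` or anything `name` depends on.
lines_for <- function(graph, name) {
  needed <- name
  keep <- logical(length(graph))
  for (i in rev(seq_along(graph))) {
    if (graph[[i]][1] %in% needed) {
      keep[i] <- TRUE
      needed <- union(needed, graph[[i]][-1])
    }
  }
  which(keep)
}

# Code needed to reproduce `b`: the first two calls, but not `c <- 3`.
code[lines_for(graph, "b")]
```

Note that `keep.source = TRUE` is passed explicitly, because parse data is only available when source references are kept, which is not the default in non-interactive sessions.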

We also tackle edge cases such as automatic detection of library() calls, objects assigned with the data() and assign() functions, and assignment operators called as functions, e.g. `<-`(y,x).
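A rough illustration of why these cases need special handling: none of them looks like a plain `x <- value` in the source text, yet each creates an object or affects the environment. The sketch below works on parsed call objects rather than on parse data, so it is a different technique from the one the package uses, and `assigned_name()` and `is_library_call()` are hypothetical helpers, not teal.data functions.

```r
exprs <- parse(text = c(
  "x <- 1",
  "`<-`(y, x)",
  "assign(\"z\", 2)",
  "data(iris)",
  "library(stats)"
))

# Hypothetical helper: name of the object a call creates, or NA if none.
assigned_name <- function(expr) {
  if (!is.call(expr) || length(expr) < 2) return(NA_character_)
  fun <- as.character(expr[[1]])[1]
  # `x <- 1` and `<-`(y, x) parse to the same kind of call.
  if (fun %in% c("<-", "<<-", "=") && is.name(expr[[2]])) {
    return(as.character(expr[[2]]))
  }
  # assign("z", 2): the object name is a string literal.
  if (fun == "assign" && is.character(expr[[2]])) {
    return(expr[[2]])
  }
  # data(iris): the dataset name is passed as a bare symbol.
  if (fun == "data" && is.name(expr[[2]])) {
    return(as.character(expr[[2]]))
  }
  NA_character_
}

# Hypothetical helper: calls that should always be kept in reproduced code.
is_library_call <- function(expr) {
  is.call(expr) && as.character(expr[[1]])[1] %in% c("library", "require")
}

targets <- vapply(exprs, assigned_name, character(1))
libs <- vapply(exprs, is_library_call, logical(1))
```

The sketch ignores namespaced calls such as utils::data(iris) and names computed at run time, which is part of what makes the real implementation harder to follow.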

The implementation and logic of get_code_dependency() were explained in the PR introducing this feature, https://github.com/insightsengineering/teal.data/pull/201, but maybe we should have a separate document that is more accessible and explains the details, so that someone will be able to extend the implementation in 6 or 12 months? Especially since a lot of assumptions have changed and a lot of new features have been covered since then.

m7pr commented 7 months ago

CC @donyunardi @pawelru

pawelru commented 7 months ago

Hmmm... I think I will hold off from making a strong statement here. I don't know this functionality and its use case that well.

My 2c in the discussion right now is that typically vignettes are covering a grey area of documenting usage of multiple funs in conjunction. If we are talking about a single functionality - maybe it's better to extend its documentation instead?

m7pr commented 7 months ago

But I wanted to summarize the implementation details so it's straightforward (or at least easier) to understand what each internal function does in the whole parsing procedure, so that developers can then find their way around the code.

chlebowa commented 7 months ago

I support having a detailed explanation because navigating the code is very hard in this case, function docs notwithstanding. But vignettes are user-facing, so this is not the way to go about it. Maybe a special .Rmd that would live in inst?

m7pr commented 7 months ago

Or an .Rmd in dev/

chlebowa commented 7 months ago

You mean a non-standard directory?

The sources of an R package consist of a subdirectory containing the files DESCRIPTION and NAMESPACE, and the subdirectories R, data, demo, exec, inst, man, po, src, tests, tools and vignettes (some of which can be missing, but which should not be empty). The package subdirectory may also contain files INDEX, configure, cleanup, LICENSE, LICENCE and NEWS. Other files such as INSTALL (for non-standard installation instructions), README/README.md, or ChangeLog will be ignored by R, but may be useful to end users. The utility R CMD build may add files in a build directory (but this should not be used for other purposes).

m7pr commented 7 months ago

Yes, typically stuff for developers or stuff that is under development is moved to a dev/ folder. This folder is then listed in .Rbuildignore so that it never gets into the package source tarball when the package is built.
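For illustration, assuming the folder is named dev and sits at the top level of the package, the .Rbuildignore entry could be a single line like the one below. Each line of that file is a regular expression matched against file paths relative to the package root.

```
^dev$
```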

For example, dplyr has an archive folder that lives on GitHub but is excluded via .Rbuildignore https://github.com/tidyverse/dplyr/blob/main/.Rbuildignore#L34 hence it's not included in the built and installed package.