insightsengineering / teal.data

Data model for teal applications
https://insightsengineering.github.io/teal.data/
Other
8 stars 7 forks source link

238 assign function with a variable as an input #253

Closed m7pr closed 8 months ago

m7pr commented 8 months ago

Close #238

Overview

get_code now properly detects dependencies in static code analysis if in assign the x parameter was passed as a variable.

How it was before

code <- c("z <- 'b'", "assign(z, 5)")
tdata <- eval_code(teal_data(), code)
get_code(tdata, datanames = "b")

> character(0)

How it is now

code <- c("z <- 'b'", "assign(z, 5)")
tdata <- eval_code(teal_data(), code)
get_code(tdata, datanames = "b")

> [1] "z <- \"b\"\nassign(z, 5)"

Notes

To allow detection of variables in assign function we do needed to actually evaluate the code. This might not always be possible (due to some locale specifics or lacking credentials) or can take a long time. If you see any other way to do that without evaluating the code, I would appreciate any ideas. From the other hand, we evaluate the code because the variable passed into assign(x = could be a result of multiple text operations on it. If this is only text operations then it could be fast, however it also runs all other code lines that can refer to long data processing, or some authorization.

Questions

Question 1

Should we remove the check in get_code_dependency() that detects if passed names correspond to the existing object names within the code? This is already a long block of lines, which is incomplete as it does not YET cover the case of variable being passed to assign(x = function (x parameter).

code <- c("z <- 'b'", "assign(z, 5)")
tdata <- eval_code(teal_data(), code)
get_code(tdata, datanames = "b")

> [1] "z <- \"b\"\nassign(z, 5)"
In get_code_dependency(object@code, datanames) :
  Object(s) not found in code: b

https://github.com/insightsengineering/teal.data/blob/655ea9edf4d5c0cf4dc821105da8ca3051f6fd07/R/utils-get_code_dependency.R#L40-L52

Question 2

Since we are doing so much with assign detection in extract_occurrence I wonder if we should create a separate function for assign evaluation, to make extract_occurrence easier to grasp. This could contain below lines https://github.com/insightsengineering/teal.data/blob/655ea9edf4d5c0cf4dc821105da8ca3051f6fd07/R/utils-get_code_dependency.R#L206-L231

github-actions[bot] commented 8 months ago

Unit Test Performance Difference

Additional test case details | Test Suite | $Status$ | Time on `main` | $±Time$ | Test Case | |:-----|:----:|:----:|:----:|:-----| | get_code | 👶 | | $+0.01$ | get_code_does_not_break_if_code_is_separated_by_ |

Results for commit 432a274fd3f7ed672231c979f0ec3c1a7562da37

♻️ This comment has been updated with latest results.

github-actions[bot] commented 8 months ago

Unit Tests Summary

  1 files   14 suites   1s :stopwatch: 176 tests 175 :white_check_mark: 1 :zzz: 0 :x: 244 runs  243 :white_check_mark: 1 :zzz: 0 :x:

Results for commit a3901c58.

:recycle: This comment has been updated with latest results.

github-actions[bot] commented 8 months ago

badge

Code Coverage Summary

Filename                         Stmts    Miss  Cover    Missing
-----------------------------  -------  ------  -------  ------------------
R/cdisc_data.R                       1       1  0.00%    38
R/default_cdisc_join_keys.R         11      11  0.00%    16-34
R/deprecated.R                      57      57  0.00%    19-344
R/dummy_function.R                   2       2  0.00%    14-15
R/formatters_var_labels.R           36      11  69.44%   60, 69-80
R/join_key.R                        38       0  100.00%
R/join_keys-c.R                     12       0  100.00%
R/join_keys-extract.R              128       0  100.00%
R/join_keys-names.R                 15       0  100.00%
R/join_keys-parents.R               30       0  100.00%
R/join_keys-print.R                 45       0  100.00%
R/join_keys-utils.R                 73       3  95.89%   35-38
R/join_keys.R                       21       0  100.00%
R/teal_data-class.R                 25       1  96.00%   69
R/teal_data-datanames.R             10       0  100.00%
R/teal_data-get_code.R              14       0  100.00%
R/teal_data-show.R                   4       4  0.00%    14-19
R/teal_data.R                       22       9  59.09%   31, 40-46, 49
R/testhat-helpers.R                 26       0  100.00%
R/topological_sort.R                32       0  100.00%
R/utils-get_code_dependency.R      140       0  100.00%
R/verify.R                          42      11  73.81%   63, 93-97, 100-104
R/zzz.R                             10      10  0.00%    4-16
TOTAL                              794     120  84.89%

Diff against main

Filename                         Stmts    Miss  Cover
-----------------------------  -------  ------  --------
R/utils-get_code_dependency.R      +13       0  +100.00%
TOTAL                              +13       0  +0.25%

Results for commit: a3901c582304caf7653ca8d53d9762e7421831b6

Minimum allowed coverage is 80%

:recycle: This comment has been updated with latest results

m7pr commented 8 months ago

After an internal call with @gogonzo and @chlebowa we decided to drop this feature for now. The current solution that is proposed in this PR requires evaluation of the code which can be a) long, b) not executable (due to e.g. lack of credentials or different environment specifications). User can always use @linktso tag in the code, to extract such lines with assign function where you pass an input as a variable.