insightsengineering / verdepcheck

An R package that tests your R package against the min/max versions of specified dependencies
https://insightsengineering.github.io/verdepcheck/
Other
6 stars 0 forks source link

Fixes and improvements for strategies #26

Closed averissimo closed 1 year ago

averissimo commented 1 year ago

ℹ️ base branch is training, not main

See https://github.com/insightsengineering/nestdevs-tasks/issues/7 for overall progress

Notable changes:

How to reproduce via docker container

(assumes a valid bash and docker/podman is installed)

mkdir -p verdepcheck-sandbox/repos
cd verdepcheck-sandbox

# Checkout action
git clone -b new-strategies --depth 1 https://github.com/insightsengineering/r-verdepcheck-action

# Download bash action and `script.R`
curl -o action.bash https://gist.githubusercontent.com/averissimo/17229e87f698626ef8d2463ee80f58a7/raw/c763c2b77d078be2e14d39252f91a5ed61054de7/r-verdepcheck-action.bash
git clone -b verdepcheck_action --depth 1 https://github.com/insightsengineering/random.cdisc.data repos/random.cdisc.data

touch .Renviron # (optional) place `GITHUB_PAT=<your_github_pat>` in the file, although it's not needed
mkdir libs # cache for faster runs after first

# Run docker container
docker run -v ./r-verdepcheck-action:/workspace/r-verdepcheck-action -v ./repos:/workspace/repos -v ./action.bash:/workspace/action.bash -v ./libs:/workspace/libs -v ./.Renviron:/root/.Renviron ghcr.io/insightsengineering/rstudio_4.3.1_bioc_3.17:latest bash ./action.bash -p repos/random.cdisc.data -s min_isolated

Screenshots

image

averissimo commented 1 year ago

BLUF: When searching for release dates for non-cran refs such as teal (>= 0.10.0) it returns multiple dates (from GraphQL GH query) and a strategy to select for max date should be safe. Filtering for the actual tag doesn't seem possible/easy.

Action taken: Select for max date (it should be safe, as long as the version is specific v.#.#.# with no further "sub-versions" as v0.10 with return all tags with this prefix v0.10.0-rc@pre-release, v0.10.0 and v0.10.1)

Context: GraphQL GH query returns the 2 tags v0.10.0 and v0.10.0-rc@pre-release without any tag information.

note: Teal (>= 0.10.0) is just a public example that replicates a concrete use case.

The reproducible example below shows the problem. If we change the query for an empty string we can see that some tagged commits have Tag information and other don't (as far I know it's arbitrary with 6 out of 35 having the Tag metadata)

{
    repository(owner: "insightsengineering", name: "teal") {
      refs(refPrefix: "refs/tags/", query: "v0.10.0", first: 100) {
        nodes {
          target {
            ... on Commit {
              committedDate
              abbreviatedOid
            }
            ... on Tag {
              abbreviatedOid
              commitResourcePath
              commitUrl
              message
              name
              oid
            }
          }
        }
      }
    }
  }

Teal result 1 (query: "v0.10.0")

json output ```json { "data": { "repository": { "refs": { "nodes": [ { "target": { "committedDate": "2021-10-08T15:10:35Z" } }, { "target": { "committedDate": "2021-10-08T15:10:35Z" } } ] } } } } ```

Teal result 2 (query: "" -- empty)

json output ```json { "data": { "repository": { "refs": { "nodes": [ { "target": { "committedDate": "2021-07-01T13:15:34Z" } }, { "target": { "committedDate": "2021-10-08T15:10:35Z" } }, { "target": { "committedDate": "2022-01-27T13:02:30Z" } }, { "target": { "committedDate": "2022-04-07T17:37:33Z" } }, { "target": { "committedDate": "2022-06-09T13:07:34Z" } }, { "target": { "committedDate": "2022-07-29T12:16:11Z" } }, { "target": { "committedDate": "2022-10-14T01:06:31Z" } }, { "target": { "abbreviatedOid": "972e2b3", "commitResourcePath": "/insightsengineering/teal/commit/3551e2c36d05f4d569a8385505e6ec367d0d2ef3", "commitUrl": "https://github.com/insightsengineering/teal/commit/3551e2c36d05f4d569a8385505e6ec367d0d2ef3", "message": "inital version of teal\n", "name": "v0.0.1", "oid": "972e2b3673a1c8cc0414894ba0e67faa7e37e5e9" } }, { "target": { "abbreviatedOid": "a250362", "commitResourcePath": "/insightsengineering/teal/commit/60c961856a446e310b37d2eb002c21a972d2aee4", "commitUrl": "https://github.com/insightsengineering/teal/commit/60c961856a446e310b37d2eb002c21a972d2aee4", "message": "development version with filtering bug fix\n", "name": "v0.0.1.1", "oid": "a250362d42cc727b294d62da7fcecab8b52e7385" } }, { "target": { "abbreviatedOid": "7315062", "commitResourcePath": "/insightsengineering/teal/commit/4504df76f8c8b1ca841678a1bcd4c84e2a0e7b46", "commitUrl": "https://github.com/insightsengineering/teal/commit/4504df76f8c8b1ca841678a1bcd4c84e2a0e7b46", "message": "version 0.0.2 that works with the new kadcyla app\n", "name": "v0.0.2", "oid": "73150626634502d3209445892ea995e51fe1d868" } }, { "target": { "abbreviatedOid": "8fcae40", "commitResourcePath": "/insightsengineering/teal/commit/35557f910f6d43baa9b381ee7e7b0735967b1f0e", "commitUrl": "https://github.com/insightsengineering/teal/commit/35557f910f6d43baa9b381ee7e7b0735967b1f0e", "message": "now with hold_filtering and continue filtering\n", "name": "v0.0.2.1", "oid": "8fcae40701857fe2aba43ca9e4018a5ff6358bf6" } }, { "target": { "abbreviatedOid": "cfcbe2d", "commitResourcePath": "/insightsengineering/teal/commit/080ef53aa67b402ab0ffec8798e7d05e76a934b0", "commitUrl": "https://github.com/insightsengineering/teal/commit/080ef53aa67b402ab0ffec8798e7d05e76a934b0", "message": "Version 0.0.3\n", "name": "v0.0.3", "oid": "cfcbe2d37732457842eac812dcc082bd2cff4e8e" } }, { "target": { "committedDate": "2018-06-09T19:16:33Z" } }, { "target": { "committedDate": "2018-11-11T22:02:28Z" } }, { "target": { "committedDate": "2019-02-25T21:00:05Z" } }, { "target": { "committedDate": "2019-07-14T18:57:14Z" } }, { "target": { "committedDate": "2019-10-11T15:40:46Z" } }, { "target": { "committedDate": "2019-10-18T13:03:44Z" } }, { "target": { "committedDate": "2019-12-30T11:36:31Z" } }, { "target": { "committedDate": "2020-04-10T10:08:54Z" } }, { "target": { "committedDate": "2020-06-15T09:24:54Z" } }, { "target": { "committedDate": "2020-07-07T15:31:08Z" } }, { "target": { "abbreviatedOid": "75dc075", "commitResourcePath": "/insightsengineering/teal/commit/eb2368d7e2a70e13fb968b0f6dc462811381ac13", "commitUrl": "https://github.com/insightsengineering/teal/commit/eb2368d7e2a70e13fb968b0f6dc462811381ac13", "message": "* `cdisc_dataset` and `dataset` now return R6 class objects (`RelationalDataset`). * A new `teal_data` function to include datasets and connectors into teal application. * `cdisc_data` function to include datasets and connectors into teal application where a `check` argument still could be used and other consistency tests are performed. * `get_raw_data` can be used to derive raw data from R6 objects e.g. (`RelationalDataset`). * `RawDatasetConnector`, `NamedDatasetConnector` and `RelationalDatasetConnector` to execute custom function call in order to get data from connection. * `CodeClass` to manage reproducibility of the data and relationships between datasets. Not directly exposed to the public interface. * `mutate_dataset` allows to modify dataset or connector via `code` argument or an R script. * `mutate_data` allows to change any dataset in `RelationalData`, `RelationalDataConnector` or `RelationalDataList`. * New wrapper functions to manipulate `RelationalDatasetConnector` and `RelationalDataset` such as `get_dataset`, `load_dataset`, `as_relational`. * New wrapper functions to manipulate `RelationalDataConnector`, `RelationalData` and `RelationalDataList` such as `get_datasets`, `load_datasets`. * `choices_labeled`, `filter_spec`, `select_spec`, `data_extract_spec`, `value_choices`, `variable_choices` as S3 class applied on `data.frame` and also on delayed data. * You can no longer modify the `app$datasets`, but must instead use argument `filter` in the `init` function. * New modules were created to create a module of nested teal modules, then another one that adds the right filter pane to each tab. The `teal::init` function stays unchanged. * The `teal::init` function now returns a `UI` function with an optional `id` argument. This allows to embed it into other applications. A split view of two teal applications side-by-side is one such example and shown in a vignette. `teal::init` was turned into a wrapper function around `module_teal_with_splash.R` and developers that want to embed teal as a Shiny module should directly work with these functions (`ui_teal_with_splash` and `srv_teal_with_splash`) instead of `teal::init`. * The `teal::init` function now has a title parameter to set the title of the browser window.\t * Missing data `NA` is now explicitly addressed in the filter panel: `NA`s are excluded by default and a checkbox to include them was added. * Statistics of the data are visually depicted in terms of histograms or bar charts overlayed onto the Shiny input elements. * Added buttons to remove all filters applied to a dataset. * Restored the functionality to hide the filter panel for a module when it was constructed with `filters = NULL`. * Moved helper functions into `utils.nest` and removed unused functions `set_labels_df` and `get_labels_df`. * `optionalSelectInput` now allows for grouped choices. ## Refactor of FilteredData (for developers) * `FilteredData` is now fully reactive. Now filtered data is lazy evaluated as per need. This further opens the door to bookmarking `teal` apps (bookmarking currently works for the right filtering panel, but we will make this feature more sophisticated in a future release, each module must be reviewed and adapted if it contains `reactiveValues`). * Datasets and materialized connectors are provided to `FilteredData` by `set_datasets_data` function located in `init_datasets.R` file. * Renamed `get_dataset()` method to `get_data()`. * Renamed `get_filter_call()` method to `get_filter_expr()`; returns an expression rather than a list. * Removed argument `isolate` from `get_data()` method and similar methods. You must `isolate` it yourself as needed. If you want to temporarily deactivate Shiny errors due to missing errors, you can set `options(shiny.suppressMissingContextError = TRUE)`. In general, avoid `isolate` as this breaks reactivity. * We added a development module to add several filters at once, e.g. safety filters. This is to be evaluated before it is converted into a proper module and made available to end-users.", "name": "v0.9.0", "oid": "75dc075b4c396891cbfa0314ac7fffb62476b33d" } }, { "target": { "committedDate": "2020-10-19T12:12:30Z" } }, { "target": { "committedDate": "2020-12-15T13:55:22Z" } }, { "target": { "committedDate": "2021-03-18T17:49:08Z" } }, { "target": { "committedDate": "2021-05-03T07:47:36Z" } }, { "target": { "committedDate": "2021-07-01T13:15:34Z" } }, { "target": { "committedDate": "2021-10-08T15:10:35Z" } }, { "target": { "committedDate": "2021-10-08T15:10:35Z" } }, { "target": { "committedDate": "2022-01-27T13:02:30Z" } }, { "target": { "committedDate": "2022-04-07T17:37:33Z" } }, { "target": { "committedDate": "2022-06-09T13:07:34Z" } }, { "target": { "committedDate": "2022-10-14T01:06:31Z" } }, { "target": { "committedDate": "2023-05-22T18:00:53Z" } } ] } } } } ```
pawelru commented 1 year ago

When searching for release dates for non-cran refs such as teal (>= 0.10.0) it returns multiple dates

Very interesting finding. Checked this on my own and I do confirm.

I looked into the code to see how you addressed it and it seems that you fixed it on the consumer side to handle returned vector of dates. Wouldn't be easier / better to assure that get_release_date returns obj of length of one (in particular: NA for not found or unknown remote-ref class) - that is: handle it on the producer side? As a general note, number of consumers tends to grow over time and each of them would have to have this "fix" which is not so DRY solution.

averissimo commented 1 year ago

Wouldn't be easier / better to assure that get_release_date returns obj of length of one (in particular: NA for not found or unknown remote-ref class) - that is: handle it on the producer side? As a general note, number of consumers tends to grow over time and each of them would have to have this "fix" which is not so DRY solution.

Very good point. I was doing this on the consumer side as it would make the get_ppm_snapshot_by_date() robust to unexpected parameter lengths (as it just happened)

Better to enforce the select strategy on the producer! I'll make that change.

Is it too late to include checkmate and start adding some parameter assertions to better prevent this?

pawelru commented 1 year ago

Is it too late to include checkmate and start adding some parameter assertions to better prevent this?

I've been thinking about that and decided to go more modest way when it comes to deps at least for the beginning until we see a really strong reason to do so. This could be one of the reasons but I don't think it's so strong to enforce it. Let's wait and check if we find more cases where it will provide value.