wlandau commented 3 years ago

Date accepted: 2022-01-12 Submitting Author Name: Will Landau Submitting Author Github Handle: !--author1-->@wlandau@adamhsparks<!--end-editor-- Reviewers: @smwindecker, @mdneuzerling

Due date for @smwindecker: 2021-12-20 Due date for @mdneuzerling: 2021-12-28

Archive: TBD Version accepted: TBD Language: en

Paste the full DESCRIPTION file inside a code block below:

Package: gittargets
Title: Data Version Control for the Targets Package
Description: Version control systems such as Git help researchers
  track changes and history in data science projects,
  and the 'targets' package (2021, <doi:10.21105/joss.02959>)
  minimizes the computational cost of keeping the latest results
  reproducible and up to date. The 'gittargets' package
  combines these two capabilities. The 'targets' data store
  becomes a version control repository and stays synchronized
  with the Git repository of the source code.
  Users can switch commits and branches
  without invalidating the 'targets' pipeline.
Version: 0.0.0.9000
License: MIT + file LICENSE
URL: https://wlandau.github.io/gittargets/, https://github.com/wlandau/gittargets
BugReports: https://github.com/wlandau/gittargets/issues
Authors@R: c(
  person(
    given = c("William", "Michael"),
    family = "Landau",
    role = c("aut", "cre"),
    email = "will.landau@gmail.com",
    comment = c(ORCID = "0000-0003-1878-3253")
  ),
  person(
    family = "Eli Lilly and Company",
    role = "cph"
  ))
Depends:
  R (>= 3.5.0)
Imports:
  cli (>= 3.0.0),
  data.table (>= 1.12.8),
  gert (>= 1.0.0),
  processx (>= 3.0.0),
  stats,
  targets (>= 0.8.1.9000),
  tibble (>= 3.0.0),
  utils,
  uuid (>= 1.0.0)
Suggests:
  knitr (>= 1.30),
  markdown (>= 1.1),
  rmarkdown (>= 2.4),
  testthat (>= 3.0.0)
Remotes:
  ropensci/targets
SystemRequirements: Git (>= 2.0.0)
Encoding: UTF-8
Language: en-US
VignetteBuilder: knitr
Config/testthat/edition: 3
Roxygen: list(markdown = TRUE)
RoxygenNote: 7.1.2

Scope

Please indicate which category or categories from our package fit policies this package falls under: (Please check an appropriate box below. If you are unsure, we suggest you make a pre-submission inquiry.):
- [ ] data retrieval
- [ ] data extraction
- [ ] data munging
- [ ] data deposition
- [ ] workflow automation
- [x] version control
- [ ] citation management and bibliometrics
- [ ] scientific software wrappers
- [ ] field and lab reproducibility tools
- [ ] database software bindings
- [ ] geospatial data
- [ ] text analysis
Explain how and why the package falls under these categories (briefly, 1-2 sentences):

Version control systems such as Git help researchers track changes and history in data science projects, and the targets package minimizes the computational cost of keeping the latest results reproducible and up to date. The gittargets package combines these two capabilities. The targets data store becomes a version control repository and stays synchronized with the Git repository of the source code. Users can switch commits and branches without invalidating the targets pipeline.

Who is the target audience and what are scientific applications of this package?

gittargets is for people who use targets for reproducible analysis pipelines and Git for tracking the code files of those pipelines.

Are there other R packages that accomplish the same thing? If so, how does yours differ or meet our criteria for best-in-category?

Packages gh, gert, git2r, and git2rdata all work with Git from R, but in a way that is more general, less opinionated, and less targets-specific than gittargets. gittargets establishes a separate repository for the targets data that is linked to the project's code repository. This allows the code and data to synchronize without affecting the efficiency or the size of the code repository.

(If applicable) Does your package comply with our guidance around Ethics, Data Privacy and Human Subjects Research?

N/A

If you made a pre-submission inquiry, please paste the link to the corresponding issue, forum post, or other discussion, or @tag the editor you contacted.

N/A

Technical checks

Confirm each of the following by checking the box.

[x] I have read the guide for authors and rOpenSci packaging guide.

This package:

[x] does not violate the Terms of Service of any service it interacts with.
[x] has a CRAN and OSI accepted license.
[x] contains a README with instructions for installing the development version.
[x] includes documentation with examples for all functions, created with roxygen2.
[x] contains a vignette with examples of its essential functions and uses.
[x] has a test suite.
[x] has continuous integration, including reporting of test coverage using services such as Travis CI, Coveralls and/or CodeCov.

Publication options

[x] Do you intend for this package to go on CRAN?
[ ] Do you intend for this package to go on Bioconductor?
[ ] Do you wish to submit an Applications Article about your package to Methods in Ecology and Evolution? If so:

MEE Options

- [n/a] The package is novel and will be of interest to the broad readership of the journal. - [n/a] The manuscript describing the package is no longer than 3000 words. - [n/a] You intend to archive the code for the package in a long-term repository which meets the requirements of the journal (see [MEE's Policy on Publishing Code](http://besjournals.onlinelibrary.wiley.com/hub/journal/10.1111/(ISSN)2041-210X/journal-resources/policy-on-publishing-code.html)) - (*Scope: Do consider MEE's [Aims and Scope](http://besjournals.onlinelibrary.wiley.com/hub/journal/10.1111/(ISSN)2041-210X/aims-and-scope/read-full-aims-and-scope.html) for your manuscript. We make no guarantee that your manuscript will be within MEE scope.*) - (*Although not required, we strongly recommend having a full manuscript prepared when you submit here.*) - (*Please do not submit your package separately to Methods in Ecology and Evolution*)

Code of conduct

[x] I agree to abide by rOpenSci's Code of Conduct during the review process and in maintaining my package should it be accepted.

ropensci-review-bot commented 3 years ago

Thanks for submitting to rOpenSci, our editors and @ropensci-review-bot will reply soon. Type @ropensci-review-bot help for help.

ropensci-review-bot commented 3 years ago

:rocket:

Error: Issue template has no 'repourl'

:wave:

ropensci-review-bot commented 3 years ago

Missing values: author1, repourl

wlandau commented 3 years ago

Sorry about that, should be fixed now.

ldecicco-USGS commented 3 years ago

@ropensci-review-bot check package

ropensci-review-bot commented 3 years ago

Thanks, about to send the query.

ropensci-review-bot commented 3 years ago

:rocket:

Editor check started

:wave:

ropensci-review-bot commented 3 years ago

Checks for gittargets (v0.0.0.9000)

git hash: b6eadc4b

:heavy_check_mark: Package name is available
:heavy_check_mark: has a 'CITATION' file.
:heavy_check_mark: has a 'codemeta.json' file.
:heavy_check_mark: has a 'contributing' file.
:heavy_check_mark: uses 'roxygen2'.
:heavy_check_mark: 'DESCRIPTION' has a URL field.
:heavy_check_mark: 'DESCRIPTION' has a BugReports field.
:heavy_check_mark: Package has at least one HTML vignette
:heavy_check_mark: All functions have examples.
:heavy_check_mark: Package has continuous integration checks.
:heavy_multiplication_x: Package coverage is 13.3% (should be at least 75%).
:heavy_check_mark: R CMD check found no errors.
:heavy_check_mark: R CMD check found no warnings.

Important: All failing checks above must be addressed prior to proceeding

Package License: MIT + file LICENSE

1. Statistical Properties

This package features some noteworthy statistical properties which may need to be clarified by a handling editor prior to progressing.

Details of statistical properties (click to open)

The package has: - code in R (100% in 15 files) and - 1 authors - 1 vignette - no internal data file - 9 imported packages - 9 exported functions (median 23 lines of code) - 94 non-exported functions in R (median 6 lines of code) --- Statistical properties of package structure as distributional percentiles in relation to all current CRAN packages The following terminology is used: - `loc` = "Lines of Code" - `fn` = "function" - `exp`/`not_exp` = exported / not exported The final measure (`fn_call_network_size`) is the total number of calls between functions (in R), or more abstract relationships between code objects in other languages. Values are flagged as "noteworthy" when they lie in the upper or lower 5th percentile. |measure | value| percentile|noteworthy | |:-----------------------|-----:|----------:|:----------| |files_R | 15| 70.6| | |files_vignettes | 0| 0.0|TRUE | |files_tests | 17| 94.1| | |loc_R | 663| 53.5| | |loc_tests | 417| 68.5| | |num_vignettes | 1| 60.7| | |n_fns_r | 103| 71.7| | |n_fns_r_exported | 9| 38.9| | |n_fns_r_not_exported | 94| 76.9| | |n_fns_per_file_r | 4| 51.7| | |num_params_per_fn | 4| 54.3| | |loc_per_fn_r | 7| 23.4| | |loc_per_fn_r_exp | 23| 54.1| | |loc_per_fn_r_not_exp | 6| 29.1| | |rel_whitespace_R | 6| 24.3| | |rel_whitespace_tests | 7| 76.1| | |doclines_per_fn_exp | 62| 74.1| | |doclines_per_fn_not_exp | 0| 0.0|TRUE | |fn_call_network_size | 123| 80.4| | ---

1a. Network visualisation

Click to see the interactive network visualisation of calls between objects in package

2. `goodpractice` and other checks

Details of goodpractice and other checks (click to open)

#### 3a. Continuous Integration Badges [![github](https://github.com/wlandau/gittargets/workflows/check/badge.svg)](https://github.com/wlandau/gittargets/actions) [![github](https://github.com/wlandau/gittargets/workflows/lint/badge.svg)](https://github.com/wlandau/gittargets/actions) **GitHub Workflow Results** |name |conclusion |sha |date | |:-------|:----------|:------|:----------| |check |success |b6eadc |2021-11-17 | |cover |success |b6eadc |2021-11-17 | |lint |success |b6eadc |2021-11-17 | |pkgdown |success |b6eadc |2021-11-17 | --- #### 3b. `goodpractice` results #### `R CMD check` with [rcmdcheck](https://r-lib.github.io/rcmdcheck/) rcmdcheck found no errors, warnings, or notes #### Test coverage with [covr](https://covr.r-lib.org/) Package coverage: 13.35 The following files are not completely covered by tests: file | coverage --- | --- R/tar_git_checkout.R | 0% R/tar_git_init.R | 0% R/tar_git_log.R | 0% R/tar_git_ok.R | 0% R/tar_git_snapshot.R | 0% R/tar_git_status_code.R | 0% R/tar_git_status_data.R | 0% R/tar_git_status_targets.R | 0% R/tar_git_status.R | 0% R/utils_assert.R | 0% R/utils_git.R | 30.77% #### Cyclocomplexity with [cyclocomp](https://github.com/MangoTheCat/cyclocomp) No functions have cyclocomplexity >= 15 #### Static code analyses with [lintr](https://github.com/jimhester/lintr) [lintr](https://github.com/jimhester/lintr) found no issues with this package!

Package Versions

|package |version | |:--------|:---------| |pkgstats |0.0.2.72 | |pkgcheck |0.0.2.107 |

Editor-in-Chief Instructions:

Processing may not proceed until the items marked with :heavy_multiplication_x: have been resolved.

wlandau commented 3 years ago

gittargets requires command line Git in order for git-lfs to work (related: https://github.com/r-lib/gert/issues/160). The majority of tests will automatically be skipped if tar_git_ok() returns FALSE. Is there still a path forward?

ldecicco-USGS commented 3 years ago

Yeah, that makes sense. I'll get started on getting this moving.

mpadge commented 3 years ago

And I'll check out why coverage might have failed tomorrow, although git is definitely installed. @wlandau Nothing you need to worry about, thanks.

Edit: For the record, @wlandau helped debug there, and the bot checks now give coverage of 100% :rocket: :100:

wlandau commented 3 years ago

Thank you both!

Related: I have not been able to correctly set up Git on the Windows GitHub Actions for gittargets. I tried a bunch of different approaches, but I still get errors claiming I need to configure git config --global user.name and git config --global user.email. The other runners appear to work just fine though, including the one that checks coverage.

ldecicco-USGS commented 3 years ago

@ropensci-review-bot assign @adamhsparksas editor

ldecicco-USGS commented 3 years ago

@ropensci-review-bot assign @adamhsparks as editor

ropensci-review-bot commented 3 years ago

Assigned! @adamhsparks is now the editor

adamhsparks commented 3 years ago

@ropensci-review-bot check package

ropensci-review-bot commented 3 years ago

Thanks, about to send the query.

ropensci-review-bot commented 3 years ago

:rocket:

Editor check started

:wave:

ropensci-review-bot commented 3 years ago

Checks for gittargets (v0.0.0.9000)

git hash: 5c046f1d

:heavy_check_mark: Package name is available
:heavy_check_mark: has a 'CITATION' file.
:heavy_check_mark: has a 'codemeta.json' file.
:heavy_check_mark: has a 'contributing' file.
:heavy_check_mark: uses 'roxygen2'.
:heavy_check_mark: 'DESCRIPTION' has a URL field.
:heavy_check_mark: 'DESCRIPTION' has a BugReports field.
:heavy_check_mark: Package has at least one HTML vignette
:heavy_check_mark: All functions have examples.
:heavy_check_mark: Package has continuous integration checks.
:heavy_check_mark: Package coverage is 100%.
:heavy_check_mark: R CMD check found no errors.
:heavy_check_mark: R CMD check found no warnings.

Package License: MIT + file LICENSE

1. Statistical Properties

This package features some noteworthy statistical properties which may need to be clarified by a handling editor prior to progressing.

Details of statistical properties (click to open)

The package has: - code in R (100% in 15 files) and - 1 authors - 1 vignette - no internal data file - 9 imported packages - 9 exported functions (median 23 lines of code) - 96 non-exported functions in R (median 6 lines of code) --- Statistical properties of package structure as distributional percentiles in relation to all current CRAN packages The following terminology is used: - `loc` = "Lines of Code" - `fn` = "function" - `exp`/`not_exp` = exported / not exported The final measure (`fn_call_network_size`) is the total number of calls between functions (in R), or more abstract relationships between code objects in other languages. Values are flagged as "noteworthy" when they lie in the upper or lower 5th percentile. |measure | value| percentile|noteworthy | |:------------------------|-----:|----------:|:----------| |files_R | 15| 70.6| | |files_vignettes | 1| 64.8| | |files_tests | 17| 94.1| | |loc_R | 669| 53.8| | |loc_vignettes | 143| 62.1| | |loc_tests | 503| 72.2| | |num_vignettes | 1| 60.7| | |n_fns_r | 105| 72.3| | |n_fns_r_exported | 9| 38.9| | |n_fns_r_not_exported | 96| 77.3| | |n_fns_per_file_r | 4| 52.4| | |num_params_per_fn | 4| 54.3| | |loc_per_fn_r | 7| 23.4| | |loc_per_fn_r_exp | 23| 54.1| | |loc_per_fn_r_not_exp | 6| 23.5| | |rel_whitespace_R | 6| 24.9| | |rel_whitespace_vignettes | 43| 80.0| | |rel_whitespace_tests | 7| 78.4| | |doclines_per_fn_exp | 61| 73.3| | |doclines_per_fn_not_exp | 0| 0.0|TRUE | |fn_call_network_size | 132| 81.5| | ---

1a. Network visualisation

Click to see the interactive network visualisation of calls between objects in package

2. `goodpractice` and other checks

Details of goodpractice and other checks (click to open)

#### 3a. Continuous Integration Badges [![github](https://github.com/wlandau/gittargets/workflows/check/badge.svg)](https://github.com/wlandau/gittargets/actions) [![github](https://github.com/wlandau/gittargets/workflows/lint/badge.svg)](https://github.com/wlandau/gittargets/actions) **GitHub Workflow Results** |name |conclusion |sha |date | |:------------|:----------|:------|:----------| |check |success |5c046f |2021-11-18 | |cover |success |5c046f |2021-11-18 | |cover-ubuntu |success |5c046f |2021-11-18 | |lint |success |5c046f |2021-11-18 | |pkgdown |success |5c046f |2021-11-18 | --- #### 3b. `goodpractice` results #### `R CMD check` with [rcmdcheck](https://r-lib.github.io/rcmdcheck/) rcmdcheck found no errors, warnings, or notes #### Test coverage with [covr](https://covr.r-lib.org/) Package coverage: 100 #### Cyclocomplexity with [cyclocomp](https://github.com/MangoTheCat/cyclocomp) No functions have cyclocomplexity >= 15 #### Static code analyses with [lintr](https://github.com/jimhester/lintr) [lintr](https://github.com/jimhester/lintr) found no issues with this package!

Package Versions

|package |version | |:--------|:---------| |pkgstats |0.0.3.51 | |pkgcheck |0.0.2.130 |

Editor-in-Chief Instructions:

This package is in top shape and may be passed on to a handling editor

adamhsparks commented 2 years ago

Hi @wlandau, sorry for the delays. I'll endeavour to complete my checks today and start looking for reviewers so we can at least have some assigned before Christmas.

adamhsparks commented 2 years ago

Editor checks:

[x] Documentation: The package has sufficient documentation available online (README, pkgdown docs) to allow for an assessment of functionality and scope without installing the package. In particular,
- [x] Is the case for the package well made?
- [x] Is the reference index page clear (grouped by topic if necessary)?
- [x] Are vignettes readable, sufficiently detailed and not just perfunctory?
[x] Fit: The package meets criteria for fit and overlap.
[x] Installation instructions: Are installation instructions clear enough for human users?
[x] Tests: If the package has some interactivity / HTTP / plot production etc. are the tests using state-of-the-art tooling?
[x] Contributing information: Is the documentation for contribution clear enough e.g. tokens for tests, playgrounds?
[x] License: The package has a CRAN or OSI accepted license.
[x] Project management: Are the issue and PR trackers in a good shape, e.g. are there outstanding bugs, is it clear when feature requests are meant to be tackled?

Editor comments

Thanks for this submission, Will. There's a few of us that were excited to see this one submitted for review. Everything looks good from here with my editor checks. I don't have any major suggestions so I'll start looking for reviewers now.

wlandau commented 2 years ago

Thanks, @adamhsparks! I am excited too.

adamhsparks commented 2 years ago

@ropensci-review-bot add @smwindecker to reviewers

ropensci-review-bot commented 2 years ago

@smwindecker added to the reviewers list. Review due date is 2021-12-20. Thanks @smwindecker for accepting to review! Please refer to our reviewer guide.

ropensci-review-bot commented 2 years ago

@smwindecker: If you haven't done so, please fill this form for us to update our reviewers records.

adamhsparks commented 2 years ago

@ropensci-review-bot add @mdneuzerling to reviewers

ropensci-review-bot commented 2 years ago

@mdneuzerling added to the reviewers list. Review due date is 2021-12-28. Thanks @mdneuzerling for accepting to review! Please refer to our reviewer guide.

ropensci-review-bot commented 2 years ago

@mdneuzerling: If you haven't done so, please fill this form for us to update our reviewers records.

adamhsparks commented 2 years ago

@smwindecker and @mdneuzerling, thank you both for agreeing to review {gittargets} for us. I’m aware that this review and the three weeks may have competing priorities for your time with the holidays and leave at the end of the year/beginning of the new year. With that in mind, please just keep me posted on your progress as you have time. If you get these finished by sometime in January that would be fine, but we’re pretty flexible here and understand that life happens so please do ask if you need more time for any reason.

wlandau commented 2 years ago

Yeah, I definitely don't expect any reviews until 2022. Please take all your time off!

By the way, I am actively using gittargets for a pipeline at work. I am developing a large Bayesian model for clinical data, and I have multiple Git branches to try different versions of the model and data preprocessing. The model takes hours to run, and gittargets lets me switch among multiple up-to-date pipelines when I switch branches.

smwindecker commented 2 years ago

thanks @adamhsparks. My goal was to spend the next few days tinkering so will make a start in any case. I'm using targets for my main work project but still feel a bit junior league on it, particularly related to how best to incorporate the models that are sent to HPC into my targets workflow. Look forward to spending the time on this, but it may be first weeks of Jan.

mdneuzerling commented 2 years ago

Hello @adamhsparks @wlandau. I've made a start on this. Apologies for the delay --- I had to get Christmas out of the way first!

I've started by looking through the code structure to get a rough idea of things. I'm making notes on a pull request, which I'll then compile into a formal review. This is a trick I've used before — check out the first commit of a repo into a separate branch, and then create a pull request onto it.

adamhsparks commented 2 years ago

Thanks for the update @mdneuzerling. No hurry, we’ve still another holiday to go and it is summer here in Oz after all.

mdneuzerling commented 2 years ago

Package Review

I've previously used the author's drake and targets packages, and have had a small amount of personal communication regarding these. I've also published blog posts on these packages.
[x] As the reviewer I confirm that there are no conflicts of interest for me to review this work (if you are unsure whether you are in conflict, please speak to your editor before starting your review).

Documentation

The package includes all the following forms of documentation:

[x] A statement of need: clearly stating problems the software is designed to solve and its target audience in README
[x] Installation instructions: for the development version of package and any non-standard dependencies in README
[x] Vignette(s): demonstrating major functionality that runs successfully locally
[x] Function Documentation: for all exported functions
[x] Examples: (that run successfully locally) for all exported functions
[x] Community guidelines: including contribution guidelines in the README or CONTRIBUTING, and DESCRIPTION with URL, BugReports and Maintainer (which may be autogenerated via Authors@R).

Functionality

[x] Installation: Installation succeeds as documented.
[x] Functionality: Any functional claims of the software been confirmed.
[x] Performance: Any performance claims of the software been confirmed.
[x] Automated tests: Unit tests cover essential functions of the package and a reasonable range of inputs and conditions. All tests pass on the local machine.
[x] Packaging guidelines: The package conforms to the rOpenSci packaging guidelines.

Estimated hours spent reviewing: 5

[x] Should the author(s) deem it appropriate, I agree to be acknowledged as a package reviewer ("rev" role) in the package DESCRIPTION file.

Review Comments

The code in this package is to a very high standard. I have a few comments around documentation that I believe could help a new user approaching this package for the first time. However, these are only suggestions, and I strongly recommend that this project be submitted to CRAN. I have some other comments that are nitpicks and can definitely be ignored.

I created a very simple project to test the usage of this package. My code is minimal, and can be found at mdneuzerling/gittargets-peerreview.

General comments

I really appreciate the frequent use of assertions. This gives me confidence that the assumptions behind the function inputs are adequately tested.
The documentation frequently repeats itself (a good thing). The author makes good use of the various inheritance tags in Roxygen.

Recommendations

I feel like the first paragraph of the README doesn't adequately capture the use case of gittargets. What I'm looking to see in the first few paragraphs is something that covers what the targets data store is, and what you want to have happen to it when you switch branches with git.
In the project README or elsewhere, I recommend describing the intended workflow of the package. I settled on the below when checking out a different branch, which I think is right:
```
# git checkout <branch>
tar_git_checkout() # to bring the data store up-to-date
# make changes
tar_make()
tar_git_snapshot()
```
The assertions in tar_git_log allow for a value of max = Inf, which is a reasonable value that someone might use. This value is passed to gert::git_log, which will give a confusing error on max = Inf. Consider using is.finite or an equivalent assertion.
I notice many tests are being skipped on Windows. I don't have ready access to a Windows machine. Can the author provide some confidence that this will work on Windows, or is the package intended to be Linux/Mac only? (I think we can safely disregard Solaris...)
Regarding the vignette, the "Git data backend" is a local concept as far as I can tell. There's no use of a git remote here. I think for many people, git and GitHub are entirely synonymous, so it would be good to distinguish between the local and remote.

Nitpicks

The run function from the processx package is in the NAMESPACE of gittargets. However in R/utils_git.R the author uses processx::run. Normally I wouldn't care, but run is a pretty commonly used function name, so I'd recommend keeping it out of the NAMESPACE if possible by using only processx::run.
tar_git_branch_snapshot and tar_git_commit_code refer to a "^code=" string; the meaning of this isn't entirely obvious (it's the prefix of the data branches). A comment in the code could clear this up.
While it's not usually necessary to document internal functions, I think it would be a good idea to provide some Roxygen documentation for tar_git_snapshot_menu due to the large number of arguments it accepts. Recall that internal functions can be tagged with @keywords internal to prevent the help pages from being indexed.
There are many SHA fingerprints in the diagrams which may make them harder to understand.
The function name tar_git_status_code confuses me. "Status code" makes me think of something like a numeric HTTP status code or Unix exit status, but this returns a tibble.

Session info

Click to expand!

``` ─ Session info 🤲🏾 🈁 🤯 ────────────────────────────────────────────────────────────────────────── hash: palms up together: medium-dark skin tone, Japanese “here” button, exploding head setting value version R version 4.1.0 (2021-05-18) os macOS Big Sur 11.3 system aarch64, darwin20 ui RStudio language (EN) collate en_AU.UTF-8 ctype en_AU.UTF-8 tz Australia/Melbourne date 2022-01-09 rstudio 1.4.1717 Juliet Rose (desktop) pandoc NA ─ Packages ──────────────────────────────────────────────────────────────────────────────────────────── package * version date (UTC) lib source askpass 1.1 2019-01-13 [1] CRAN (R 4.1.0) backports 1.4.1 2021-12-13 [1] CRAN (R 4.1.1) base64url 1.4 2018-05-14 [1] CRAN (R 4.1.0) cachem 1.0.4 2021-02-13 [1] CRAN (R 4.1.0) callr 3.7.0 2021-04-20 [1] CRAN (R 4.1.0) cli 3.1.0 2021-10-27 [1] CRAN (R 4.1.1) codetools 0.2-18 2020-11-04 [1] CRAN (R 4.1.0) crayon 1.4.2 2021-10-29 [1] CRAN (R 4.1.1) credentials 1.3.2 2021-11-29 [1] CRAN (R 4.1.1) data.table 1.14.2 2021-09-27 [1] CRAN (R 4.1.1) desc 1.4.0 2021-09-28 [1] CRAN (R 4.1.1) devtools 2.4.0 2021-04-07 [1] CRAN (R 4.1.0) digest 0.6.29 2021-12-01 [1] CRAN (R 4.1.1) ellipsis 0.3.2 2021-04-29 [1] CRAN (R 4.1.0) fansi 0.5.0 2021-05-25 [1] CRAN (R 4.1.0) fastmap 1.1.0 2021-01-25 [1] CRAN (R 4.1.0) fs 1.5.0 2020-07-31 [1] CRAN (R 4.1.0) gert * 1.5.0 2022-01-03 [1] CRAN (R 4.1.1) gittargets * 0.0.0.9000 2022-01-08 [1] Github (wlandau/gittargets@6b3dada) glue 1.6.0 2021-12-17 [1] CRAN (R 4.1.1) igraph 1.2.11 2022-01-04 [1] CRAN (R 4.1.1) knitr 1.37 2021-12-16 [1] CRAN (R 4.1.1) lifecycle 1.0.1 2021-09-24 [1] CRAN (R 4.1.1) magrittr 2.0.1 2020-11-17 [1] CRAN (R 4.1.0) memoise 2.0.0 2021-01-26 [1] CRAN (R 4.1.0) openssl 1.4.6 2021-12-19 [1] CRAN (R 4.1.1) pillar 1.6.4 2021-10-18 [1] CRAN (R 4.1.1) pkgbuild 1.2.0 2020-12-15 [1] CRAN (R 4.1.0) pkgconfig 2.0.3 2019-09-22 [1] CRAN (R 4.1.0) pkgload 1.2.3 2021-10-13 [1] CRAN (R 4.1.1) prettyunits 1.1.1 2020-01-24 [1] CRAN (R 4.1.0) processx 3.5.2 2021-04-30 [1] CRAN (R 4.1.0) ps 1.6.0 2021-02-28 [1] CRAN (R 4.1.0) purrr 0.3.4 2020-04-17 [1] CRAN (R 4.1.0) R6 2.5.1 2021-08-19 [1] CRAN (R 4.1.1) remotes 2.3.0 2021-04-01 [1] CRAN (R 4.1.0) rlang 0.4.12 2021-10-18 [1] CRAN (R 4.1.1) rprojroot 2.0.2 2020-11-15 [1] CRAN (R 4.1.0) sessioninfo 1.2.1 2021-11-02 [1] CRAN (R 4.1.1) sys 3.4 2020-07-23 [1] CRAN (R 4.1.0) targets * 0.10.0 2022-01-07 [1] CRAN (R 4.1.0) testthat 3.1.0 2021-10-04 [1] CRAN (R 4.1.1) tibble 3.1.6 2021-11-07 [1] CRAN (R 4.1.1) tidyselect 1.1.1 2021-04-30 [1] CRAN (R 4.1.0) usethis 2.0.1 2021-02-10 [1] CRAN (R 4.1.0) utf8 1.2.2 2021-07-24 [1] CRAN (R 4.1.0) uuid 1.0-3 2021-11-01 [1] CRAN (R 4.1.1) vctrs 0.3.8 2021-04-29 [1] CRAN (R 4.1.0) withr 2.4.3 2021-11-30 [1] CRAN (R 4.1.1) xfun 0.29 2021-12-14 [1] CRAN (R 4.1.1) yaml 2.2.1 2020-02-01 [1] CRAN (R 4.1.0) [1] /Library/Frameworks/R.framework/Versions/4.1-arm64/Resources/library ```

adamhsparks commented 2 years ago

Thanks for the comprehensive review @mdneuzerling.

smwindecker commented 2 years ago

Package Review

I have used targets before, and have written twitter threads on the topic. I have communicated online with the author.
[x] As the reviewer I confirm that there are no conflicts of interest for me to review this work.

Documentation

The package includes all the following forms of documentation:

[x] A statement of need: clearly stating problems the software is designed to solve and its target audience in README
[x] Installation instructions: for the development version of package and any non-standard dependencies in README
[x] Vignette(s): demonstrating major functionality that runs successfully locally
[x] Function Documentation: for all exported functions
[x] Examples: (that run successfully locally) for all exported functions
[x] Community guidelines: including contribution guidelines in the README or CONTRIBUTING, and DESCRIPTION with URL, BugReports and Maintainer (which may be autogenerated via Authors@R).

Functionality

[x] Installation: Installation succeeds as documented.
[x] Functionality: Any functional claims of the software been confirmed.
[x] Performance: Any performance claims of the software been confirmed.
[x] Automated tests: Unit tests cover essential functions of the package and a reasonable range of inputs and conditions. All tests pass on the local machine.
[x] Packaging guidelines: The package conforms to the rOpenSci packaging guidelines.

Estimated hours spent reviewing: 7

[x] Should the author(s) deem it appropriate, I agree to be acknowledged as a package reviewer ("rev" role) in the package DESCRIPTION file.

Review Comments

Thank you for creating another useful addendum package to the targets family. I look forward to seeing this tool implemented in the suggested workflows for targets pipelines more broadly.

Statement of need:

By the name I thought this package would be some sort of targets specific link to git itself (such as what gert is achieving in these examples).
When I understood the purpose of the package, I was very excited about its usefulness, as snapshotting data outputs in a version controlled way was a question I asked myself using targets in the past, and even thought about git committing my _targets/ folder at one point!
With that said, it took reading through most of the vignette and seeing the function outputs to really understand what the package was doing.
The README and DESCRIPTION could use clearer statements, for example: "gittargets allows users to create version control snapshots of the data objects created in their pipeline." ... or something simple like that.

tar_git_status()

I love this function! If I was still at all confused about how this package worked, the separate code git status and data git status made everything clear.
Small note, at the first use (before a data snapshot has been made but after the git repo was initialised) we see “! Create the code repository with gert::git_init().” under data git status header, but it shouldn’t be there as a code repo already exists.

Remotes

Can a data snapshot be pushed to github, or some other place where it can be assigned, for example a "release" and downloaded? I imagine it could be useful in the scenario of a published/submitted manuscript to be able to push a single version of the snapshot associated with the final code push. Although it is naturally preferable that a reviewer recreate the pipeline from scratch for reproducibility purposes, for many slow pipelines that is unfeasible, and having a snapshot there for folks to examine and play with intermediate outputs would be fantastic as an option.

Vignette/instructions

I found the steps easy to follow, but several things could be improved to make it easier to understand.
Checking out a data snapshot defaults to checking out the latest in a given branch (yes?). I found myself more explicitly using the name of the data snapshot from the tar_git_log when checking it out to ensure I was selecting the appropriate one. Given this, it should be more obvious to users that data snapshots will, by default, be given the same name as the commit message of the code commit they are tied to, and that you can give a name to the snapshot.
The idea of names of data snapshots is also a bit clouded by the "snapshot model" illustration, which shows the SHA instead of the commit name.
The instructions could use another step to clarify what happens in a merge. Specifically that after merging a branch, the data snapshot does not get overwritten with the branch snapshot, and if desired, should be checked out at that point. Similarly, clarifying that deleting a branch does not delete its associated data commit.
Is there a way to clean up outdated data snapshots? tar_git_remove_snapshot? Or is this unnecessary because the point is a version control record? I just imagine it could get too large very fast.
The "snapshot model" illustration shows a linear model of working with gittargets that does not match the vignette steps, which delves into branches. I think the addition of a simple image, in the style of the one below that many of us are familiar with, could attempt to illustrate the separate pathways of the code and data in a branching workflow.

Internal functions

When I write into my console tar_git_ and hit tab, it shows me a range of functions that are not included in the exported functions (eg. tar_git_commit()) I found this confusing as I searched for the correct exported function to use, especially because these non exported but visible functions could not be searched with the ? command (naturally, they're not exported!) It gave me pause that I was missing some functionalities.
I think these show up from the way the functions are documented as they sit in the same file as an exported function (I'm not at all sure this is the explanation).

Overall, my notes are quite minor and mostly related to improving user experience and uptake rather than real issues with the package. I love the idea and the execution is elegant, especially due to the function name parallels to the functions in gert (which I didn't know about before, so thank you!). I think developing a little more content about suggested best practices of how a targets user should integrate these steps into their workflow would make this contribution to the wider package family even more meaningful.

PS. the guitar in the hex is a very cute touch.

wlandau commented 2 years ago

Thank you @mdneuzerling and @smwindecker for your thoughtful reviews! I am working on responses and improvements to the package.

adamhsparks commented 2 years ago

@smwindecker, thank you for your thoughtful review.

wlandau commented 2 years ago

Response to @mdneuzerling

I feel like the first paragraph of the README doesn't adequately capture the use case of gittargets. What I'm looking to see in the first few paragraphs is something that covers what the targets data store is, and what you want to have happen to it when you switch branches with git.

I expanded the README in https://github.com/wlandau/gittargets/commit/11943f11b1f719e0f815f10e31f2290c8b6c514c to describe those assumptions.

In the project README or elsewhere, I recommend describing the intended workflow of the package. I settled on the below when checking out a different branch, which I think is right:

https://github.com/wlandau/gittargets/commit/ddc1e59f9560131e2aef971d1dfe122568762930 replaces man/figures/workflow.png with a simpler description of the workflow in the Usage section of the README and at the top of the git.Rmd vignette.

The assertions in tar_git_log allow for a value of max = Inf, which is a reasonable value that someone might use. This value is passed to gert::git_log, which will give a confusing error on max = Inf. Consider using is.finite or an equivalent assertion.

Added an assertion in https://github.com/wlandau/gittargets/commit/f12ee85401f607b582bd321283b6ecea21e95101.

I notice many tests are being skipped on Windows. I don't have ready access to a Windows machine. Can the author provide some confidence that this will work on Windows, or is the package intended to be Linux/Mac only? (I think we can safely disregard Solaris...)

When I unskip those tests on my local Windows machine, they all pass. But I have not been able to get them to work on GitHub Actions because the Windows runner claims the user name and email are not set, despite multiple attempted workarounds. As of https://github.com/wlandau/gittargets/commit/5acd4ecf648986b327d056abe4ed869aaf23b465 these tests are only skipped on CI Windows, not local Windows.

I would have preferred to use gert for all Git-related functionality because it installs well and just works on all platforms without any config. But gert relies on libgit2, which still silently fails with Git LFS unless the user makes some custom modifications: https://github.com/git-lfs/git-lfs/issues/375, https://github.com/libgit2/libgit2/issues/1473. So now gittargets requires a full Git client for operations that modify local Git repos. For reading stuff from Git, gittargets still uses gert because it gets the output into R quite nicely.

Regarding the vignette, the "Git data backend" is a local concept as far as I can tell. There's no use of a git remote here. I think for many people, git and GitHub are entirely synonymous, so it would be good to distinguish between the local and remote.

https://github.com/wlandau/gittargets/commit/bfe062009873cb20efe37700ee29d05cbbd9c610 adds a section on remotes and emphasizes that gittargets uses local Git only.

The run function from the processx package is in the NAMESPACE of gittargets. However in R/utils_git.R the author uses processx::run. Normally I wouldn't care, but run is a pretty commonly used function name, so I'd recommend keeping it out of the NAMESPACE if possible by using only processx::run.

I see what you mean, but importing process::run() in the NAMESPACE file only has consequences inside the gittargets private namespace, which does not have a conflicting internal function called "run". And it should not conflict with external code unless I re-export it. And think it is good practice to have NAMESPACE entries for DESCRIPTION Imports: packages (even with namespaced calls in the code, which I also do for the sake of defensive programming.)

tar_git_branch_snapshot and tar_git_commit_code refer to a "^code=" string; the meaning of this isn't entirely obvious (it's the prefix of the data branches). A comment in the code could clear this up.

Added comments in https://github.com/wlandau/gittargets/commit/5b7b1425fe5cb916c67dfd2283d14774f7c4b59a.

While it's not usually necessary to document internal functions, I think it would be a good idea to provide some Roxygen documentation for tar_git_snapshot_menu due to the large number of arguments it accepts. Recall that internal functions can be tagged with @keywords internal to prevent the help pages from being indexed.

Added a docstring in https://github.com/wlandau/gittargets/commit/541168f1c405dda019a4c05e0025a210f92a5327.

There are many SHA fingerprints in the diagrams which may make them harder to understand.

It may look like a lot to take in at first glance, but SHAs are important for understanding what is going on because of how the "code=" data branches match up with code commits. I kept them but simplified the figure in https://github.com/wlandau/gittargets/commit/37a20a75d7149bed889071570a308c4d56972aa9.

The function name tar_git_status_code confuses me. "Status code" makes me think of something like a numeric HTTP status code or Unix exit status, but this returns a tibble.

I get what you mean. I guess I could change it to a name like tar_git_status_code_repo(), but I am not 100% sold at the moment because it would interrupt the internal consistency of the function name convention.

wlandau commented 2 years ago

Response to @smwindecker

With that said, it took reading through most of the vignette and seeing the function outputs to really understand what the package was doing. The README and DESCRIPTION could use clearer statements, for example: "gittargets allows users to create version control snapshots of the data objects created in their pipeline." ... or something simple like that.

I rewrote the README and DESCRIPTION in https://github.com/wlandau/gittargets/commit/11943f11b1f719e0f815f10e31f2290c8b6c514c, hopefully they are clearer now.

Small note, at the first use (before a data snapshot has been made but after the git repo was initialised) we see “! Create the code repository with gert::git_init().” under data git status header, but it shouldn’t be there as a code repo already exists.

https://github.com/wlandau/gittargets/commit/ef8c0e07f87802ba3c6580b27b10377c514b7a83 improves the tar_git_status() messages.

Can a data snapshot be pushed to github, or some other place where it can be assigned, for example a "release" and downloaded? I imagine it could be useful in the scenario of a published/submitted manuscript to be able to push a single version of the snapshot associated with the final code push. Although it is naturally preferable that a reviewer recreate the pipeline from scratch for reproducibility purposes, for many slow pipelines that is unfeasible, and having a snapshot there for folks to examine and play with intermediate outputs would be fantastic as an option.

I think something like this should be possible if you create a Git-LFS-backed repo on GitHub or similar. Data snapshots are just branches, and you could either selectively push them to the remote repo, or you could push them all with git push --all; git push --tags. Then a code release could refer to a branch and commit in the remote data repo. gittargets itself tries not to go beyond local Git. But ultimately, this is the user's responsibility to manage outside gittargets.

Checking out a data snapshot defaults to checking out the latest in a given branch (yes?). I found myself more explicitly using the name of the data snapshot from the tar_git_log when checking it out to ensure I was selecting the appropriate one. Given this, it should be more obvious to users that data snapshots will, by default, be given the same name as the commit message of the code commit they are tied to, and that you can give a name to the snapshot.

As of https://github.com/wlandau/gittargets/commit/40ae23469486196628a426b7210fdf15beb8697c, the docstring of tar_git_checkout() is clearer. By default, tar_git_checkout() checks out the snapshot from the currently checked out code commit ("HEAD"), which is not necessarily the latest in the current branch (my mistake in the previous docstring). In other words, if you do a git reset --hard old_commit or a git checkout old_branch and then call tar_git_checkout(), then you could get a very old (but still correct) data snapshot without supplying the ref argument. The idea is to let the user focus on the code repo and trust tar_git_snapshot() to automatically find the right data. So ideally, the user should almost never need the ref argument. Does that make sense?

The idea of names of data snapshots is also a bit clouded by the "snapshot model" illustration, which shows the SHA instead of the commit name.

I added commit messages to the new figure in https://github.com/wlandau/gittargets/commit/37a20a75d7149bed889071570a308c4d56972aa9.

The instructions could use another step to clarify what happens in a merge. Specifically that after merging a branch, the data snapshot does not get overwritten with the branch snapshot, and if desired, should be checked out at that point. Similarly, clarifying that deleting a branch does not delete its associated data commit.

Yes, ordinary Git operations on the code repo do not change the data repo. https://github.com/wlandau/gittargets/commit/b26a7f5e64be8a728b8e0320ee94cc32f481a0dc adds a small section to the vignette on merges specifically.

Is there a way to clean up outdated data snapshots? tar_git_remove_snapshot? Or is this unnecessary because the point is a version control record? I just imagine it could get too large very fast.

I would have liked to implement this, but unfortunately it seems like an uphill battle against the intended behavior and design of Git. Because gittargets creates a large number of single-commit branches, the usual squash/rebase approach for deleting commits appears to breaks down. (I suppose git merge --squash comes close, but then that creates a new squash commit, and we would need to put that in a different data branch somehow, which seems messy.) An alternative I explored in https://github.com/wlandau/gittargets/issues/9 is to create an orphan branch for each data snapshot, but then we would lose the automatic audit trail of which targets changed between data snapshots, which I would prefer to keep for the sake of reproducibility.

In the gittargets documentation, I do recommend that snapshots only be created at important strategic milestones. This is the opposite of the usual advice for code version control (i.e. to commit early and often). The code and the data have such different needs when it comes to version control, and this fact was actually a major motivation for gittargets, which operates on entirely different Git repos for the code vs the data.

The "snapshot model" illustration shows a linear model of working with gittargets that does not match the vignette steps, which delves into branches. I think the addition of a simple image, in the style of the one below that many of us are familiar with, could attempt to illustrate the separate pathways of the code and data in a branching workflow.

There is a new figure in https://github.com/wlandau/gittargets/commit/37a20a75d7149bed889071570a308c4d56972aa9 which more closely aligns with the vignette and the style of https://git-scm.com/book/en/v2/Git-Branching-Branches-in-a-Nutshell.

When I write into my console targit and hit tab, it shows me a range of functions that are not included in the exported functions (eg. tar_git_commit()) I found this confusing as I searched for the correct exported function to use, especially because these non exported but visible functions could not be searched with the ? command (naturally, they're not exported!) It gave me pause that I was missing some functionalities. I think these show up from the way the functions are documented as they sit in the same file as an exported function (I'm not at all sure this is the explanation).

This happens to me if I load gittargets with pkgload::load_all(), but not if I restart my R session and load it with library(gittargets). Does that help?

mdneuzerling commented 2 years ago

I think @wlandau's responses to my recommendations are more than adequate. In particular, the improvements to the README are superb. I don't have anything further to say about this package, other than that I look forward to seeing it on CRAN and using it in my own projects. Thank you so much for your contributions, @wlandau.

smwindecker commented 2 years ago

Thanks @wlandau for your responses.

The changes and explanations made much clearer the concept of the data snapshots as branches themselves. The relevance of this to understanding the snapshot 'ref's and how to call them is now clear. I appreciate the new figure and especially the accompanying explanation. And yes I had a local copy of the whole package and that's why I was seeing all the functions!

I am satisfied my review has been fully addressed. Look forward to sharing this with others.

adamhsparks commented 2 years ago

@ropensci-review-bot submit review https://github.com/ropensci/software-review/issues/486#issuecomment-1008248927 time 5

ropensci-review-bot commented 2 years ago

Logged review for mdneuzerling (hours: 5)

adamhsparks commented 2 years ago

@ropensci-review-bot submit review time 7

ropensci-review-bot commented 2 years ago

Logged review for smwindecker (hours: 7)

adamhsparks commented 2 years ago

@ropensci-review-bot approve gittargets

ropensci-review-bot commented 2 years ago

Approved! Thanks @wlandau for submitting and @smwindecker, @mdneuzerling for your reviews! :grin:

To-dos:

[ ] Transfer the repo to rOpenSci's "ropensci" GitHub organization under "Settings" in your repo. I have invited you to a team that should allow you to do so.
[ ] After transfer write a comment @ropensci-review-bot finalize transfer of <package-name> where <package-name> is the repo/package name. This will give you admin access back.
[ ] Fix all links to the GitHub repo to point to the repo under the ropensci organization.
[ ] Delete your current code of conduct file if you had one since rOpenSci's default one will apply, see https://devguide.ropensci.org/collaboration.html#coc-file
[ ] If you already had a pkgdown website and are ok relying only on rOpenSci central docs building and branding,
- deactivate the automatic deployment you might have set up
- remove styling tweaks from your pkgdown config but keep that config file
- replace the whole current pkgdown website with a redirecting page
- replace your package docs URL with https://docs.ropensci.org/package_name
- In addition, in your DESCRIPTION file, include the docs link in the URL field alongside the link to the GitHub repository, e.g.: URL: https://docs.ropensci.org/foobar (website) https://github.com/ropensci/foobar
[ ] Fix any links in badges for CI and coverage to point to the new repository URL.
[ ] Increment the package version to reflect the changes you made during review. In NEWS.md, add a heading for the new version and one bullet for each user-facing change, and each developer-facing change that you think is relevant.
[ ] We're starting to roll out software metadata files to all rOpenSci packages via the Codemeta initiative, see https://docs.ropensci.org/codemetar/for how to include it in your package, after installing the package - should be easy as running codemetar::write_codemeta() in the root of your package.
[ ] You can add this installation method to your package README install.packages("<package-name>", repos = "https://ropensci.r-universe.dev") thanks to R-universe.

Should you want to acknowledge your reviewers in your package DESCRIPTION, you can do so by making them "rev"-type contributors in the Authors@R field (with their consent).

Welcome aboard! We'd love to host a post about your package - either a short introduction to it with an example for a technical audience or a longer post with some narrative about its development or something you learned, and an example of its use for a broader readership. If you are interested, consult the blog guide, and tag @stefaniebutland in your reply. She will get in touch about timing and can answer any questions.

We maintain an online book with our best practice and tips, this chapter starts the 3d section that's about guidance for after onboarding (with advice on releases, package marketing, GitHub grooming); the guide also feature CRAN gotchas. Please tell us what could be improved.

Last but not least, you can volunteer as a reviewer via filling a short form.

adamhsparks commented 2 years ago

@mdneuzerling, I noted after reading your approval that you still hadn't ticked the "Statement of need" in your checklist. I'm assuming that this has been met since you've approved the package revisions in your most recent post regarding updates to the README. If this is correct, could you please just tic that box to complete the review for us? 🙏

mdneuzerling commented 2 years ago

@adamhsparks Sorry about that! Done.

ropensci / software-review

gittargets: data version control for the targets package #486

Archive: TBD Version accepted: TBD Language: en

Scope

Technical checks

Publication options

Code of conduct

Checks for gittargets (v0.0.0.9000)

1. Statistical Properties

1a. Network visualisation

2. `goodpractice` and other checks

Editor-in-Chief Instructions:

Checks for gittargets (v0.0.0.9000)

1. Statistical Properties

1a. Network visualisation

2. `goodpractice` and other checks

Editor-in-Chief Instructions:

Editor checks:

[x] Project management: Are the issue and PR trackers in a good shape, e.g. are there outstanding bugs, is it clear when feature requests are meant to be tackled?

Editor comments

Package Review

Documentation

Functionality

Review Comments

General comments

Recommendations

Nitpicks

Session info

Package Review

Documentation

Functionality

Review Comments

Response to @mdneuzerling

Response to @smwindecker

ropensci / software-review

gittargets: data version control for the targets package #486

Archive: TBD Version accepted: TBD Language: en

Scope

Technical checks

Publication options

Code of conduct

Checks for gittargets (v0.0.0.9000)

1. Statistical Properties

1a. Network visualisation

2. goodpractice and other checks

Editor-in-Chief Instructions:

Checks for gittargets (v0.0.0.9000)

1. Statistical Properties

1a. Network visualisation

2. goodpractice and other checks

Editor-in-Chief Instructions:

Editor checks:

[x] Project management: Are the issue and PR trackers in a good shape, e.g. are there outstanding bugs, is it clear when feature requests are meant to be tackled?

Editor comments

Package Review

Documentation

Functionality

Review Comments

General comments

Recommendations

Nitpicks

Session info

Package Review

Documentation

Functionality

Review Comments

Response to @mdneuzerling

Response to @smwindecker

2. `goodpractice` and other checks

2. `goodpractice` and other checks