openjournals / joss-reviews

Reviews for the Journal of Open Source Software

[REVIEW]: jtools: Analysis and Presentation of Social Scientific Data #6610

Open · editorialbot opened 2 months ago

editorialbot commented 2 months ago

Submitting author: @jacob-long (Jacob A. Long)
Repository: https://github.com/jacob-long/jtools
Branch with paper.md (empty if default branch): joss
Version: 2.2.2
Editor: @samhforbes
Reviewers: @yjunechoe, @misken
Archive: Pending

Status

status

Status badge code:

HTML: <a href="https://joss.theoj.org/papers/d473276e1ace9dd5fef8f01a975ace25"><img src="https://joss.theoj.org/papers/d473276e1ace9dd5fef8f01a975ace25/status.svg"></a>
Markdown: [![status](https://joss.theoj.org/papers/d473276e1ace9dd5fef8f01a975ace25/status.svg)](https://joss.theoj.org/papers/d473276e1ace9dd5fef8f01a975ace25)

Reviewers and authors:

Please avoid lengthy details of difficulties in the review thread. Instead, please create a new issue in the target repository and link to those issues (especially acceptance-blockers) by leaving comments in the review thread below. (For completists: if the target issue tracker is also on GitHub, linking the review thread in the issue or vice versa will create corresponding breadcrumb trails in the link target.)

Reviewer instructions & questions

@yjunechoe & @misken, your review will be checklist-based. Each of you will have a separate checklist that you should update when carrying out your review. First of all, you need to run this command in a separate comment to create the checklist:

@editorialbot generate my checklist

The reviewer guidelines are available here: https://joss.readthedocs.io/en/latest/reviewer_guidelines.html. For any questions/concerns, please let @samhforbes know.

Please start on your review when you are able, and be sure to complete your review in the next six weeks, at the very latest.

Checklists

📝 Checklist for @misken

📝 Checklist for @yjunechoe

editorialbot commented 2 months ago

Hello humans, I'm @editorialbot, a robot that can help you with some common editorial tasks.

For a list of things I can do to help you, just type:

@editorialbot commands

For example, to regenerate the paper pdf after making changes in the paper's md or bib files, type:

@editorialbot generate pdf
editorialbot commented 2 months ago

Software report:

github.com/AlDanial/cloc v 1.90  T=0.11 s (1228.1 files/s, 444805.9 lines/s)
-------------------------------------------------------------------------------
Language                     files          blank        comment           code
-------------------------------------------------------------------------------
CSV                              2              0              0          25050
R                               37           1872           3943           7086
SVG                             73              0              0           6680
Markdown                         9            474              0           1989
YAML                             6             50             18            292
TeX                              1             16              0            159
Rmd                              4            280            766            156
JSON                             2              0              0             38
CSS                              1              6              0             22
-------------------------------------------------------------------------------
SUM:                           135           2698           4727          41472
-------------------------------------------------------------------------------

Commit count by author:

  1020  Jacob Long
     7  jacob-long
     6  Jacob
     2  Duncan Murdoch
     2  Noah Greifer
     1  Alan O'Callaghan
     1  NickCH-K
editorialbot commented 2 months ago

Paper file info:

📄 Wordcount for paper.md is 784

✅ The paper includes a Statement of need section

editorialbot commented 2 months ago

License info:

🟡 License found: GNU General Public License v3.0 (Check here for OSI approval)

editorialbot commented 2 months ago
Reference check summary (note 'MISSING' DOIs are suggestions that need verification):

OK DOIs

- 10.18637/jss.v103.i01 is OK
- 10.21105/joss.03393 is OK
- 10.18637/jss.v095.i01 is OK
- 10.1093/geronb/gbac154 is OK
- 10.3758/s13415-021-00942-5 is OK
- 10.1177/09636625231154131 is OK
- 10.2307/2288115 is OK
- 10.1016/S2213-2600(21)00536-1 is OK
- 10.1017/S1537592707072209 is OK

MISSING DOIs

- No DOI given, and none found for title: sjPlot: Data Visualization for Statistics in Socia...
- No DOI given, and none found for title: An R Companion to Applied Regression
- No DOI given, and none found for title: A Kenward-Roger Approximation and Parametric Boots...
- No DOI given, and none found for title: marginaleffects: Predictions, Comparisons, Slopes,...
- No DOI given, and none found for title: Analysis of Complex Survey Samples
- No DOI given, and none found for title: Parametric and Semi-Parametric Estimation of Regre...

INVALID DOIs

- None
editorialbot commented 2 months ago

:point_right::page_facing_up: Download article proof :page_facing_up: View article proof on GitHub :page_facing_up: :point_left:

misken commented 2 months ago

Review checklist for @misken

Conflict of interest

Code of Conduct

General checks

Functionality

Documentation

Software paper

yjunechoe commented 2 months ago

Review checklist for @yjunechoe

Conflict of interest

Code of Conduct

General checks

Functionality

Documentation

Software paper

yjunechoe commented 2 months ago

I've been test-driving {jtools} for a while since I agreed to the review, so here's an early review. Overall, this is a well-designed, feature-rich package, and it was a pleasure to review. It offers a substantial contribution to data analysis and statistical modelling for research in the social (and other) sciences.

Checklist comments

Automated tests

I've run most of the examples and tests, and they work fine on my local machine, but I see some GHA workflows failing on the package repo - please give the failing actions another run and resolve any issues if they persist.

Paper: statement of need/ state of the field

Since the package touches various areas of statistical and data analysis, its features overlap with many packages that are more specific in nature. As mentioned in the paper, these include packages like {marginaleffects}, {modelsummary}, {sjPlot}, etc. I would also add to this list {ggeffects} and {gtsummary} (insofar as {jtools} provides reporting capability), and other members of the {easystats} ecosystem (such as {performance} and {parameters}).

I think it'd be helpful for the paper to not only mention the similarities but also highlight the advantages and disadvantages of using {jtools} as opposed to a combination of more specialized packages. For example, the claim that "... {jtools} aims to improve the user interface and support more model types in some cases" could use some specific examples. Ideally, something like a table of comparisons (similar packages as columns, features as rows) would be extremely helpful for new users deciding whether {jtools} is right for them.

Comments on software

Implicit dependencies

The package has several "soft" dependencies - these are listed in Suggests and only loaded when needed, as they should be. However, it seems that these soft dependencies are causing some confusion for users who do not have them installed (as in issues 132 and 107, as well as the issue the other reviewer raised above). One thing that could help here is rlang::check_installed(), which not only checks a package's availability but also prompts the user to install the missing package in order to proceed. This could replace/standardize the various in-house checks like the one below:

  if (!requireNamespace("lme4", quietly = TRUE)) {
    stop("Package `lme4` needed for this function to work. Please install it.",
         call. = FALSE)
  }

Additionally, in at least one case a package in Suggests is versioned (huxtable >= 3.0.0), but there is no version compatibility check at the point where it's loaded. The versions of dependencies listed in Suggests have to be checked manually at load time - again easy to do with rlang::check_installed(), as sketched below.
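
For illustration, a minimal sketch of what the {rlang}-based checks might look like (the package names come from Suggests; the wording of the reason argument is just an example):

    # Sketch only: rlang::check_installed() verifies availability, offers to
    # install a missing package interactively, and can also enforce the minimum
    # version declared in Suggests.
    rlang::check_installed("lme4", reason = "to compute summaries for mixed models")
    rlang::check_installed("huxtable", version = "3.0.0")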

Lastly, speaking of dependencies and namespaces, it looks like this piece of code in plot_coefs.R is waiting for a refactoring, now that {broom} has hit v1.0:

    # Major kludge for methods clash between broom and broom.mixed
    # Making namespace environment with broom.mixed before generics 
    # to try to put those methods in the search path
    # Will drop after update to broom 0.7.0
    if (requireNamespace("broom.mixed")) {
      nse <- as.environment(unlist(sapply(c(asNamespace("broom.mixed"), 
                                            asNamespace("generics")),
                                          as.list)))
    } else {
      nse <- asNamespace("generics")
    }
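
To make this concrete, here is a hypothetical sketch (not a drop-in replacement for the package's internals) relying on the fact that, with broom >= 1.0, {broom.mixed} registers its tidy() methods with {generics} when its namespace is loaded:

    # Hypothetical sketch: loading the broom.mixed namespace registers its tidy()
    # methods with {generics}, so plain S3 dispatch works and the hand-built
    # environment of namespaces is no longer needed.
    tidy_model <- function(model, ...) {
      if (inherits(model, "merMod")) {
        rlang::check_installed("broom.mixed")  # loading registers the merMod methods
      }
      generics::tidy(model, ...)
    }

    # Example usage (requires lme4 and broom.mixed to be installed):
    # fit <- lme4::lmer(Reaction ~ Days + (Days | Subject), data = lme4::sleepstudy)
    # tidy_model(fit)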

A tension in dependency versioning

This package has a long history (I count ~8 years since the first commit), and I really applaud the effort to preserve some of the older behaviors and features for maximal reproducibility. However, I also see some tension between the desire to integrate cutting-edge tools and the desire to maintain backwards compatibility for users on older R / R package versions.

This is most apparent in the versioning of packages in Imports, which lists ggplot2 (>= 3.4.0) (newer, released 2022) and rlang (>= 0.3.0) (older, released 2018). Typically, this kind of division is appropriate - low-level, "back-end" dependencies should be as stable as possible, while that constraint can be relaxed for "front-end" dependencies, since updates to those usually bring user-visible improvements. However, this overlooks the fact that ggplot2 (>= 3.4.0) itself depends on the more recent rlang (>= 1.0.0), which ends up circumventing many of the backwards-compatibility measures carefully implemented and maintained over the package's many years of development. For example, this if block is dead code because the condition can never trigger:

  if (packageVersion("rlang") < "0.3.2") {
    options("rlang__backtrace_on_error" = "branch")
  } else {

I strongly urge the author to revisit the backwards-compatibility workarounds scattered across the source code and, where appropriate, remove such dead code. If possible, I further suggest that the author take the JOSS review process as an opportunity to push an update bringing the package more up to date with the current state of the tidyverse, insofar as {jtools} also uses its low-level infrastructure (e.g., {rlang} bumped to the "production" release v1.0, {cli} used in place of {crayon}), and unapologetically drop support for some older behaviors that are not critical to the correctness of output.
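
For instance, a {crayon}-styled message and a possible {cli} counterpart (the message text is a placeholder, not code from the package):

    # Placeholder message text; shown only to illustrate the crayon -> cli swap.
    message(crayon::red("Some predictors are on very different scales."))    # old style
    cli::cli_alert_warning("Some predictors are on very different scales.")  # cli equivalent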

Outdated non-standard evaluation patterns

This is a minor comment, but inside aes(), !!sym("col") can now be .data$col, and !!col can now be .data[[col]]. This brings the {ggplot2} NSE patterns up to date with the most recent changes to tidy-evaluation semantics in {rlang}, namely the introduction of the .data and .env pronouns.
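
To illustrate with a built-in dataset (the data and column names are placeholders, not code from {jtools}):

    library(ggplot2)
    library(rlang)

    col <- "wt"  # a column name stored as a string

    # older patterns based on injection
    ggplot(mtcars, aes(x = !!sym(col), y = mpg)) + geom_point()
    ggplot(mtcars, aes(x = !!sym("wt"), y = mpg)) + geom_point()

    # updated patterns using the .data pronoun
    ggplot(mtcars, aes(x = .data[[col]], y = mpg)) + geom_point()
    ggplot(mtcars, aes(x = .data$wt, y = mpg)) + geom_point()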

I think there are some other cases of NSE that could be simplified/streamlined - I suggest double-checking whether these can be updated in accordance with rlang v1.0 (again, I strongly encourage this version bump).


Misc.

I'll add more minor comments here (or as issues on the pkg repo) as they come up, but this summarizes most of my thoughts.

misken commented 2 months ago

As someone who doesn't do social science research, I agree with @yjunechoe on the benefit of including in the paper some discussion of related (in various specific ways) packages like ggeffects, gtsummary, performance, and parameters. Looking through the main web pages for these packages, I see some overlap with jtools. As a newcomer to social science regression modeling, it would be nice to know more about where jtools fits in, to help with the decision about which packages to use.

I posted a few issues in the jtools repo relating to the library imports, as well as some missing output in one of the vignettes. I will defer to @yjunechoe on the automated tests; I did get all of the example code to run but did not run the automated tests locally.

Not sure I have any more to add at this point. It should be relatively easy to address these issues and move the paper forward.

yjunechoe commented 2 months ago

On the note of tests: I can confirm that the tests are correct and have good coverage, though whether they pass depends on technicalities of how they are implemented. Namely, the tests for the plotting features (in test-effect-plots.R and test-export-summs.R) use snapshot tests on the exact graphical output (svg). I'll just note that this is extremely brittle to changes in the internals of upstream packages, namely {grid} and {ggplot2}. For example, all tests on plots would fail if {ggplot2} someday decided to draw axes right-to-left instead of left-to-right. Indeed, all plots currently fail the snapshot tests for such trivial reasons (so please update the snaps).

There are of course pros and cons to this setup - it makes tests trivial to write, but at the cost of maintenance (as you're already aware, snaps require manual updates every time they become outdated). In case test longevity is a concern, I just wanted to point to another possibility for testing plots, which is to test specific properties read off of the ggplot2 object (e.g., via layer_data()), as sketched below. If you must use snapshot tests, I still recommend implementing some ways to avoid false positives, such as testing on theme_void() versions of the plots and writing to raster files (e.g., png).
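
A minimal sketch of that alternative (the plot and the expectations are stand-ins, not the package's actual tests):

    library(testthat)
    library(ggplot2)

    test_that("plot maps the expected values", {
      p <- ggplot(mtcars, aes(wt, mpg)) + geom_point()  # stand-in for a jtools plot
      ld <- layer_data(p, 1)                            # data computed for layer 1
      expect_equal(nrow(ld), nrow(mtcars))
      expect_equal(ld$y, mtcars$mpg)
    })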

Echoing @misken's comment, I think this also wraps up my thoughts for now (I'd also like to clarify that many of my recommendations are not acceptance-blockers). I hope that the suggestions given so far point to clear targets that are straightforward to address.

samhforbes commented 1 month ago

Hi @jacob-long, I just wanted to check in and see if you had any thoughts regarding the reviews posted above. Generally, I'd echo the reviewers' comments: having tried and tested the package extensively myself, I agree it offers a lot to social science researchers. It looks as though there are a few minor issues / improvements being suggested here, and it wouldn't hurt to resolve these. The rlang and NSE comments made above are particularly helpful, I think.

Otherwise, from my point of view, I'd agree that adding a bit to the paper to clarify the unique contribution of the package - and to me it does make one - would be helpful for readers.

@misken and @yjunechoe I can see you've otherwise ticked off your reviewer checklists - is there anything major I've missed?

jacob-long commented 1 month ago

Yes, I agree that the reviewers have left very helpful feedback, and it appears there should be no major problems on my end addressing it. I will be able to get to it soon. Procedurally, where do we go from here? I'll make some changes in my repo, both to the package itself and to the paper. I may, of course, seek clarifications/input from the reviewers, either here or in issues filed in my repo. Once I'm satisfied that I've implemented their suggestions, should I post something like a "response to reviewers" letter here?

samhforbes commented 2 weeks ago

Hi @jacob-long, sorry I totally missed this last question. If you are satisfied that you've addressed the questions and concerns, you can say so here (or directly in the relevant issues), tagging the relevant reviewer where appropriate. From what I can see above, none of these are "blocking" issues for the reviewers, so it's a question of addressing them in a way you feel improves your package!

samhforbes commented 3 days ago

Hi @jacob-long I thought I'd just check in and see how this was going?