openjournals / joss-reviews

Reviews for the Journal of Open Source Software
Creative Commons Zero v1.0 Universal
721 stars 38 forks source link

[REVIEW]: specieshindex: How scientifically popular is a species? #3776

Closed whedon closed 2 years ago

whedon commented 3 years ago

Submitting author: !--author-handle-->@jesstytam<!--end-author-handle-- (Jessica Tam) Repository: https://github.com/jessicatytam/specieshindex Branch with paper.md (empty if default branch): Version: 0.3.1 Editor: !--editor-->@Bisaloo<!--end-editor-- Reviewers: @kamapu, @tom-gu, @DrMattG Archive: Pending

:warning: JOSS reduced service mode :warning:

Due to the challenges of the COVID-19 pandemic, JOSS is currently operating in a "reduced service mode". You can read more about what that means in our blog post.

Status

status

Status badge code:

HTML: <a href="https://joss.theoj.org/papers/401f3e0de4da2e338c5903d8a96a910b"><img src="https://joss.theoj.org/papers/401f3e0de4da2e338c5903d8a96a910b/status.svg"></a>
Markdown: [![status](https://joss.theoj.org/papers/401f3e0de4da2e338c5903d8a96a910b/status.svg)](https://joss.theoj.org/papers/401f3e0de4da2e338c5903d8a96a910b)

Reviewers and authors:

Please avoid lengthy details of difficulties in the review thread. Instead, please create a new issue in the target repository and link to those issues (especially acceptance-blockers) by leaving comments in the review thread below. (For completists: if the target issue tracker is also on GitHub, linking the review thread in the issue or vice versa will create corresponding breadcrumb trails in the link target.)

Reviewer instructions & questions

@kamapu & @tom-gu & @DrMattG, please carry out your review in this issue by updating the checklist below. If you cannot edit the checklist please:

  1. Make sure you're logged in to your GitHub account
  2. Be sure to accept the invite at this URL: https://github.com/openjournals/joss-reviews/invitations

The reviewer guidelines are available here: https://joss.readthedocs.io/en/latest/reviewer_guidelines.html. Any questions/concerns please let @Bisaloo know.

Please start on your review when you are able, and be sure to complete your review in the next six weeks, at the very latest

Review checklist for @kamapu

✨ Important: Please do not use the Convert to issue functionality when working through this checklist, instead, please open any new issues associated with your review in the software repository associated with the submission. ✨

Conflict of interest

Code of Conduct

General checks

Functionality

Documentation

Software paper

Review checklist for @tom-gu

✨ Important: Please do not use the Convert to issue functionality when working through this checklist, instead, please open any new issues associated with your review in the software repository associated with the submission. ✨

Conflict of interest

Code of Conduct

General checks

Functionality

Documentation

Software paper

Review checklist for @DrMattG

✨ Important: Please do not use the Convert to issue functionality when working through this checklist, instead, please open any new issues associated with your review in the software repository associated with the submission. ✨

Conflict of interest

Code of Conduct

General checks

Functionality

Documentation

Software paper

whedon commented 3 years ago

Hello human, I'm @whedon, a robot that can help you with some common editorial tasks. @kamapu, @tom-gu, @DrMattG it looks like you're currently assigned to review this paper :tada:.

:warning: JOSS reduced service mode :warning:

Due to the challenges of the COVID-19 pandemic, JOSS is currently operating in a "reduced service mode". You can read more about what that means in our blog post.

:star: Important :star:

If you haven't already, you should seriously consider unsubscribing from GitHub notifications for this (https://github.com/openjournals/joss-reviews) repository. As a reviewer, you're probably currently watching this repository which means for GitHub's default behaviour you will receive notifications (emails) for all reviews 😿

To fix this do the following two things:

  1. Set yourself as 'Not watching' https://github.com/openjournals/joss-reviews:

watching

  1. You may also like to change your default settings for this watching repositories in your GitHub profile here: https://github.com/settings/notifications

notifications

For a list of things I can do to help you, just type:

@whedon commands

For example, to regenerate the paper pdf after making changes in the paper's md or bib files, type:

@whedon generate pdf
whedon commented 3 years ago

Wordcount for paper.md is 1478

whedon commented 3 years ago
Software report (experimental):

github.com/AlDanial/cloc v 1.88  T=0.05 s (357.9 files/s, 141975.3 lines/s)
-------------------------------------------------------------------------------
Language                     files          blank        comment           code
-------------------------------------------------------------------------------
Rmd                              2            271            520           2140
R                                9            175           1233           1355
TeX                              3             31              0            522
Markdown                         2            100              0            312
YAML                             1             15              1             68
-------------------------------------------------------------------------------
SUM:                            17            592           1754           4397
-------------------------------------------------------------------------------

Statistical information for the repository 'bd838b7115269e827ea9c7cb' was
gathered on 2021/09/29.
The following historical commit information, by author, was found:

Author                     Commits    Insertions      Deletions    % of changes
jessicatytam                     2            19             19          100.00

Below are the number of rows from each author that have survived and are still
intact in the current revision:

Author                     Rows      Stability          Age       % in comments
whedon commented 3 years ago
Reference check summary (note 'MISSING' DOIs are suggestions that need verification):

OK DOIs

- 10.1073/pnas.0507655102 is OK
- 10.1111/tbed.12221 is OK
- 10.1111/mam.12066 is OK
- 10.1007/s11192-015-1798-9 is OK
- 10.1007/s11192-007-1859-9 is OK
- 10.1371/journal.pone.0131004 is OK
- 10.1111/mam.12038 is OK
- 10.1038/s41598-017-09084-6 is OK
- 10.1139/facets-2016-0011 is OK
- 10.1007/s11192-006-0147-4 is OK
- 10.1017/S1367943004001799 is OK
- 10.1371/journal.pone.0189577 is OK
- 10.1007/s11192-006-0146-5 is OK
- 10.1016/S0169-5347(01)02381-3 is OK
- 10.1038/s41467-018-07916-1 is OK
- 10.1111/acv.12586 is OK
- 10.1038/s41477-021-00912-2 is OK
- 10.1016/j.gecco.2018.e00423 is OK
- 10.1007/s11160-019-09556-0 is OK
- 10.1111/j.1523-1739.2006.00616.x is OK
- 10.1111/j.1523-1739.2010.01453.x is OK
- 10.1108/07378830610715473 is OK
- 10.1093/eurheartj/ehx446 is OK
- 10.1108/14684521211209581 is OK

MISSING DOIs

- None

INVALID DOIs

- None
whedon commented 3 years ago

:point_right::page_facing_up: Download article proof :page_facing_up: View article proof on GitHub :page_facing_up: :point_left:

Bisaloo commented 3 years ago

:wave::wave::wave::wave: @jessicatytam @kamapu @tom-gu @DrMattG, this is the review thread for the paper. All of our communications will happen here from now on.

All reviewers have checklists at the top of this thread with the JOSS requirements. As you go over the submission, please check any items that you feel have been satisfied. There are also links to the JOSS reviewer guidelines.

The JOSS review is different from most other journals. Our goal is to work with the authors to help them meet our criteria instead of merely passing judgment on the submission. As such, the reviewers are encouraged to submit issues and pull requests on the software repository. When doing so, please mention openjournals/joss-reviews#3776 so that a link is created to this thread (and I can keep an eye on what is happening). Please also feel free to comment and ask questions on this thread. In my experience, it is better to post comments/questions/suggestions as you come across them instead of waiting until you've reviewed the entire package.

We aim for reviews to be completed within about 2-4 weeks. Please let me know if any of you require some more time. We can also use Whedon (our bot) to set automatic reminders if you know you'll be away for a known period of time.

Please feel free to ping me (@Bisaloo) if you have any questions/concerns.

whedon commented 3 years ago

:wave: @tom-gu, please update us on how your review is going (this is an automated reminder).

whedon commented 3 years ago

:wave: @DrMattG, please update us on how your review is going (this is an automated reminder).

whedon commented 3 years ago

:wave: @kamapu, please update us on how your review is going (this is an automated reminder).

Bisaloo commented 3 years ago

Thanks for your comments, @DrMattG!

:wave: @kamapu, @tom-gu, could you please give us an update on your progress for the review of this package & paper? Please let me know if you encounter any issues.

tom-gu commented 3 years ago

Thanks for your comments, @DrMattG!

👋 @kamapu, @tom-gu, could you please give us an update on your progress for the review of this package & paper? Please let me know if you encounter any issues.

👋 @Bisaloo, I'll submit my review by tomorrow. Sorry for the delay, it's due to the occurrence of too many unexpected events :sweat_smile:

kamapu commented 3 years ago

Shame on me! Though too late, I'll provide my comments. Unfortunately I don't have any access or copy to the manuscript now.

Bisaloo commented 3 years ago

@whedon generate pdf

whedon commented 3 years ago

:point_right::page_facing_up: Download article proof :page_facing_up: View article proof on GitHub :page_facing_up: :point_left:

Bisaloo commented 3 years ago

@kamapu, you should be able to access the manuscript via the links in the previous comment :slightly_smiling_face:

kamapu commented 3 years ago

@kamapu, you should be able to access the manuscript via the links in the previous comment slightly_smiling_face

Thanks! The robot is closing my issues in the repo of specieshindex. I'm doing something wrong or it is because of been too late.

danielskatz commented 3 years ago

You need to manually open issues in https://github.com/jessicatytam/specieshindex - do not use the GitHub "convert to issue" feature, as it opens issues in the wrong repository.

tom-gu commented 3 years ago

@jessicatytam @wcornwell @Bisaloo Here is my full review, so sorry for the delay!


Package overview:

The specieshindex package is showing great potential to promote scientific merit. It is a delightfully user-friendly package with superb documentation. Nonetheless, I identified several areas that can be improved. I did not open specific GitHub issues since they are more systemic and should be addressed as the package matures. Additionally, I had a few minor suggestions and some ideas for future development.

Package structure & development:

Testing & CI

I saw that you recently added several tests and a CI (via GithubAction). It is great to know that you are adopting a more test-driven development (TDD) schema. Pivotal tests for this particular package are smoke tests to monitor API connection and to validate a proper output structure. As you know by now, constructing this for Scopus is easier than for WOS and BASE. However, please refer to this section to learn how to manage secrets (e.g., API keys) within the package.

Code duplication

The level of duplication (code & function documentation) is off the charts. It partially stems from a segregated function strategy and from not fully utilizing sub-functions (internal, non-exported functions). This is problematic from the POV of a package developer since even the smallest changes can become a real hassle. When the small things become exhausting, the big things become impossible! I’m advocating for the consolidation of similar functions by adding a parameter or two. This might slightly affect the user’s experience, but it will greatly serve package maintenance and improve its overall structure. If you decide to make the changes, please avoid putting all the functions in the same R file.

Code refactoring (mostly coding efficiency)

Going forward, I encourage you to review your coding efficiency. Few examples:

Defensive programming

Communicating with our users via errors, warnings, and messages significantly improves their user experience and can help mitigate knowledge gaps. Currently, the package has no implementation of defensive programming. I encourage you to embrace it. Here you can find helpful information. Side note: implementing defensive programming is much harder when you have high levels of code duplication.

Motivation for package maintenance

Package maintenance is crucial for establishing software sustainability. This aspect of package development is a demanding God. Adequately quantifying long-term maintenance is rather challenging, even more so for scientists. When trying to connect the dots, it seems that this package was mainly developed for the study of biases in mammalian. If this is the case, I wonder if the Cornwell Lab has a long-term maintenance plan with sufficient resources to support it. Please forgive me if the query seems intrusive, I’m only trying to share insights from my own experience and pain. It might be beneficial to instigate a collaboration with an organization or additional PI that is going to heavily rely on this package. Perhaps @mjwestgate will have specific leads.

Data science >> Software development

I can classify this package under ‘data scientists developing a package’ scenario. It’s clear that you completely understand your user, but do not yet fully comprehend your development needs. In order to support your package development journey, I urge you to read rOpenSci Packaging Guide and R Packages. In a way, now is the perfect time for that. Now, you can properly digest these comprehensive resources.

goodpractice output

See: https://github.com/MangoTheCat/goodpractice Please run goodpractice::gp() or see this RStudio Project for the goodpractice output.

Minor suggestions:

Installation instructions (README; paper; vignette)

Why you need all of this: https://github.com/jessicatytam/specieshindex/blob/9698952bf81d8d549b7c5bef3a4a557f342bf13d/README.rmd#L22:L31 It’s somewhat redundant, why not just:

install.packages("remotes")
remotes::install_github("jessicatytam/specieshindex")

paper.md file

Code is too long in lines 76, 109, 144-147, and 150. https://raw.githubusercontent.com/openjournals/joss-papers/joss.03776/joss.03776/10.21105.joss.03776.pdf

DESCRIPTION file

Package contributors might be missing (compared to paper authors).

git

I see that both Jessica and Will are committing directly to the master branch. After pkg release, it’s better to have a dev branch and to merge code via PR.

pkgdown

The overall level of package documentation is excellent. Creating a website via pkgdown will beautifully showcase it. Implementation is quite easy and totally worth it!

Ideas for future functionality:

Final words

Sharing as many insights as I can is not just my duty but also a privilege, as it makes my own hardships along the way even more worthwhile. I hope you will find this review helpful and not too preachy 😃

From one Biodiversity Jedi to another, may the foRce be with you! Tomer

kamapu commented 2 years ago

Review by Miguel

Sorry for the late response. It is my first review of a package and the way to proceed has been challenging for me. Also some "spontaneous" activities were kipping me busy and I just managed to finish my review.

While this package seems to be promising for its use also in the context of my own work, I see the necessity of a major revision of both, the structure of functions as well as in the documentation.

I didn't opened issues for this review. I would suggest to the authors when responding to refer every change to a commit in the repository.

I sum up my suggestions here (detailed comments below):

Comments and Suggestions

README

Vignette

Manual

Manuscript

General comments

Checks

jesstytam commented 2 years ago

Thanks everyone for your helpful comments! I've since updated the documentation, reorganised the functions, and fixed some of the warnings that were showing up in checks. We're currently working on adding more tests for the API.

In regards to @kamapu 's comments:

Bisaloo commented 2 years ago

Great, @jessicatytam, please let us know when you have made progress on various points raised in the review. Feel free to open an issue for each item if helpful :relaxed:

@kamapu, I think you missed this question :wink::

when you say obj <- Fetch() and obj <- Count() are not working, do you mean they are returning some sort of error? They are both working for me, do you want to provide a reprex?

Bisaloo commented 2 years ago

@jessicatytam, do you have a timeline on when you think you could do the required changes? If you don't think this will be possible anytime soon, we can pause this submission or offer to resubmit later.

jesstytam commented 2 years ago

Hi @Bisaloo apologies for the holdup! I've been working on multiple projects and getting organised as I'm starting my PhD. We hope to finishing addressing the comments by the end of March. Here's a list of tasks we will be working on:

Bisaloo commented 2 years ago

Hi @jessicatytam :wave:, could you update us on the current status of this submission please? If you need some more time, it's okay but would you have a rough estimate of when we could expect your changes? If this is going to take a while, you could also withdraw this submission for the time being and resubmit later.

We are flexible. Please let me know what is the most reasonable and convenient option for you.

jesstytam commented 2 years ago

We have completed all of the tasks and addressed the comments, with the exception of adding tests, it's currently ~50% so we will probably need until the end of May if that's ok.

What would be the ideal % for test coverage?

Bisaloo commented 2 years ago

We have completed all of the tasks and addressed the comments, with the exception of adding tests, it's currently ~50% so we will probably need until the end of May if that's ok.

Sounds good :+1:

What would be the ideal % for test coverage?

There is no hard and fast rule, especially for API packages that are somewhat more difficult to write tests for. Most best practice guides say something like "major functionality is covered by tests". In my experience, around 80-85% is usually a good number to ensure tests cover most sections of your code. Higher is also great but please let me know if you have issues reaching this number.

For what it's worth, a good and lower-effort way to improve your test suite in the future is also to turn each bug (such as the ones discovered in this review) into a test so you can make sure you don't break this again in the future.

jesstytam commented 2 years ago

We've got through adding lots of tests and almost everything is now tested except for one big issue.

We're dealing with some issues with testing the functions that query an API multiple times. Since there are limitations in the scopus API, we need >10 GET requests in a single function to retrieve all of the data while handling edge cases smoothly. We are using httptest::with_mock_api() to test the API functions currently, but with_mock_api() can only test 1 GET request per test. So to test everything in those functions is difficult given the combination of the constraints of the Scopus API and the mock_api framework. So we're a bit stuck on that unless you can see a clever solution?

Here's the link the codecov: https://app.codecov.io/gh/jessicatytam/specieshindex/blob/master/R/specieshindex.R . Check out the function FetchT_scopus which is being tested but the test exits (successfully) after the first GET call. Any help much appreciated.

Bisaloo commented 2 years ago

@jessicatytam, while investigating your problem, I had another look and I want to already now mention that @DrMattG's point about code duplication is not yet properly addressed in my opinion.

Most functions still include a lot of duplicated code. I have submitted a pull request (https://github.com/jessicatytam/specieshindex/pull/60) where I show how you could reduce code duplication. Please have a look and try to apply this kind of fixes throughout the code. Let me know if it's not clear.

On the topic of duplication, I notice that the documentation of example datasets is duplicated as well. You documented them in each individual files (Koala.R, Platypus.R, etc.) and at the end of the main specieshindex.R file. Could you have a look please?

I would like to emphasize that this insistence on avoiding code duplication is not just to annoy you. Code with high duplication is very hard / impossible to maintain in the long run. If you refactor the code to reduce duplication, future updates will require change in a single place, instead of throughout the codebase.

jesstytam commented 2 years ago

Thanks for the comments! I'll have another look at the updated code again and come up with a list of things to do and a timeline.

jesstytam commented 2 years ago
Bisaloo commented 2 years ago

Hi @jesstytam, any idea about when we could complete this submission? :slightly_smiling_face:

jesstytam commented 2 years ago

Is end of Oct ok?

Bisaloo commented 2 years ago

Is end of Oct ok?

In this case, I think it would be much better to withdraw until the required changes have been made and resubmit later, mentioning this issue.

This is much better for us as it would for example allow me to handle other papers in the mean time (I limit myself to 3-4 papers at once).

jesstytam commented 2 years ago

Ah ok then, withdrawing for now sounds like the better option. Thanks!

Bisaloo commented 2 years ago

@openjournals/joss-eics, can we withdraw this submission please?

kamapu commented 2 years ago

It is OK with me.

danielskatz commented 2 years ago

@editorialbot withdraw

editorialbot commented 2 years ago

Paper withdrawn.