openjournals / joss-reviews

Reviews for the Journal of Open Source Software

[REVIEW]: cuallee: A Python package for data quality checks across multiple DataFrame APIs #6684

Closed: editorialbot closed this issue 4 months ago

editorialbot commented 6 months ago

Submitting author: @canimus (Herminio Vazquez)
Repository: https://github.com/canimus/cuallee
Branch with paper.md (empty if default branch): main
Version: v0.11.0
Editor: @jbytecode
Reviewers: @devarops, @FlorianK13
Archive: 10.5281/zenodo.12206787

Status


Status badge code:

HTML: <a href="https://joss.theoj.org/papers/db01d4f5a02a319fe2b4c49f68e3f859"><img src="https://joss.theoj.org/papers/db01d4f5a02a319fe2b4c49f68e3f859/status.svg"></a>
Markdown: [![status](https://joss.theoj.org/papers/db01d4f5a02a319fe2b4c49f68e3f859/status.svg)](https://joss.theoj.org/papers/db01d4f5a02a319fe2b4c49f68e3f859)

Reviewers and authors:

Please avoid lengthy details of difficulties in the review thread. Instead, please create a new issue in the target repository and link to those issues (especially acceptance-blockers) by leaving comments in the review thread below. (For completists: if the target issue tracker is also on GitHub, linking the review thread in the issue or vice versa will create corresponding breadcrumb trails in the link target.)

Reviewer instructions & questions

@devarops & @FlorianK13, your review will be checklist-based. Each of you will have a separate checklist that you should update as you carry out your review. First of all, you need to run this command in a separate comment to create the checklist:

@editorialbot generate my checklist

The reviewer guidelines are available here: https://joss.readthedocs.io/en/latest/reviewer_guidelines.html. If you have any questions or concerns, please let @jbytecode know.

✨ Please start on your review when you are able, and be sure to complete your review in the next six weeks, at the very latest ✨

Checklists

πŸ“ Checklist for @devarops

πŸ“ Checklist for @FlorianK13

editorialbot commented 6 months ago

Hello humans, I'm @editorialbot, a robot that can help you with some common editorial tasks.

For a list of things I can do to help you, just type:

@editorialbot commands

For example, to regenerate the paper pdf after making changes in the paper's md or bib files, type:

@editorialbot generate pdf
editorialbot commented 6 months ago
Reference check summary (note 'MISSING' DOIs are suggestions that need verification):

OK DOIs

- 10.1145/3603707 is OK
- 10.1145/3603706 is OK
- 10.1145/3580305.3599776 is OK
- 10.14778/3229863.3229867 is OK
- 10.1145/3529190.3529222 is OK

MISSING DOIs

- None

INVALID DOIs

- None
editorialbot commented 6 months ago

Software report:

github.com/AlDanial/cloc v 1.90  T=0.18 s (1879.6 files/s, 143082.1 lines/s)
-------------------------------------------------------------------------------
Language                     files          blank        comment           code
-------------------------------------------------------------------------------
Python                         297           3177            824          14045
Markdown                        17           1594              0           4374
XML                              1              0              0            635
SVG                              5              5              2            213
YAML                             3              6             14            122
TOML                             1              5              1             72
TeX                              1              6              0             71
make                             1             11              0             42
Dockerfile                       2             11              2             33
INI                              1              0              0              5
Bourne Shell                     3              0              0              3
-------------------------------------------------------------------------------
SUM:                           332           4815            843          19615
-------------------------------------------------------------------------------

Commit count by author:

   356  Herminio Vazquez
    66  Virginie Grosboillot
    32  dependabot[bot]
     2  Demetrius Albuquerque
     1  Corey Runkel
     1  Daniel Saad
     1  Ryan Julyan
     1  Yuki
     1  dCodeYL
editorialbot commented 6 months ago

Paper file info:

📄 Wordcount for paper.md is 1241

✅ The paper includes a Statement of need section

editorialbot commented 6 months ago

License info:

✅ License found: Apache License 2.0 (Valid open source OSI approved license)

editorialbot commented 6 months ago

👉 📄 Download article proof 📄 View article proof on GitHub 📄 👈

jbytecode commented 6 months ago

@devarops, @FlorianK13 - Dear reviewers, you can start by creating your task lists. Each list contains several tasks.

Whenever you complete a task, you can tick the corresponding checkbox. Since the review process of JOSS is interactive, you can always interact with the author, the other reviewers, and the editor during the process. You can open issues and pull requests in the target repo. Please mention the URL of this page there so we can keep track of what is going on outside this thread.

Please create your tasklist by typing

@editorialbot generate my checklist

Thank you in advance.

devarops commented 6 months ago

Review checklist for @devarops

Conflict of interest

Code of Conduct

General checks

Functionality

Documentation

Software paper


Notes

Installation

pip install cuallee
Defaulting to user installation because normal site-packages is not writeable
Collecting cuallee
  Downloading cuallee-0.10.0-py3-none-any.whl (51 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 51.0/51.0 KB 1.8 MB/s eta 0:00:00
Collecting toolz>=0.12.0
  Downloading toolz-0.12.1-py3-none-any.whl (56 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 56.1/56.1 KB 2.7 MB/s eta 0:00:00
Collecting requests>=2.28
  Downloading requests-2.31.0-py3-none-any.whl (62 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 62.6/62.6 KB 3.8 MB/s eta 0:00:00
Requirement already satisfied: urllib3<3,>=1.21.1 in /usr/lib/python3/dist-packages (from requests>=2.28->cuallee) (1.26.5)
Requirement already satisfied: certifi>=2017.4.17 in /usr/lib/python3/dist-packages (from requests>=2.28->cuallee) (2020.6.20)
Collecting charset-normalizer<4,>=2
  Downloading charset_normalizer-3.3.2-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (142 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 142.1/142.1 KB 6.0 MB/s eta 0:00:00
Requirement already satisfied: idna<4,>=2.5 in /usr/lib/python3/dist-packages (from requests>=2.28->cuallee) (3.3)
Installing collected packages: toolz, charset-normalizer, requests, cuallee
  WARNING: The script normalizer is installed in '/home/evaro/.local/bin' which is not on PATH.
  Consider adding this directory to PATH or, if you prefer to suppress this warning, use --no-warn-script-location.
Successfully installed charset-normalizer-3.3.2 cuallee-0.10.0 requests-2.31.0 toolz-0.12.1
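
As a quick post-install smoke test, something like the following can be run against a small pandas DataFrame. This is a minimal sketch assuming the Check/CheckLevel API described in the cuallee README (is_complete, is_unique, validate); exact method names and report columns may differ between versions.

import pandas as pd
from cuallee import Check, CheckLevel

# Small frame with one deliberately missing value
df = pd.DataFrame({"id": [1, 2, 3], "value": [10.0, 20.0, None]})

check = (
    Check(CheckLevel.WARNING, "SmokeTest")
    .is_complete("id")       # no nulls expected in the id column
    .is_unique("id")         # no duplicate ids
    .is_complete("value")    # expected to fail: value contains a null
)

# validate() returns a tabular report with one row per rule (PASS/FAIL)
print(check.validate(df))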
FlorianK13 commented 5 months ago

Review checklist for @FlorianK13

Conflict of interest

Code of Conduct

General checks

Functionality

Documentation

Software paper

jbytecode commented 5 months ago

@devarops, @FlorianK13 - I have been watching the issues created in the target repo. I want to thank you, our reviewers, for such a clear and straightforward review. And thank you @canimus for addressing our reviewers' suggestions. Please keep in touch.

FlorianK13 commented 5 months ago

@editorialbot generate pdf

editorialbot commented 5 months ago

👉 📄 Download article proof 📄 View article proof on GitHub 📄 👈

FlorianK13 commented 4 months ago

@jbytecode From my side, the review is finished. The package cuallee from @canimus offers a large variety of test cases for different DataFrame frameworks in Python. The implementation of the software, the software tests, and the documentation page all look very good. A clear recommendation from my side to publish the paper in JOSS.

jbytecode commented 4 months ago

@FlorianK13 - Thank you for your great effort in reviewing!

devarops commented 4 months ago

@jbytecode I will finish my review this week.

jbytecode commented 4 months ago

@devarops - Thank you, we look forward to hearing from you.

devarops commented 4 months ago

@jbytecode: I am pleased to recommend the article "cuallee: A Python package for data quality checks across multiple DataFrame APIs" for publication in JOSS. The software is original, high-quality, and represents a substantial scholarly effort. This article and the software will be of significant interest to the readers of JOSS, and I recommend it for publication.

jbytecode commented 4 months ago

@editorialbot check references

editorialbot commented 4 months ago
Reference check summary (note 'MISSING' DOIs are suggestions that need verification):

OK DOIs

- 10.1145/3603707 is OK
- 10.1145/3603706 is OK
- 10.1145/3580305.3599776 is OK
- 10.14778/3229863.3229867 is OK
- 10.1145/3529190.3529222 is OK
- 10.1145/2723372.2742797 is OK
- 10.1145/2872427.2883029 is OK

MISSING DOIs

- No DOI given, and none found for title: Technology Trends for 2023
- No DOI given, and none found for title: cuallee: Performance Tests
- No DOI given, and none found for title: TLC Trip Record Data
- 10.5040/9781350207318.00000004 may be a valid DOI for title: Great Expectations
- No DOI given, and none found for title: Soda Core

INVALID DOIs

- 10.3389/fdata.2020.564115 is INVALID
jbytecode commented 4 months ago

@editorialbot generate pdf

editorialbot commented 4 months ago

👉 📄 Download article proof 📄 View article proof on GitHub 📄 👈

jbytecode commented 4 months ago

Note:

The DOI suggested above for Great Expectations (10.5040/9781350207318.00000004) is not the correct one.

jbytecode commented 4 months ago

Post-Review Checklist for Editor and Authors

Additional Author Tasks After Review is Complete

Editor Tasks Prior to Acceptance

jbytecode commented 4 months ago

@canimus - I've sent a PR. It includes minor changes and fixes in both the BibTeX and the manuscript. Please review them. If you agree with them, please merge.

jbytecode commented 4 months ago

@canimus - While we are proofreading the submission, you can go ahead with the archiving. Please create a tagged release in the software repository. Using this tagged release, create a Zenodo archive. In this archive, the author list, the authors' ORCIDs, and the title should match those of the manuscript. After creating the archive, please report the tagged release (e.g., v1.2.3) and the DOI (and URL) of the Zenodo archive here. Thank you in advance.

canimus commented 4 months ago

@jbytecode Version: v0.11.0 URL: https://zenodo.org/records/12206787 DOI: 10.5281/zenodo.12206787

jbytecode commented 4 months ago

@editorialbot set v0.11.0 as version

editorialbot commented 4 months ago

Done! version is now v0.11.0

jbytecode commented 4 months ago

@canimus - The metadata (the title, the list of authors, ORCIDs of authors, etc.) of the Zenodo archive and those of the submission should match. Could you please fix it and ping me again? Thank you in advance.

canimus commented 4 months ago

@jbytecode done: I added the same title and the authors' ORCIDs, and changed the license to Apache License 2.0 as per the original submission and repo.

jbytecode commented 4 months ago

@editorialbot set 10.5281/zenodo.12206787 as archive

editorialbot commented 4 months ago

Done! archive is now 10.5281/zenodo.12206787

jbytecode commented 4 months ago

@editorialbot check references

editorialbot commented 4 months ago
Reference check summary (note 'MISSING' DOIs are suggestions that need verification):

OK DOIs

- 10.1145/3603707 is OK
- 10.1145/3603706 is OK
- 10.1145/3580305.3599776 is OK
- 10.14778/3229863.3229867 is OK
- 10.1145/3529190.3529222 is OK
- 10.1145/2723372.2742797 is OK
- 10.1145/2872427.2883029 is OK

MISSING DOIs

- No DOI given, and none found for title: Technology Trends for 2023
- No DOI given, and none found for title: cuallee: Performance Tests
- No DOI given, and none found for title: TLC Trip Record Data
- 10.5040/9781350207318.00000004 may be a valid DOI for title: Great Expectations
- No DOI given, and none found for title: Soda Core

INVALID DOIs

- 10.3389/fdata.2020.564115 is INVALID
jbytecode commented 4 months ago

@canimus - The DOI 10.3389/fdata.2020.564115 does not resolve. Could you please fix it? Could you please also check the missing DOIs and add them where they can be found? Thank you in advance.

canimus commented 4 months ago

Hi @jbytecode, I decided to remove the DOI for that reference. Even though the article is published here: https://www.frontiersin.org/articles/10.3389/fdata.2022.945720/full, searching for that DOI directly on https://doi.org throws an error.

canimus commented 4 months ago

@editorialbot check references

editorialbot commented 4 months ago
Reference check summary (note 'MISSING' DOIs are suggestions that need verification):

OK DOIs

- 10.1145/3603707 is OK
- 10.1145/3603706 is OK
- 10.1145/3580305.3599776 is OK
- 10.14778/3229863.3229867 is OK
- 10.1145/3529190.3529222 is OK
- 10.1145/2723372.2742797 is OK
- 10.1145/2872427.2883029 is OK

MISSING DOIs

- No DOI given, and none found for title: Technology Trends for 2023
- No DOI given, and none found for title: cuallee: Performance Tests
- No DOI given, and none found for title: TLC Trip Record Data
- 10.5040/9781350207318.00000004 may be a valid DOI for title: Great Expectations
- No DOI given, and none found for title: Soda Core

INVALID DOIs

- None
jbytecode commented 4 months ago

It seems 10.3389/fdata.2022.945720 is a valid DOI for the submission https://www.frontiersin.org/articles/10.3389/fdata.2022.945720/full

canimus commented 4 months ago

@jbytecode the problem is that when it goes through the bot verification, it is flagged as INVALID.

jbytecode commented 4 months ago

@canimus - 10.3389/fdata.2020.564115 was not resolving, but 10.3389/fdata.2022.945720 resolves. Am I wrong? You can keep the same BibTeX entry and use the latter DOI, right?

canimus commented 4 months ago

@editorialbot check references

editorialbot commented 4 months ago
Reference check summary (note 'MISSING' DOIs are suggestions that need verification):

OK DOIs

- 10.1145/3603707 is OK
- 10.1145/3603706 is OK
- 10.1145/3580305.3599776 is OK
- 10.14778/3229863.3229867 is OK
- 10.1145/3529190.3529222 is OK
- 10.1145/2723372.2742797 is OK
- 10.1145/2872427.2883029 is OK
- 10.3389/fdata.2022.945720 is OK

MISSING DOIs

- No DOI given, and none found for title: Technology Trends for 2023
- No DOI given, and none found for title: cuallee: Performance Tests
- No DOI given, and none found for title: TLC Trip Record Data
- 10.5040/9781350207318.00000004 may be a valid DOI for title: Great Expectations
- No DOI given, and none found for title: Soda Core

INVALID DOIs

- None
canimus commented 4 months ago

@jbytecode thanks for the correction, using the newer 2022 reference fixed the resolution. All the missing ones are just internal references, web articles, and miscellaneous entries.

jbytecode commented 4 months ago

@canimus - It seems we have solved the DOI issue. Now I am generating the latest PDF and will do a final proofreading.

jbytecode commented 4 months ago

@editorialbot generate pdf

editorialbot commented 4 months ago

👉 📄 Download article proof 📄 View article proof on GitHub 📄 👈

jbytecode commented 4 months ago

@canimus - Please read the manuscript carefully and fix anything if you encounter errors or typos (I am doing the same). Ping me when you are done. Thank you in advance.

canimus commented 4 months ago

@jbytecode I downloaded the latest PDF version, and I am confident that it is now good to go. It is free from typos, and the readability score is good. Thank you very much for all your support, assistance, and diligence; I truly appreciate it.

jbytecode commented 4 months ago

@canimus - It's also okay from my side. I am now recommending acceptance. Our track editor will make the final decision.