openjournals / joss-reviews

Reviews for the Journal of Open Source Software
Creative Commons Zero v1.0 Universal

[PRE REVIEW]: Fast-ER: GPU-Accelerated Record Linkage and Deduplication in Python #7470

Open editorialbot opened 1 week ago

editorialbot commented 1 week ago

Submitting author: @jacobmorrier (Jacob Morrier)
Repository: https://github.com/jacobmorrier/fast-er
Branch with paper.md (empty if default branch):
Version: v0.1.1
Editor: Pending
Reviewers: Pending
Managing EiC: Samuel Forbes

Status


Status badge code:

HTML: <a href="https://joss.theoj.org/papers/9b59c50b6cac074605f908fda22cdeb6"><img src="https://joss.theoj.org/papers/9b59c50b6cac074605f908fda22cdeb6/status.svg"></a>
Markdown: [![status](https://joss.theoj.org/papers/9b59c50b6cac074605f908fda22cdeb6/status.svg)](https://joss.theoj.org/papers/9b59c50b6cac074605f908fda22cdeb6)

Author instructions

Thanks for submitting your paper to JOSS @jacobmorrier. Currently, there isn't a JOSS editor assigned to your paper.

@jacobmorrier if you have any suggestions for potential reviewers then please mention them here in this thread (without tagging them with an @). You can search the list of people that have already agreed to review and may be suitable for this submission.

Editor instructions

The JOSS submission bot @editorialbot is here to help you find and assign reviewers and start the main review. To find out what @editorialbot can do for you, type:

@editorialbot commands
editorialbot commented 1 week ago

Hello human, I'm @editorialbot, a robot that can help you with some common editorial tasks.

For a list of things I can do to help you, just type:

@editorialbot commands

For example, to regenerate the paper pdf after making changes in the paper's md or bib files, type:

@editorialbot generate pdf
editorialbot commented 1 week ago

Software report:

github.com/AlDanial/cloc v 1.90  T=0.06 s (511.9 files/s, 99071.8 lines/s)
-------------------------------------------------------------------------------
Language                     files          blank        comment           code
-------------------------------------------------------------------------------
SVG                              5              0             65           3163
Python                           9            666            811            625
TeX                              1             12              0            107
YAML                             3             11              9             87
reStructuredText                 9            118            104             85
Markdown                         2             30              0             52
DOS Batch                        1              8              1             26
make                             1              4              7              9
-------------------------------------------------------------------------------
SUM:                            31            849            997           4154
-------------------------------------------------------------------------------

Commit count by author:

   391  Jacob Morrier
    67  jacobmorrier
    17  sukishore12
     3  Sulekha Kishore
editorialbot commented 1 week ago

Paper file info:

📄 Wordcount for paper.md is 1067

✅ The paper includes a Statement of need section

editorialbot commented 1 week ago

License info:

✅ License found: MIT License (Valid open source OSI approved license)

editorialbot commented 1 week ago
Reference check summary (note 'MISSING' DOIs are suggestions that need verification):

✅ OK DOIs

- 10.23889/ijpds.v7i3.1794 is OK
- 10.1017/S0003055418000783 is OK
- 10.1111/joes.12395 is OK
- 10.1146/annurev-soc-073117-041447 is OK
- 10.1146/annurev-publhealth-031210-100700 is OK
- 10.1017/S0003055420000556 is OK
- 10.1177/1532673X19870512 is OK

🟡 SKIP DOIs

- No DOI given, and none found for title: CuPy: A NumPy-Compatible Library for NVIDIA GPU Ca...
- No DOI given, and none found for title: Arrow Columnar Format
- No DOI given, and none found for title: String Comparator Metrics and Enhanced Decision Ru...
- No DOI given, and none found for title: RAPIDS: Libraries for End to End GPU Data Science
- No DOI given, and none found for title: Programming Massively Parallel Processors: A Hands...

❌ MISSING DOIs

- 10.32614/cran.package.fastlink may be a valid DOI for title: fastLink: Fast Probabilistic Record Linkage with M...

❌ INVALID DOIs

- None
editorialbot commented 1 week ago

👉📄 Download article proof 📄 View article proof on GitHub 📄 👈

editorialbot commented 1 week ago

Five most similar historical JOSS papers:

Fast k-medoids Clustering in Rust and Python Submitting author: @kno10 Handling editor: @mikldk (Retired) Reviewers: @timClicks, @TahiriNadia Similarity score: 0.6555

Groupyr: Sparse Group Lasso in Python Submitting author: @richford Handling editor: @galessiorob (Active) Reviewers: @janfreyberg, @JonathanReardon, @rougier Similarity score: 0.6517

Elephas: Distributed Deep Learning with Keras & Spark Submitting author: @maxpumperla Handling editor: @diehlpk (Active) Reviewers: @sepandhaghighi, @nmoran Similarity score: 0.6474

Zoomerjoin: Superlatively-Fast Fuzzy Joins Submitting author: @beniaminogreen Handling editor: @samhforbes (Active) Reviewers: @cjbarrie, @wincowgerDEV Similarity score: 0.6451

The SAGE Rejected Article Tracker Submitting author: @ad48 Handling editor: @danielskatz (Active) Reviewers: @mfenner, @dhimmel Similarity score: 0.6439

⚠️ Note to editors: If these papers look like they might be a good match, click through to the review issue for that paper and invite one or more of the authors before considering asking the reviewers of these papers to review again for JOSS.

samhforbes commented 1 week ago

Hi @jacobmorrier, thanks for submitting to JOSS. Given the relatively small size of the package in terms of Python code, our editors will discuss whether it is in scope.

samhforbes commented 1 week ago

@editorialbot query scope

editorialbot commented 1 week ago

Submission flagged for editorial review.

jacobmorrier commented 1 week ago

Hi! Thank you for considering our submission. While it’s being reviewed for scope, I’d like to suggest two potential reviewers from the directory whose expertise includes record linkage and entity resolution: KonradHoeffner and vaneseltine. Additionally, I believe it would be beneficial to include at least one reviewer familiar with CUDA.

jacobmorrier commented 1 week ago

I would like to point out that some CUDA code is not included in the software report because it is stored as strings in Python scripts and compiled using CuPy.
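To gauge how much CUDA code is embedded this way, one could scan the Python sources for string literals that contain kernel definitions. The sketch below is illustrative only: it assumes kernels can be recognized by the `__global__` qualifier, uses just the standard-library `ast` module, and the function `count_embedded_cuda_lines` and the `SAMPLE` module are hypothetical, not taken from the Fast-ER repository.

```python
import ast


def count_embedded_cuda_lines(python_source: str) -> int:
    """Count non-blank lines inside string literals that look like CUDA kernels.

    Assumption (hypothetical heuristic): a string is treated as CUDA source
    if it contains the `__global__` qualifier used to declare CUDA kernels.
    """
    tree = ast.parse(python_source)
    total = 0
    for node in ast.walk(tree):
        # String literals appear as ast.Constant nodes whose value is a str.
        if isinstance(node, ast.Constant) and isinstance(node.value, str):
            if "__global__" in node.value:
                total += sum(1 for line in node.value.splitlines() if line.strip())
    return total


# Hypothetical module in the style described above: a CUDA kernel kept as a
# Python string so that it can be compiled at run time.
SAMPLE = '''
KERNEL_SOURCE = r"""
extern "C" __global__
void add_one(float* x, int n) {
    int i = blockDim.x * blockIdx.x + threadIdx.x;
    if (i < n) {
        x[i] += 1.0f;
    }
}
"""
'''

print(count_embedded_cuda_lines(SAMPLE))  # → 7
```

In a CuPy-based package, a string like `KERNEL_SOURCE` would typically be compiled at run time, for example with `cupy.RawKernel(KERNEL_SOURCE, "add_one")`. Because the source exists only as a string literal inside a `.py` file, a line counter such as cloc has no way to report it as a separate CUDA category.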

samhforbes commented 1 week ago

Hi @jacobmorrier, thanks for the additional context. We will get back to you shortly.