Closed editorialbot closed 9 months ago
Hello human, I'm @editorialbot, a robot that can help you with some common editorial tasks.
For a list of things I can do to help you, just type:
@editorialbot commands
For example, to regenerate the paper pdf after making changes in the paper's md or bib files, type:
@editorialbot generate pdf
Software report:
github.com/AlDanial/cloc v 1.88 T=0.04 s (576.5 files/s, 94705.0 lines/s)
-------------------------------------------------------------------------------
Language files blank comment code
-------------------------------------------------------------------------------
C 2 253 92 1246
Python 6 151 21 484
Markdown 4 69 0 259
C/C++ Header 2 74 96 238
YAML 5 11 21 195
TeX 1 21 0 177
TOML 1 4 0 38
-------------------------------------------------------------------------------
SUM: 21 583 230 2637
-------------------------------------------------------------------------------
gitinspector failed to run statistical information for the repository
Wordcount for paper.md
is 932
Reference check summary (note 'MISSING' DOIs are suggestions that need verification):
OK DOIs
- 10.18653/v1/2020.findings-emnlp.445 is OK
- 10.5281/zenodo.7882688 is OK
- 10.1145/362686.362692 is OK
- 10.1109/SEQUEN.1997.666900 is OK
MISSING DOIs
- 10.1145/1553374.1553516 may be a valid DOI for title: Feature Hashing for Large Scale Multitask Learning
- 10.1201/9781003080046 may be a valid DOI for title: Probabilistic Data Structures for Blockchain-Based Internet of Things Applications
INVALID DOIs
- https://doi.org/10.1016/j.cose.2021.102209 is INVALID because of 'https://doi.org/' prefix
:point_right::page_facing_up: Download article proof :page_facing_up: View article proof on GitHub :page_facing_up: :point_left:
👋 @hajimes - thanks for your submission to JOSS. Can you explain a bit about what part of this is your contribution vs what is the original MurmurHash3 from by Austin Appleby? I assume all the Python code is yours, but am unsure about the C code.
In addition, you could work on the possibly missing and incorrect DOIs that editorialbot suggests, but note that some may be incorrect. Please feel free to make changes to your .bib file, then use the command @editorialbot check references
to check again, and the command @editorialbot generate pdf
when the references are right to make a new PDF. editorialbot commands need to be the first entry in a new comment.
@editorialbot check references
Reference check summary (note 'MISSING' DOIs are suggestions that need verification):
OK DOIs
- 10.1016/j.cose.2021.102209 is OK
- 10.1145/362686.362692 is OK
- 10.1109/SEQUEN.1997.666900 is OK
- 10.5281/zenodo.7882688 is OK
- 10.18653/v1/2020.findings-emnlp.445 is OK
- 10.1201/9781003080046 is OK
- 10.1145/1553374.1553516 is OK
MISSING DOIs
- None
INVALID DOIs
- 10.5555/1577069.1755873 is INVALID
@editorialbot check references
Reference check summary (note 'MISSING' DOIs are suggestions that need verification):
OK DOIs
- 10.1016/j.cose.2021.102209 is OK
- 10.1145/362686.362692 is OK
- 10.1109/SEQUEN.1997.666900 is OK
- 10.5281/zenodo.7882688 is OK
- 10.18653/v1/2020.findings-emnlp.445 is OK
- 10.1201/9781003080046 is OK
- 10.1145/1553374.1553516 is OK
MISSING DOIs
- None
INVALID DOIs
- None
@editorialbot generate pdf
:point_right::page_facing_up: Download article proof :page_facing_up: View article proof on GitHub :page_facing_up: :point_left:
@danielskatz - thanks for managing my submission.
As for contributions about the C code,
mmh3module.c
(1051 loc): reviewers can assume this file is basically my own contribution. It contains hasher objects (that allow incremental updating using inner buffers), as well as wrappers (interfaces) to simple hash functions. I also made a great effort to make the original library portable to be used in various platforms, even though the visible results of this effort are tiny in terms of lines.
murmurhash3.c
(287 loc) and murmurhash3.h
(253 loc) contain the core logic of hashing. While the attribution of these two files should be almost entirely directed to Austin Appleby, as he invented the algorithms of MurmurHash3, I also made several tweaks to these code (e.g., conversion from C++ to C99, support for big-endian platforms, and addition of helper functions for hashers).
hashlib.h
(81 loc) was just taken from the repository of CPython (https://github.com/python/cpython).
Hello! Thanks for citing https://github.com/hugovk/top-pypi-packages!
Please could you adjust my name? "Van Kemenade" is my surname:
-29 that only 0.23% of the remaining packages in the PyPI ecosystem are more popular (Kemenade
+29 that only 0.23% of the remaining packages in the PyPI ecosystem are more popular (Van Kemenade
30 et al., 2023). According to PePy, as of May 22, 2023, the total downloads of this
-101 Kemenade, H. van, Si, R., & Dollenstein, Z. (2023). Hugovk/top-pypi-packages: Release
+101 Van Kemenade, H., Si, R., & Dollenstein, Z. (2023). Hugovk/top-pypi-packages: Release
102 2023.05 (Version 2023.05). Zenodo. https://doi.org/10.5281/zenodo.7882688
Thank you!
@hajimes - thanks for your reply. Due to the relatively small amount of code, the editors will now discuss if it meets the substantial scholarly effort criterion for review by JOSS. You should hear back in a week or so.
Also, you could make the change mentioned above by @hugovk
@editorialbot query scope
Submission flagged for editorial review.
@editorialbot check references
Reference check summary (note 'MISSING' DOIs are suggestions that need verification):
OK DOIs
- 10.1016/j.cose.2021.102209 is OK
- 10.1145/362686.362692 is OK
- 10.1109/SEQUEN.1997.666900 is OK
- 10.5281/zenodo.7994944 is OK
- 10.18653/v1/2020.findings-emnlp.445 is OK
- 10.1201/9781003080046 is OK
- 10.1145/1553374.1553516 is OK
MISSING DOIs
- None
INVALID DOIs
- None
@editorialbot generate pdf
:point_right::page_facing_up: Download article proof :page_facing_up: View article proof on GitHub :page_facing_up: :point_left:
@hugovk Hello! I apologize for writing your name incorrectly. I just blindly used the BibTeX entry provided by Zenodo, but I should have checked the entry before citation.
Anyway, thank you so much for providing valuable statistics!
@danielskatz Thank you for your reply and reminder. I fixed the name of @hugovk and also updated lines 27-31 from
As of May 1, 2023, mmh3 was being downloaded more than 2 million times per month, and it ranks as the 1,027th most downloaded PyPI package (of around 450,000 projects), showing that only 0.23% of the remaining packages in the PyPI ecosystem are more popular (Kemenade et al., 2023). According to PePy, as of May 22, 2023, the total downloads of this library exceeded 78 millions.
to
As of June 1, 2023, mmh3 was being downloaded more than 2 million times per month, and it ranks as the 970th most downloaded PyPI package (of around 458,000 projects), showing that only 0.22% of the remaining packages in the PyPI ecosystem are more popular (Van Kemenade et al., 2023). According to PePy, as of June 2, 2023, the total downloads of this library exceeded 79 millions.
No problem, thank you for updating, and I'm happy the statistics were useful!
@hajimes - Should Austin Appleby also be an author of this submission?
Perhaps not? (However, if you think he should be, I'll ask Austin Appleby whether he can join as a co-author)
I mean, it's true that the current mmh3 repository contains some code that was ultimately derived from Austin Appleby's original SMHasher library (i.e., murmurhash3.c
and murmurhash3.h
which were derived from MurmurHash3.cpp
and MurmurHash3.h
, respectively)
But probably a better solution is to exclude these files, use a git submodule to refer to Appleby's repo, and then write a script that converts the original C++ files to more portable C code at compile time. In this way I'll be able to clarify the contribution of each engineer, as well as increase the total amount of my own code.
If you think it's a good idea, I'll refactor mmh3 in a few days or so, though I'm not sure updates after submission are subject to evaluation by the review committee.
@hajimes - please go ahead and do this, then we'll review it
👋 @hajimes - note that I'm waiting for your response
I sincerely apologize for my late response.
I'll try to implement the modified version of my library soon, hopefully tomorrow, at least within this week.
@hajimes - Can you let us know here where this is now? I'm unsure from your issue above if you are done or still working, and if you are done, you could summarize here what you did and how things have changed from the POV of understanding your contributions and what JOSS will review.
I am very sorry for the repeated delay. As a progress report, I uploaded the result of refactoring to the feature/refactor-submodule branch that should be near-final, where ./src/mmh3/_mmh3/refresh.py handles the conversion of the original code by Austin Appleby.
I'll recheck my code later and merge it into the master branch, and then explain the changes in detail.
Again, so sorry for causing you trouble.
I finally uploaded the update as v4.0.1 yesterday.
Files that are mostly dependent on the original now reside in https://github.com/hajimes/mmh3/tree/master/src/mmh3/_mmh3
The project structure is roughly based on the _hacl
module in CPtyhon's repo, which is used by hashlib
, a part of the Python standard library.
https://github.com/python/cpython/tree/main/Modules/_hacl
That is,
refresh.py
(that is my contribution) transforms original files into new forms to be used by this project.murmurhash3.c
and murmurhash3.h
) stay as a part of my repository, rather than are treated as temporary files during the build time. I chose this design because mmh3module.c
depends on murmurhash3.h
.Each transform is represented by a string-to-string function, and the building process of new files is represented by a list of code fragments each of which is accompanied by these transforms.
The transforms can be loosely classified into four categories:
append_python_directives
, introduce_py_ssize_t
, ...)append_byteswap_header
, transform_getblocks
, expand_win_stdint_typedefs
, `fix_non_win_force_inline``, ...)mmh3module.c
(e.g., append_mur_macros
, generate_hasher_digest_*
, ...)I am really thankful to editors and reviewers for giving me an opportunity to rethink the structure of my project.
@editorialbot set v4.0.1 as version
Done! version is now v4.0.1
👋 @drvinceknight - can you edit this JOSS submission?
@editorialbot invite @drvinceknight as editor
Invitation to edit this submission sent!
@editorialbot set me as editor
I'm sorry human, I don't understand that. You can see what commands I support by typing:
@editorialbot commands
@editorialbot assign me as editor
Assigned! @drvinceknight is now the editor
@hajimes - I'll be taking this submission on as your editor. Could you take a look a this list of potential reviewers and identify a few people who would be good candidates to review this submission?
@drvinceknight - Thank you, I'm glad you took the time to be my editor!
Considering programming languages (Python and C/C++) and topic areas (high performance computing, natural language processing, machine learning, internet of things, cybersecurity, etc.), I'd suggest the following people as good candidates to review my paper:
@proycon @mrshu @jarrah42 would any of you be willing to review this submission for JOSS? The submission under consideration is mmh3: A Python extension for MurmurHash3.
The review process at JOSS is unique: it takes place in a GitHub issue, is open, and author-reviewer-editor conversations are encouraged. You can learn more about the process in these guidelines: https://joss.readthedocs.io/en/latest/reviewer_guidelines.html
Based on your experience, we think you might be able to provide a great review of this submission. Please let me know if you think you can help us out!
I would be happy to help with the review.
Thank you @mrshu and thank you for your patience everyone.
@editorialbot add @mrshu as reviewer
@mrshu added to the reviewers list!
@innat @sergioburdisso would either of you be willing to review this submission for JOSS? The submission under consideration is mmh3: A Python extension for MurmurHash3.
The review process at JOSS is unique: it takes place in a GitHub issue, is open, and author-reviewer-editor conversations are encouraged. You can learn more about the process in these guidelines: https://joss.readthedocs.io/en/latest/reviewer_guidelines.html
Based on your experience, we think you might be able to provide a great review of this submission. Please let me know if you think you can help us out!
@drvinceknight - are you able to continue looking for reviewers?
Submitting author: !--author-handle-->@hajimes<!--end-author-handle-- (Hajime Senuma) Repository: https://github.com/hajimes/mmh3 Branch with paper.md (empty if default branch): paper Version: v4.0.1 Editor: !--editor-->@drvinceknight<!--end-editor-- Reviewers: @mrshu, @JPenuchot Managing EiC: Daniel S. Katz
Status
Status badge code:
Author instructions
Thanks for submitting your paper to JOSS @hajimes. Currently, there isn't a JOSS editor assigned to your paper.
@hajimes if you have any suggestions for potential reviewers then please mention them here in this thread (without tagging them with an @). You can search the list of people that have already agreed to review and may be suitable for this submission.
Editor instructions
The JOSS submission bot @editorialbot is here to help you find and assign reviewers and start the main review. To find out what @editorialbot can do for you type: