Closed whedon closed 4 years ago
Hello human, I'm @whedon, a robot that can help you with some common editorial tasks. @Fil, @atanikan it looks like you're currently assigned to review this paper :tada:.
:warning: JOSS reduced service mode :warning:
Due to the challenges of the COVID-19 pandemic, JOSS is currently operating in a "reduced service mode". You can read more about what that means in our blog post.
:star: Important :star:
If you haven't already, you should seriously consider unsubscribing from GitHub notifications for this (https://github.com/openjournals/joss-reviews) repository. As a reviewer, you're probably currently watching this repository, which means that, due to GitHub's default behaviour, you will receive notifications (emails) for all reviews 😿
To fix this do the following two things:
For a list of things I can do to help you, just type:
@whedon commands
For example, to regenerate the paper pdf after making changes in the paper's md or bib files, type:
@whedon generate pdf
Reference check summary:
OK DOIs
- 10.1109/MEDITEC.2016.7835390 is OK
- 10.1145/358027.358048 is OK
- 10.1109/SEQUEN.1997.666900 is OK
- 10.1145/1963405.1963487 is OK
MISSING DOIs
- https://doi.org/10.1007/3-540-45735-6_24 may be missing for title: String matching with metric trees using an approximate distance
- https://doi.org/10.1145/363958.363994 may be missing for title: A technique for computer detection and correction of spelling errors
- https://doi.org/10.2307/1932409 may be missing for title: Measures of the amount of ecologic association between species
- https://doi.org/10.1080/00182494.1976.10112645 may be missing for title: Surname spellings and computerized record linkage
- https://doi.org/10.1111/j.1469-8137.1912.tb05611.x may be missing for title: The distribution of the flora in the alpine zone. 1
- https://doi.org/10.1080/01621459.1989.10478785 may be missing for title: Advances in record-linkage methodology as applied to matching the 1985 census of Tampa, Florida
- https://doi.org/10.1002/sim.4780140510 may be missing for title: Probabilistic linkage of large public health data files
- https://doi.org/10.6028/nbs.sp.500-2 may be missing for title: Accessing individual records from personal data files using non-unique identifiers
- https://doi.org/10.1016/0022-2836(81)90087-5 may be missing for title: Identification of common molecular subsequences
- https://doi.org/10.2307/1532326 may be missing for title: FONEM: Un code de transcription phonétique pour la reconstitution automatique des familles saguenayennes
- https://doi.org/10.1109/itcc.2002.1000354 may be missing for title: Improving precision and recall for soundex retrieval
- https://doi.org/10.1145/2348828.2348830 may be missing for title: Hybrid matching algorithm for personal names
- https://doi.org/10.1002/(sici)1097-4571(199301)44:1<1::aid-asi1>3.0.co;2-1 may be missing for title: Stemming of French words based on grammatical categories
- https://doi.org/10.1002/(sici)1097-4571(1999)50:10<944::aid-asi9>3.0.co;2-q may be missing for title: A stemming procedure and stopword list for general French corpora
- https://doi.org/10.1108/eb026966 may be missing for title: A stemming algorithm for Latin text databases
- https://doi.org/10.1145/101306.101310 may be missing for title: Another stemmer
- https://doi.org/10.1162/coli.2006.32.4.485 may be missing for title: Unsupervised multilingual sentence boundary detection
INVALID DOIs
- None
@whedon re-invite @atanikan as reviewer
@atanikan already has access.
👋 @Fil many thanks for your comments - @yomguithereal can you let us know once you have addressed @Fil 's comments?
@atanikan let me know if you need any help with the reviewing process - the review criteria you need to follow are outlined here: https://joss.readthedocs.io/en/latest/review_criteria.html and a useful checklist is above (with comments and explanations here: https://joss.readthedocs.io/en/latest/review_checklist.html) feel free to check the boxes directly!
Many thanks all for all your hard work in improving the library and the paper,
@Yomguithereal can you let us know once you have addressed @Fil 's comments?
Sure, I will tell you when this is done. I first need to find a little bit of time to do so, hope there is no time limit on this :) Thanks for reviewing the library.
@kakiac Thanks for the reminder :)
@Yomguithereal The installation works. I checked some of your functions and they work well. A few minor suggestions:
a) Personally I'd find it more intuitive to do
`var talis = require('talisman'); talis.metrics.cosine([1,2],[3,4]);`
over
`var cos = require('talisman/metrics/cosine'); cos([1,2],[3,4]);`
I guess the issue does speak to it.
b) A comparative benchmark study as mentioned here which can also be added to your documentation website would be super useful.
c) Mentioning somewhere in your documentation examples using "require" as well as ES6 "imports" would be useful for beginners
I guess the issue does speak to it.
Yes, the issue is indeed about this point. The library was originally structured this way because it was written at a time when tree-shaking bundlers were not yet a thing, and since the library is really heavy on its own, it was the only way to make sure people would only require the pieces they need. But I agree this is somewhat unnecessary when using the library in node.js.
I will add a centralized endpoint in the near future, but I need some time to be sure I can do so while supporting ES6 imports, CommonJS & TypeScript, which is unfortunately always a drag.
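A minimal sketch of what this could look like with Node's conditional `exports` field; the file names and layout below are illustrative assumptions for this example, not Talisman's actual structure:

```json
{
  "name": "talisman",
  "main": "./index.js",
  "types": "./index.d.ts",
  "exports": {
    ".": {
      "import": "./index.mjs",
      "require": "./index.js"
    },
    "./metrics/*": {
      "import": "./metrics/*.mjs",
      "require": "./metrics/*.js"
    }
  },
  "sideEffects": false
}
```

Declaring `"sideEffects": false` also lets tree-shaking bundlers drop unused pieces of a centralized entry point, which addresses the original concern about library weight.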
A comparative benchmark study as mentioned here which can also be added to your documentation website would be super useful.
This issue (imprecisely) refers to the fact that you can use the library to benchmark several methods, such as phonetic encoders, against a database of, say, given names. I was not referring to performance benchmarks against other similar libraries. Is that what you had in mind?
I will add a centralized endpoint in the near future, but I need some time to be sure I can do so while supporting ES6 imports, CommonJS & TypeScript, which is unfortunately always a drag.
That's fair! I guess we can pass on it for now.
This issue (imprecisely) refers to the fact that you can use the library to benchmark several methods, such as phonetic encoders, against a database of, say, given names. I was not referring to performance benchmarks against other similar libraries. Is that what you had in mind?
Then yes, I meant performance benchmarking against other similar libraries. I understand Talisman aims to be an archive/collection of several text-based search/information-retrieval techniques rather than a brand-new algorithm. I however feel some benchmarking, be it in terms of accuracy or speed of execution on a large example dataset, may prove useful for wider use and greater acceptance of your library.
Thanks all!
@Yomguithereal, re this:
hope there is no time limit on this :)
No, please take your time to implement the changes as suggested by the reviewers - once you are happy (and the reviewers are happy 😃 ) we can move the submission forward to the next step!
👋 hi @Yomguithereal
It looks like you needed time to implement the reviewer-requested modifications. Can you give us an update on the progress? If you need more time, just let us know; times are tough!
Hello @labarba. Sorry for the delay. I haven't progressed much but I should be able to find time to do so in September. Times indeed are tough :). Thanks for enquiring.
@whedon remind @Yomguithereal in two weeks
Reminder set for @Yomguithereal in two weeks
:wave: @Yomguithereal, please update us on how things are progressing here.
👋 @Yomguithereal - hope you are well! I am just following up to see how things are progressing with implementing the reviewer's suggestions? Let me know if there is something I can do to help our end :) Hope you are well 👍
Hello @kakiac. I should have some time Thursday to act on those suggestions. So far I need to update the paper to change its framing a bit, add some references to similar tools and switch to a leaner bibliography (as opposed to the one containing all of the papers describing the implemented functions and methods). I also have some pending work related to the library's packaging, which I would like to address later since it is very time-consuming.
@whedon generate pdf
:point_right::page_facing_up: Download article proof :page_facing_up: View article proof on GitHub :page_facing_up: :point_left:
@whedon generate pdf
:point_right::page_facing_up: Download article proof :page_facing_up: View article proof on GitHub :page_facing_up: :point_left:
@atanikan I have added a note regarding module imports and require here.
Regarding:
Then yes, I meant performance benchmarking with other similar libraries. I understand Talisman aims to be an archive/collection of several text-based search/information retrieval techniques. And not exactly a brand new algorithm. I however feel some benchmarking, be it in terms of accuracy or speed of execution by taking a large dataset as an example, may prove to be useful for wider use and a greater acceptance of your library.
There exist some speed benchmarks for some of the library's functions (notably Levenshtein distance, because this library was used as an experimental starting point for the creation of many very fast implementations of the distance in JavaScript), but for most of the implemented methods there is nothing to benchmark against because nobody else has implemented those functions.
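If it helps make the discussion concrete, here is a minimal sketch of what such a speed-benchmark scaffold could look like in Node. The Levenshtein implementation is a generic textbook version written for this example, not Talisman's; the `bench` helper and test pairs are likewise invented for illustration.

```javascript
// Textbook two-row Levenshtein distance (NOT Talisman's implementation).
function levenshtein(a, b) {
  const m = a.length, n = b.length;
  let prev = Array.from({ length: n + 1 }, (_, j) => j);
  for (let i = 1; i <= m; i++) {
    const curr = [i];
    for (let j = 1; j <= n; j++) {
      curr[j] = Math.min(
        prev[j] + 1,                                   // deletion
        curr[j - 1] + 1,                               // insertion
        prev[j - 1] + (a[i - 1] === b[j - 1] ? 0 : 1)  // substitution
      );
    }
    prev = curr;
  }
  return prev[n];
}

// Time a candidate implementation over a fixed set of input pairs
// using Node's high-resolution timer.
function bench(name, fn, pairs) {
  const start = process.hrtime.bigint();
  for (const [a, b] of pairs) fn(a, b);
  const ms = Number(process.hrtime.bigint() - start) / 1e6;
  console.log(`${name}: ${ms.toFixed(2)}ms for ${pairs.length} pairs`);
}

const pairs = [['kitten', 'sitting'], ['flaw', 'lawn'], ['book', 'back']];
bench('levenshtein', levenshtein, pairs);
console.log(levenshtein('kitten', 'sitting')); // 3
```

The same scaffold would accept any competing implementation as `fn`, so comparing libraries is just a matter of adding more `bench` calls on the same `pairs`.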
Regarding precision/recall etc., the issue here is that the relevant databases that could enable such benchmarks don't exist yet, because most of those methods were created as heuristics geared toward the resolution of particular problems, and so it made no sense to most of their authors to compare them to others. This said, I believe there is a point to be made regarding some classes of functions present in the library, such as phonetic algorithms geared towards proper names, and their precision/recall. I personally intend to develop such a benchmark and I have started creating a database that could be used for this purpose (along with another phonetic algorithm I am developing right now), but unfortunately this is ongoing work which won't be finished anytime soon.
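As an illustration of the kind of measurement I have in mind, here is a small self-contained sketch. The encoder is a deliberately naive stand-in invented for this example (it is NOT one of Talisman's algorithms), and the ground-truth pairs are made up; a real benchmark would plug in an actual phonetic encoder and a curated database of name variants.

```javascript
// Toy phonetic encoder: keep the first letter, drop vowels from the
// rest, collapse repeated characters. Purely illustrative.
function toyEncode(name) {
  const head = name[0].toUpperCase();
  const tail = name.slice(1).toLowerCase().replace(/[aeiouy]/g, '');
  return (head + tail).replace(/(.)\1+/g, '$1');
}

// Hand-made ground truth: pairs that SHOULD match, and pairs that should not.
const truePairs = [
  ['Smith', 'Smyth'],
  ['Catherine', 'Katherine'],
  ['Jon', 'John'],
];
const falsePairs = [
  ['Smith', 'Jon'],
  ['Catherine', 'Smyth'],
];

// Two names are predicted to match when their codes are equal.
function evaluate(encode, positives, negatives) {
  const tp = positives.filter(([a, b]) => encode(a) === encode(b)).length;
  const fp = negatives.filter(([a, b]) => encode(a) === encode(b)).length;
  const fn = positives.length - tp;
  return {
    precision: tp + fp === 0 ? 1 : tp / (tp + fp),
    recall: tp / (tp + fn),
  };
}

const scores = evaluate(toyEncode, truePairs, falsePairs);
console.log(scores);
```

Here the toy encoder recovers Smith/Smyth but misses the other variant pairs, which is exactly the kind of precision/recall trade-off such a benchmark would expose across encoders.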
This said, do you want me to, at least, try to compile some speed benchmarks regarding a subset of the library's functions that have some other JavaScript implementations?
My review is complete. Thank you @Yomguithereal!
This said, do you want me to, at least, try to compile some speed benchmarks regarding a subset of the library's functions that have some other JavaScript implementations?
It's fine. I see the article does speak to the fact that one can benchmark if they would like, and since neither the article nor the library makes an outright claim to be the "fastest"/"best", I suppose we can ignore benchmarking stats for now. I appreciate the changes made in the documentation as well. @kakiac My review is complete. Thanks, @Yomguithereal. Fantastic job.
@atanikan @Fil @Yomguithereal thank you so much for all your efforts and hard work on this! A fantastic submission :) I will now be initiating final checks and push through the last editorial steps for publication.
Editor's "After reviewers recommend acceptance" checks:
@whedon generate pdf
@whedon check references
:point_right::page_facing_up: Download article proof :page_facing_up: View article proof on GitHub :page_facing_up: :point_left:
Reference check summary (note 'MISSING' DOIs are suggestions that need verification):
OK DOIs
- 10.1007/978-3-642-31164-2 is OK
MISSING DOIs
- 10.1111/j.1469-8137.1912.tb05611.x may be a valid DOI for title: The distribution of the flora in the alpine zone. 1
- 10.1007/0-387-69505-2 may be a valid DOI for title: Data quality and record linkage techniques
INVALID DOIs
- None
I pushed a commit adding the missing DOIs to the paper's bibliography.
Thanks @Yomguithereal!
I have just finished with all the checks for this submission - if you could merge my pull request here: https://github.com/Yomguithereal/talisman/pull/177 with the main .bib file, that should work. The rest of the references were either very old and without a DOI or without a permanent identifier, so happy to leave as they are.
After you do this, can you please make a tagged release and archive the repo, and report the version number and archive DOI in this thread?
Not long now!
Sorry, what do you mean by "archive the repo"?
@Yomguithereal apologies - I should have been clearer - Upon successful completion of the review, you need to make a tagged release of the software (via github, see here for info https://git-scm.com/book/en/v2/Git-Basics-Tagging), and deposit a copy of the repository with a data-archiving service such as Zenodo or figshare.
You then need to get the DOI you will be assigned from them (it is assigned automatically once you create the archive).
As soon as you have both copy the release version number (from github) and the DOI in this thread.
See here for more info under section The review process https://joss.readthedocs.io/en/latest/submitting.html#submitting-your-paper - let me know if you need any help!
An example of a recent tagged release on github is here: https://github.com/kgoldfeld/simstudy/ archived on Zenodo with the DOI: https://doi.org/10.5281/zenodo.4134675
@kakiac Please wait to ping openjournals/joss-eics until a submission is ready to be taken over, if possible. Just saves a little bit of email traffic for us. Your last checkbox isn't needed since that happens with @whedon accept.
@kakiac the release is 1.1.3-zenodo
and the attached DOI is 10.5281/zenodo.4247869
(but the https://doi.org/10.5281/zenodo.4247869 url does not yet work as I write this message. I guess it is just Zenodo/DOI lag?).
Yes, no worries, it worked for me just now. It takes some time to get registered in the DOI database, I think. Thanks for doing that!
@Yomguithereal it seems that the authors list of your JOSS paper and the zenodo archive does not match. Can you please make sure they align?
@kakiac Please wait to ping openjournals/joss-eics until a submission is ready to be taken over, if possible. Just saves a little bit of email traffic for us. Your last checkbox isn't needed since that happens with @whedon accept.
Oh apologies! I thought I needed to get that tagged if accepted by reviewers! Will bear that in mind in the future!
@kakiac Your list looks correct now! Thanks!
@whedon generate pdf
:point_right::page_facing_up: Download article proof :page_facing_up: View article proof on GitHub :page_facing_up: :point_left:
@whedon set 1.1.3-zenodo as version
OK. 1.1.3-zenodo is the version.
@Yomguithereal not sure if you spotted my message above - it seems that the authors list of your JOSS paper and the zenodo archive does not match. Can you please make sure they align?
@whedon set 10.5281/zenodo.4247869 as archive
OK. 10.5281/zenodo.4247869 is the archive.
Is this ok now?
Yes, thank you :)
@whedon accept
Submitting author: @Yomguithereal (Guillaume Plique) Repository: https://github.com/Yomguithereal/talisman Version: 1.1.3-zenodo Editor: @kakiac Reviewer: @Fil, @atanikan Archive: 10.5281/zenodo.4247869
Status
Status badge code:
Reviewers and authors:
Please avoid lengthy details of difficulties in the review thread. Instead, please create a new issue in the target repository and link to those issues (especially acceptance-blockers) by leaving comments in the review thread below. (For completists: if the target issue tracker is also on GitHub, linking the review thread in the issue or vice versa will create corresponding breadcrumb trails in the link target.)
Reviewer instructions & questions
@Fil & @atanikan, please carry out your review in this issue by updating the checklist below. If you cannot edit the checklist please:
The reviewer guidelines are available here: https://joss.readthedocs.io/en/latest/reviewer_guidelines.html. Any questions/concerns please let @kakiac know.
✨ Please try and complete your review in the next six weeks ✨
Review checklist for @Fil
Conflict of interest
Code of Conduct
General checks
Functionality
Documentation
Software paper
Review checklist for @atanikan
Conflict of interest
Code of Conduct
General checks
Functionality
Documentation
Software paper