openjournals / joss-reviews

Reviews for the Journal of Open Source Software

[REVIEW]: DeBIR: A Python Package for Dense Bi-Encoder Information Retrieval #5017

Closed · editorialbot closed this issue 1 year ago

editorialbot commented 1 year ago

Submitting author: @Ayuei (Vincent Nguyen)
Repository: https://github.com/Ayuei/DeBEIR
Branch with paper.md (empty if default branch): paper
Version: v0.0.1
Editor: @arfon
Reviewers: @KonradHoeffner, @amitkumarj441
Archive: 10.5281/zenodo.8103783

Status


Status badge code:

HTML: <a href="https://joss.theoj.org/papers/6454b671e9fc79029e5087f05b296e07"><img src="https://joss.theoj.org/papers/6454b671e9fc79029e5087f05b296e07/status.svg"></a>
Markdown: [![status](https://joss.theoj.org/papers/6454b671e9fc79029e5087f05b296e07/status.svg)](https://joss.theoj.org/papers/6454b671e9fc79029e5087f05b296e07)

Reviewers and authors:

Please avoid lengthy details of difficulties in the review thread. Instead, please create a new issue in the target repository and link to those issues (especially acceptance-blockers) by leaving comments in the review thread below. (For completists: if the target issue tracker is also on GitHub, linking the review thread in the issue or vice versa will create corresponding breadcrumb trails in the link target.)

Reviewer instructions & questions

@KonradHoeffner, your review will be checklist based. Each of you will have a separate checklist that you should update when carrying out your review. First of all you need to run this command in a separate comment to create the checklist:

@editorialbot generate my checklist

The reviewer guidelines are available here: https://joss.readthedocs.io/en/latest/reviewer_guidelines.html. Any questions/concerns please let @arfon know.

✨ Please start on your review when you are able, and be sure to complete your review in the next six weeks, at the very latest ✨

Checklists

πŸ“ Checklist for @KonradHoeffner

πŸ“ Checklist for @amitkumarj441

editorialbot commented 1 year ago

:point_right::page_facing_up: Download article proof :page_facing_up: View article proof on GitHub :page_facing_up: :point_left:

KonradHoeffner commented 1 year ago

Furthermore, our pipeline runs asynchronously to reduce I/O performance bottlenecks, facilitating faster experiments and research

@Ayuei To check the "performance" box in the review I need some kind of proof for that statement. Do you have a benchmark or some other measurement confirming that it is faster than competing approaches?
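
For context, here is a minimal sketch of the asynchronous-pipeline idea under discussion. The `search` coroutine and its client argument are hypothetical stand-ins rather than DeBEIR's actual API; the point is that concurrently issued I/O-bound calls overlap, so total wall time approaches the slowest single call instead of the sum of all calls:

```python
# Illustrative sketch only -- hypothetical names, not DeBEIR's actual API.
import asyncio

async def search(client, query: str) -> dict:
    # Stand-in for an I/O-bound call such as `await client.search(...)`.
    await asyncio.sleep(0.1)  # simulate network latency
    return {"query": query, "hits": []}

async def run_pipeline(client, queries: list[str]) -> list[dict]:
    # Issue all queries concurrently instead of awaiting them one by one.
    return await asyncio.gather(*(search(client, q) for q in queries))

# Ten queries complete in roughly 0.1 s total, not ~1 s sequentially.
results = asyncio.run(run_pipeline(None, [f"query {i}" for i in range(10)]))
```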

arfon commented 1 year ago

:wave: @Ayuei – @KonradHoeffner is waiting on your input here.

arfon commented 1 year ago

@KonradHoeffner @Ayuei – I'm going to need to find a second reviewer here. Does either of you have suggestions for who might be able to provide a second review?

KonradHoeffner commented 1 year ago

Unfortunately I do not have a suggestion for a suitable reviewer replacement.

Ayuei commented 1 year ago

@KonradHoeffner @Ayuei – I'm going to need to find a second reviewer here. Does either of you have suggestions for who might be able to provide a second review?

I don't have any suggestions, but a reviewer with an IR background who works in Python would be ideal.

arfon commented 1 year ago

:wave: @lemuria-wchen – would you be willing to review this submission for JOSS? The submission under consideration is DeBIR: A Python Package for Dense Bi-Encoder Information Retrieval, a package for information retrieval and natural language processing.

The review process at JOSS is unique: it takes place in a GitHub issue, is open, and author-reviewer-editor conversations are encouraged. You can learn more about the process in these guidelines: https://joss.readthedocs.io/en/latest/reviewer_guidelines.html

Based on your experience, we think you might be able to provide a great review of this submission. Please let me know if you think you can help us out!

Many thanks,
Arfon

Ayuei commented 1 year ago

@Ayuei To check the "performance" box in the review I need some kind of proof for that statement. Do you have a benchmark or some other measurement confirming that it is faster than competing approaches?

I added a benchmark to justify the statement that our async pipeline is much faster than a synchronous one (Link to benchmark). In summary, it's a 9-10x speedup.

Otherwise, all expensive embedding calls are immediately cached (so subsequent calls are I/O-bound rather than GPU-bound), which is tested in test_pipeline:test_query_encoder_cache in [test_pipeline.py](https://github.com/Ayuei/DeBEIR/blob/main/tests/test_pipeline.py) and also benchmarked in the benchmark linked above. We believe this is valuable for researchers, as experiments are often repeated several times.
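
As a rough illustration of the caching behaviour described above (hypothetical names, not the package's actual implementation): memoizing the encoder turns repeated queries into cheap cache lookups instead of GPU work:

```python
# Sketch of the embedding-cache idea -- not DeBEIR's implementation.
from functools import lru_cache

@lru_cache(maxsize=None)
def encode_query(query: str) -> tuple[float, ...]:
    # Stand-in for an expensive bi-encoder forward pass on the GPU.
    return tuple(float(ord(c)) for c in query)

encode_query("dense retrieval")   # first call: computed
encode_query("dense retrieval")   # repeat call: served from the cache
print(encode_query.cache_info())  # hits=1, misses=1
```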

KonradHoeffner commented 1 year ago

@arfon @Ayuei: I'm finished with my review now and have checked all the boxes. However, I did not know about the field of dense retrieval before, so the second reviewer should take particular care to check that all functional claims have been confirmed, as I can only verify them to my limited understanding of the subject.

KonradHoeffner commented 1 year ago

Additional proofreading comments:

arfon commented 1 year ago

🙌 Thanks so much for the detailed review, @KonradHoeffner.

arfon commented 1 year ago

@editorialbot remove @kuutsav as reviewer

editorialbot commented 1 year ago

@kuutsav removed from the reviewers list!

Ayuei commented 1 year ago

Hi @arfon, is there anything I can do to speed up the process?

I had another look at the reviewers list, and it seems that the initial list of candidates I proposed during the pre-review is still relevant for the submission.

arfon commented 1 year ago

I had another look at the reviewers list, and it seems that the initial list of candidates I proposed during the pre-review is still relevant for the submission.

All of the reviewers you suggested have been invited. I can try pinging them again, but if you're able to suggest people you know in your research field, that's probably better (provided they're not conflicted).

amitkumarj441 commented 1 year ago

I had another look at the reviewers list, and it seems that the initial list of candidates I proposed during the pre-review is still relevant for the submission.

All of the reviewers you suggested have been invited. I can try pinging them again, but if you're able to suggest people you know in your research field, that's probably better (provided they're not conflicted).

Hi @arfon, I am available to review this IR paper/codebase and hope to get it done by the end of this week. Let me know if you're still open to having this reviewed. Thanks!

Ayuei commented 1 year ago

@arfon Pinging just in case you didn't see @amitkumarj441's message.

arfon commented 1 year ago

Hi @arfon, I am available to review this IR paper/codebase and hope to get it done by the end of this week. Let me know if you're still open to having this reviewed. Thanks!

Amazing, thank you so much @amitkumarj441 (@Ayuei, I did miss this, sorry 🤦). @amitkumarj441 – I'll go ahead and add you as a reviewer now 👍

arfon commented 1 year ago

@editorialbot add @amitkumarj441 as reviewer

editorialbot commented 1 year ago

@amitkumarj441 added to the reviewers list!

arfon commented 1 year ago

@amitkumarj441 – This is the review thread for the paper. All of our communications will happen here from now on.

Please read the "Reviewer instructions & questions" in the first comment above. Please create your checklist typing:

@editorialbot generate my checklist

As you go over the submission, please check any items that you feel have been satisfied. There are also links to the JOSS reviewer guidelines.

The JOSS review is different from most other journals. Our goal is to work with the authors to help them meet our criteria instead of merely passing judgment on the submission. As such, the reviewers are encouraged to submit issues and pull requests on the software repository. When doing so, please mention https://github.com/openjournals/joss-reviews/issues/5017 so that a link is created to this thread (and I can keep an eye on what is happening). Please also feel free to comment and ask questions on this thread. In my experience, it is better to post comments/questions/suggestions as you come across them instead of waiting until you've reviewed the entire package.

We aim for the review process to be completed within about 4-6 weeks but please make a start well ahead of this as JOSS reviews are by their nature iterative and any early feedback you may be able to provide to the author will be very helpful in meeting this schedule.

amitkumarj441 commented 1 year ago

Review checklist for @amitkumarj441

Conflict of interest

Code of Conduct

General checks

Functionality

Documentation

Software paper

arfon commented 1 year ago

:wave: @amitkumarj441 – just checking in to see how things are going with your review?

amitkumarj441 commented 1 year ago

I am back to reviewing the codebase and was able to run the Python tests without a hitch, given that this library supports both ES and Solr in a standalone manner. However, the automated test pipeline via Docker runs on macOS and is not platform-agnostic; you may want to add instructions on how to run this dockerized image for indexing on Linux. The main strength of this library for reproducing similar IR setups is its support for an extended list of metrics (NDCG@k, P@k, R@k, etc.). Apart from this, the documentation should clarify the ES timeout (e.g., at L16).
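
To make the metric names above concrete, here is a toy implementation of P@k and R@k for a single ranked list; this is illustrative only and not DeBEIR's code:

```python
# Toy P@k / R@k for one query -- illustrative only, not DeBEIR's code.
def precision_at_k(ranked, relevant, k):
    # Fraction of the top-k results that are relevant.
    return sum(1 for doc in ranked[:k] if doc in relevant) / k

def recall_at_k(ranked, relevant, k):
    # Fraction of all relevant documents found in the top k.
    return sum(1 for doc in ranked[:k] if doc in relevant) / len(relevant)

ranked = ["d1", "d2", "d3", "d4"]
relevant = {"d1", "d3"}
print(precision_at_k(ranked, relevant, 3))  # 2/3
print(recall_at_k(ranked, relevant, 3))     # 2/2 = 1.0
```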

Ayuei commented 1 year ago

I am back to reviewing the codebase and was able to run the Python tests without a hitch, given that this library supports both ES and Solr in a standalone manner. However, the automated test pipeline via Docker runs on macOS and is not platform-agnostic; you may want to add instructions on how to run this dockerized image for indexing on Linux. The main strength of this library for reproducing similar IR setups is its support for an extended list of metrics (NDCG@k, P@k, R@k, etc.). Apart from this, the documentation should clarify the ES timeout (e.g., at L16).

Thank you for your review, @amitkumarj441.

I added more documentation about the ES config in commit b25dd9.

I think the testing should be platform-agnostic on the main branch (https://github.com/Ayuei/DeBEIR/blob/main/tests/build_test_env.sh), which has seen major refactoring since the paper submission.
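
For readers wondering what such a timeout setting looks like, here is a hedged example using the public elasticsearch-py client; the host URL and the 60-second value are placeholders rather than DeBEIR defaults, and the parameter is `request_timeout` in elasticsearch-py 8.x (older 7.x clients use `timeout`):

```python
# Hedged example: configuring a client-side request timeout with
# elasticsearch-py 8.x. Host and value are placeholders, not DeBEIR defaults.
from elasticsearch import AsyncElasticsearch

client = AsyncElasticsearch(
    "http://localhost:9200",
    request_timeout=60,  # give slow dense-retrieval queries up to 60 s
)
```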

amitkumarj441 commented 1 year ago

@editorialbot check references

editorialbot commented 1 year ago

Reference check summary (note 'MISSING' DOIs are suggestions that need verification):

OK DOIs

- 10.1109/CVPR.2015.7298594 is OK
- 10.18653/v1/N19-1423 is OK
- 10.18653/v1/D19-1387 is OK
- 10.1093/bioinformatics/btz682 is OK
- 10.18653/v1/D19-1410 is OK
- 10.1145/3336191.3371864 is OK

MISSING DOIs

- 10.1145/3065386 may be a valid DOI for title: ImageNet Classification with Deep Convolutional Neural Networks
- 10.1162/coli_r_00468 may be a valid DOI for title: Pretrained transformers for text ranking: Bert and beyond

INVALID DOIs

- https://doi.org/10.1016/j.jbi.2022.104005 is INVALID because of 'https://doi.org/' prefix

amitkumarj441 commented 1 year ago

@Ayuei That's helpful for running the tests. However, in the central instructions file (README.md) on the main branch, you should fix the typo in the shell script's filename as shown below:

cd tests/
./build_test.env.sh -> ./build_test_env.sh
pytest .

Also, please fix the missing and erroneous references listed above. I am happy to recommend the paper for publication at this stage, provided the above suggestions are taken into consideration.

Ayuei commented 1 year ago

@editorialbot check references

editorialbot commented 1 year ago

Reference check summary (note 'MISSING' DOIs are suggestions that need verification):

OK DOIs

- 10.1145/3065386 is OK
- 10.1109/CVPR.2015.7298594 is OK
- 10.18653/v1/N19-1423 is OK
- 10.18653/v1/D19-1387 is OK
- 10.1093/bioinformatics/btz682 is OK
- 10.1145/3331184.3331340 is OK
- 10.18653/v1/D19-1410 is OK
- 10.1016/j.jbi.2022.104005 is OK
- 10.1145/3336191.3371864 is OK
- 10.1162/coli_r_00468 is OK

MISSING DOIs

- None

INVALID DOIs

- None

Ayuei commented 1 year ago

@Ayuei That's helpful for running the tests. However, in the central instructions file (README.md) on the main branch, you should fix the typo in the shell script's filename as shown below:

cd tests/
./build_test.env.sh -> ./build_test_env.sh
pytest .

Also, please fix the missing and erroneous references listed above. I am happy to recommend the paper for publication at this stage, provided the above suggestions are taken into consideration.

@amitkumarj441 Thanks for catching this!

I just fixed this in the main and paper branches. I also fixed the missing and erroneous DOIs (commit https://github.com/Ayuei/DeBEIR/commit/8a9714aed769431aefdd12d1d4eb929cc9682318).

amitkumarj441 commented 1 year ago

@editorialbot check references

editorialbot commented 1 year ago

Reference check summary (note 'MISSING' DOIs are suggestions that need verification):

OK DOIs

- 10.1145/3065386 is OK
- 10.1109/CVPR.2015.7298594 is OK
- 10.18653/v1/N19-1423 is OK
- 10.18653/v1/D19-1387 is OK
- 10.1093/bioinformatics/btz682 is OK
- 10.1145/3331184.3331340 is OK
- 10.18653/v1/D19-1410 is OK
- 10.1016/j.jbi.2022.104005 is OK
- 10.1145/3336191.3371864 is OK
- 10.1162/coli_r_00468 is OK

MISSING DOIs

- None

INVALID DOIs

- None

amitkumarj441 commented 1 year ago

@arfon I think it's much improved, and I'm happy to sign off on the full review.

arfon commented 1 year ago

@Ayuei – At this point could you make a new release of this software that includes the changes that have resulted from this review? Then, please make an archive of the software in Zenodo/figshare/other service and update this thread with the DOI of the archive. For the Zenodo/figshare archive, please make sure that:

  • The title of the archive is the same as the JOSS paper title
  • That the authors of the archive are the same as the JOSS paper authors
  • Finally, please make sure you give the paper a final proofread yourself before I do.

I can then move forward with accepting the submission.

Ayuei commented 1 year ago

@Ayuei – At this point could you make a new release of this software that includes the changes that have resulted from this review? Then, please make an archive of the software in Zenodo/figshare/other service and update this thread with the DOI of the archive. For the Zenodo/figshare archive, please make sure that:

  • The title of the archive is the same as the JOSS paper title
  • That the authors of the archive are the same as the JOSS paper authors
  • Finally, please make sure you give the paper a final proofread yourself before I do.

I can then move forward with accepting the submission.

Hi @arfon,

Thank you for the detailed instructions.

If there is anything else I need to do, please let me know.

arfon commented 1 year ago

@editorialbot set 10.5281/zenodo.8103783 as archive

editorialbot commented 1 year ago

Done! archive is now 10.5281/zenodo.8103783

arfon commented 1 year ago

@editorialbot recommend-accept

editorialbot commented 1 year ago

Attempting dry run of processing paper acceptance...

editorialbot commented 1 year ago

Reference check summary (note 'MISSING' DOIs are suggestions that need verification):

OK DOIs

- 10.1145/3065386 is OK
- 10.1109/CVPR.2015.7298594 is OK
- 10.18653/v1/N19-1423 is OK
- 10.18653/v1/D19-1387 is OK
- 10.1093/bioinformatics/btz682 is OK
- 10.1145/3331184.3331340 is OK
- 10.18653/v1/D19-1410 is OK
- 10.1016/j.jbi.2022.104005 is OK
- 10.1145/3336191.3371864 is OK
- 10.1162/coli_r_00468 is OK

MISSING DOIs

- None

INVALID DOIs

- None

editorialbot commented 1 year ago

:wave: @openjournals/dsais-eics, this paper is ready to be accepted and published.

Check final proof :point_right::page_facing_up: Download article

If the paper PDF and the deposit XML files look good in https://github.com/openjournals/joss-papers/pull/4374, then you can now move forward with accepting the submission by compiling again with the command @editorialbot accept

arfon commented 1 year ago

@editorialbot accept

editorialbot commented 1 year ago

Doing it live! Attempting automated processing of paper acceptance...

editorialbot commented 1 year ago

Ensure proper citation by uploading a plain text CITATION.cff file to the default branch of your repository.

If using GitHub, a Cite this repository menu will appear in the About section, containing both APA and BibTeX formats. When exported to Zotero using a browser plugin, Zotero will automatically create an entry using the information contained in the .cff file.

You can copy the contents for your CITATION.cff file here:

CITATION.cff

```
cff-version: "1.2.0"
authors:
- family-names: Nguyen
  given-names: Vincent
  orcid: "https://orcid.org/0000-0003-1787-8090"
- family-names: Karimi
  given-names: Sarvnaz
  orcid: "https://orcid.org/0000-0002-4927-3937"
- family-names: Xing
  given-names: Zhenchang
  orcid: "https://orcid.org/0000-0001-7663-1421"
doi: 10.5281/zenodo.8103783
message: If you use this software, please cite our article in the Journal of
  Open Source Software.
preferred-citation:
  authors:
  - family-names: Nguyen
    given-names: Vincent
    orcid: "https://orcid.org/0000-0003-1787-8090"
  - family-names: Karimi
    given-names: Sarvnaz
    orcid: "https://orcid.org/0000-0002-4927-3937"
  - family-names: Xing
    given-names: Zhenchang
    orcid: "https://orcid.org/0000-0001-7663-1421"
  date-published: 2023-07-05
  doi: 10.21105/joss.05017
  issn: 2475-9066
  issue: 87
  journal: Journal of Open Source Software
  publisher:
    name: Open Journals
  start: 5017
  title: "DeBEIR: A Python Package for Dense Bi-Encoder Information Retrieval"
  type: article
  url: "https://joss.theoj.org/papers/10.21105/joss.05017"
  volume: 8
title: "DeBEIR: A Python Package for Dense Bi-Encoder Information Retrieval"
```

If the repository is not hosted on GitHub, a .cff file can still be uploaded to set your preferred citation. Users will be able to manually copy and paste the citation.

Find more information on .cff files here and here.

editorialbot commented 1 year ago

🐘🐘🐘 👉 Toot for this paper 👈 🐘🐘🐘

editorialbot commented 1 year ago

🚨🚨🚨 THIS IS NOT A DRILL, YOU HAVE JUST ACCEPTED A PAPER INTO JOSS! 🚨🚨🚨

Here's what you must now do:

  1. Check final PDF and Crossref metadata that was deposited :point_right: https://github.com/openjournals/joss-papers/pull/4379
  2. Wait a couple of minutes, then verify that the paper DOI resolves https://doi.org/10.21105/joss.05017
  3. If everything looks good, then close this review issue.
  4. Party like you just published a paper! 🎉🌈🦄💃👻🤘

Any issues? Notify your editorial technical team...

arfon commented 1 year ago

@KonradHoeffner, @amitkumarj441 – many thanks for your reviews here! JOSS relies upon the volunteer effort of people like you and we simply wouldn't be able to do this without you ✨

@Ayuei – your paper is now accepted and published in JOSS :zap::rocket::boom:

editorialbot commented 1 year ago

:tada::tada::tada: Congratulations on your paper acceptance! :tada::tada::tada:

If you would like to include a link to your paper from your README use the following code snippets:

Markdown:
[![DOI](https://joss.theoj.org/papers/10.21105/joss.05017/status.svg)](https://doi.org/10.21105/joss.05017)

HTML:
<a style="border-width:0" href="https://doi.org/10.21105/joss.05017">
  <img src="https://joss.theoj.org/papers/10.21105/joss.05017/status.svg" alt="DOI badge" >
</a>

reStructuredText:
.. image:: https://joss.theoj.org/papers/10.21105/joss.05017/status.svg
   :target: https://doi.org/10.21105/joss.05017

This is how it will look in your documentation:

[DOI badge]

We need your help!

The Journal of Open Source Software is a community-run journal and relies upon volunteer effort. If you'd like to support us, please consider doing either one (or both) of the following: