openjournals / joss-reviews

Reviews for the Journal of Open Source Software

[REVIEW]: DeBIR: A Python Package for Dense Bi-Encoder Information Retrieval #5017

Closed · editorialbot closed this issue 1 year ago

editorialbot commented 1 year ago

Submitting author: @Ayuei (Vincent Nguyen)
Repository: https://github.com/Ayuei/DeBEIR
Branch with paper.md (empty if default branch): paper
Version: v0.0.1
Editor: @arfon
Reviewers: @KonradHoeffner, @amitkumarj441
Archive: 10.5281/zenodo.8103783

Status


Status badge code:

HTML: <a href="https://joss.theoj.org/papers/6454b671e9fc79029e5087f05b296e07"><img src="https://joss.theoj.org/papers/6454b671e9fc79029e5087f05b296e07/status.svg"></a>
Markdown: [![status](https://joss.theoj.org/papers/6454b671e9fc79029e5087f05b296e07/status.svg)](https://joss.theoj.org/papers/6454b671e9fc79029e5087f05b296e07)

Reviewers and authors:

Please avoid lengthy details of difficulties in the review thread. Instead, please create a new issue in the target repository and link to those issues (especially acceptance-blockers) by leaving comments in the review thread below. (For completists: if the target issue tracker is also on GitHub, linking the review thread in the issue or vice versa will create corresponding breadcrumb trails in the link target.)

Reviewer instructions & questions

@KonradHoeffner, your review will be checklist based. Each of you will have a separate checklist that you should update when carrying out your review. First of all you need to run this command in a separate comment to create the checklist:

@editorialbot generate my checklist

The reviewer guidelines are available here: https://joss.readthedocs.io/en/latest/reviewer_guidelines.html. Any questions/concerns please let @arfon know.

✨ Please start on your review when you are able, and be sure to complete your review in the next six weeks, at the very latest ✨

Checklists

πŸ“ Checklist for @KonradHoeffner

πŸ“ Checklist for @amitkumarj441

editorialbot commented 1 year ago

:point_right::page_facing_up: Download article proof :page_facing_up: View article proof on GitHub :page_facing_up: :point_left:

KonradHoeffner commented 1 year ago

Furthermore, our pipeline runs asynchronously to reduce I/O performance bottlenecks, facilitating faster experiments and research

@Ayuei To check the "performance" box in the review I need some kind of proof for that statement. Do you have a benchmark or some other measurement confirming that it is faster than competing approaches?
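
For context, here is a minimal sketch of the asynchronous-pipeline idea under discussion. The `search` coroutine and its client argument are hypothetical stand-ins rather than DeBEIR's actual API; the point is that concurrently issued I/O-bound calls overlap, so total wall time approaches the slowest single call instead of the sum of all calls:

```python
# Illustrative sketch only -- hypothetical names, not DeBEIR's actual API.
import asyncio

async def search(client, query: str) -> dict:
    # Stand-in for an I/O-bound call such as `await client.search(...)`.
    await asyncio.sleep(0.1)  # simulate network latency
    return {"query": query, "hits": []}

async def run_pipeline(client, queries: list[str]) -> list[dict]:
    # Issue all queries concurrently instead of awaiting them one by one.
    return await asyncio.gather(*(search(client, q) for q in queries))

# Ten queries complete in roughly 0.1 s total, not ~1 s sequentially.
results = asyncio.run(run_pipeline(None, [f"query {i}" for i in range(10)]))
```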

arfon commented 1 year ago

:wave: @Ayuei – @KonradHoeffner is waiting on your input here.

arfon commented 1 year ago

@KonradHoeffner @Ayuei – I'm going to need to find a second reviewer here. Does either of you have suggestions for who might be able to provide a second review?

KonradHoeffner commented 1 year ago

Unfortunately I do not have a suggestion for a suitable reviewer replacement.

Ayuei commented 1 year ago

@KonradHoeffner @Ayuei – I'm going to need to find a second reviewer here. Does either of you have suggestions for who might be able to provide a second review?

I don't have any suggestions, but a reviewer with an IR background who works in Python would be ideal.

arfon commented 1 year ago

:wave: @lemuria-wchen – would you be willing to review this submission for JOSS? The submission under consideration is DeBIR: A Python Package for Dense Bi-Encoder Information Retrieval, a package for information retrieval and natural language processing.

The review process at JOSS is unique: it takes place in a GitHub issue, is open, and author-reviewer-editor conversations are encouraged. You can learn more about the process in these guidelines: https://joss.readthedocs.io/en/latest/reviewer_guidelines.html

Based on your experience, we think you might be able to provide a great review of this submission. Please let me know if you think you can help us out!

Many thanks,
Arfon

Ayuei commented 1 year ago

@Ayuei To check the "performance" box in the review I need some kind of proof for that statement. Do you have a benchmark or some other measurement confirming that it is faster than competing approaches?

I added a benchmark to justify the statement that our async pipeline is much faster than a synchronous one (Link to benchmark). In summary, it's a 9-10x speedup.

Otherwise, all expensive embedding calls are immediately cached (so subsequent calls are I/O-bound rather than GPU-bound), which is tested in test_pipeline:test_query_encoder_cache in [test_pipeline.py](https://github.com/Ayuei/DeBEIR/blob/main/tests/test_pipeline.py) and also benchmarked in the benchmark linked above. We believe this is valuable for researchers, as experiments are often repeated several times.
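
As a rough illustration of the caching behaviour described above (hypothetical names, not the package's actual implementation): memoizing the encoder turns repeated queries into cheap cache lookups instead of GPU work:

```python
# Sketch of the embedding-cache idea -- not DeBEIR's implementation.
from functools import lru_cache

@lru_cache(maxsize=None)
def encode_query(query: str) -> tuple[float, ...]:
    # Stand-in for an expensive bi-encoder forward pass on the GPU.
    return tuple(float(ord(c)) for c in query)

encode_query("dense retrieval")   # first call: computed
encode_query("dense retrieval")   # repeat call: served from the cache
print(encode_query.cache_info())  # hits=1, misses=1
```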

KonradHoeffner commented 1 year ago

@arfon @Ayuei: I'm finished with my review now and have checked all the boxes. However, I did not know about the field of dense retrieval before, so the second reviewer should take particular care to check that all functional claims have been confirmed, as I can only verify them to my limited understanding of the subject.

KonradHoeffner commented 1 year ago

Additional proofreading comments:

arfon commented 1 year ago

🙌 Thanks so much for the detailed review, @KonradHoeffner.

arfon commented 1 year ago

@editorialbot remove @kuutsav as reviewer

editorialbot commented 1 year ago

@kuutsav removed from the reviewers list!

Ayuei commented 1 year ago

Hi @arfon, is there anything I can do to speed up the process?

I had another look at the reviewers list, and it seems that the initial list of candidates I proposed during the pre-review is still relevant for the submission.

arfon commented 1 year ago

I had another look at the reviewers list, and it seems that the initial list of candidates I proposed during the pre-review is still relevant for the submission.

All of the reviewers you suggested have been invited. I can try pinging them again, but if you're able to suggest people you know in your research field, that's probably better (provided they're not conflicted).

amitkumarj441 commented 1 year ago

I had another look at the reviewers list, and it seems that the initial list of candidates I proposed during the pre-review is still relevant for the submission.

All of the reviewers you suggested have been invited. I can try pinging them again, but if you're able to suggest people you know in your research field, that's probably better (provided they're not conflicted).

Hi @arfon, I am available to review this IR paper/codebase and hope to get it done by the end of this week. Let me know if you're still open to having this reviewed. Thanks!

Ayuei commented 1 year ago

@arfon Pinging just in case you didn't see @amitkumarj441's message.

arfon commented 1 year ago

Hi @arfon, I am available to review this IR paper/codebase and hope to get it done by the end of this week. Let me know if you're still open to having this reviewed. Thanks!

Amazing, thank you so much @amitkumarj441 (@Ayuei, I did miss this, sorry 🤦). @amitkumarj441 – I'll go ahead and add you as a reviewer now 👍

arfon commented 1 year ago

@editorialbot add @amitkumarj441 as reviewer

editorialbot commented 1 year ago

@amitkumarj441 added to the reviewers list!

arfon commented 1 year ago

@amitkumarj441 – This is the review thread for the paper. All of our communications will happen here from now on.

Please read the "Reviewer instructions & questions" in the first comment above. Please create your checklist typing:

@editorialbot generate my checklist

As you go over the submission, please check any items that you feel have been satisfied. There are also links to the JOSS reviewer guidelines.

The JOSS review is different from most other journals. Our goal is to work with the authors to help them meet our criteria instead of merely passing judgment on the submission. As such, the reviewers are encouraged to submit issues and pull requests on the software repository. When doing so, please mention https://github.com/openjournals/joss-reviews/issues/5017 so that a link is created to this thread (and I can keep an eye on what is happening). Please also feel free to comment and ask questions on this thread. In my experience, it is better to post comments/questions/suggestions as you come across them instead of waiting until you've reviewed the entire package.

We aim for the review process to be completed within about 4-6 weeks but please make a start well ahead of this as JOSS reviews are by their nature iterative and any early feedback you may be able to provide to the author will be very helpful in meeting this schedule.

amitkumarj441 commented 1 year ago

Review checklist for @amitkumarj441

Conflict of interest

Code of Conduct

General checks

Functionality

Documentation

Software paper

arfon commented 1 year ago

:wave: @amitkumarj441 – just checking in to see how things are going with your review?

amitkumarj441 commented 1 year ago

I am back to reviewing the codebase and was able to run the Python tests without a hitch, given that this library supports both ES and Solr in a standalone manner. However, the automated test pipeline via Docker runs on macOS and is not platform-agnostic; you may want to add instructions on how to run this dockerized image for indexing on Linux. The main strength of this library for reproducing similar IR setups is its support for an extended list of metrics (NDCG@k, P@k, R@k, etc.). Apart from this, the documentation should clarify the ES timeout (e.g., at L16).
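
To make the metric names above concrete, here is a toy implementation of P@k and R@k for a single ranked list; this is illustrative only and not DeBEIR's code:

```python
# Toy P@k / R@k for one query -- illustrative only, not DeBEIR's code.
def precision_at_k(ranked, relevant, k):
    # Fraction of the top-k results that are relevant.
    return sum(1 for doc in ranked[:k] if doc in relevant) / k

def recall_at_k(ranked, relevant, k):
    # Fraction of all relevant documents found in the top k.
    return sum(1 for doc in ranked[:k] if doc in relevant) / len(relevant)

ranked = ["d1", "d2", "d3", "d4"]
relevant = {"d1", "d3"}
print(precision_at_k(ranked, relevant, 3))  # 2/3
print(recall_at_k(ranked, relevant, 3))     # 2/2 = 1.0
```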

Ayuei commented 1 year ago

I am back to reviewing the codebase and was able to run the Python tests without a hitch, given that this library supports both ES and Solr in a standalone manner. However, the automated test pipeline via Docker runs on macOS and is not platform-agnostic; you may want to add instructions on how to run this dockerized image for indexing on Linux. The main strength of this library for reproducing similar IR setups is its support for an extended list of metrics (NDCG@k, P@k, R@k, etc.). Apart from this, the documentation should clarify the ES timeout (e.g., at L16).

Thank you for your review, @amitkumarj441.

I added more documentation about the ES config in commit b25dd9.

I think the testing should be platform-agnostic on the main branch (https://github.com/Ayuei/DeBEIR/blob/main/tests/build_test_env.sh), which has seen major refactoring since the paper submission.
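
For readers wondering what such a timeout setting looks like, here is a hedged example using the public elasticsearch-py client; the host URL and the 60-second value are placeholders rather than DeBEIR defaults, and the parameter is `request_timeout` in elasticsearch-py 8.x (older 7.x clients use `timeout`):

```python
# Hedged example: configuring a client-side request timeout with
# elasticsearch-py 8.x. Host and value are placeholders, not DeBEIR defaults.
from elasticsearch import AsyncElasticsearch

client = AsyncElasticsearch(
    "http://localhost:9200",
    request_timeout=60,  # give slow dense-retrieval queries up to 60 s
)
```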

amitkumarj441 commented 1 year ago

@editorialbot check references

editorialbot commented 1 year ago

Reference check summary (note 'MISSING' DOIs are suggestions that need verification):

OK DOIs

- 10.1109/CVPR.2015.7298594 is OK
- 10.18653/v1/N19-1423 is OK
- 10.18653/v1/D19-1387 is OK
- 10.1093/bioinformatics/btz682 is OK
- 10.18653/v1/D19-1410 is OK
- 10.1145/3336191.3371864 is OK

MISSING DOIs

- 10.1145/3065386 may be a valid DOI for title: ImageNet Classification with Deep Convolutional Neural Networks
- 10.1162/coli_r_00468 may be a valid DOI for title: Pretrained transformers for text ranking: Bert and beyond

INVALID DOIs

- https://doi.org/10.1016/j.jbi.2022.104005 is INVALID because of 'https://doi.org/' prefix

amitkumarj441 commented 1 year ago

@Ayuei That's helpful for running the tests. However, in the central instructions file (README.md) on the main branch, you should fix the typo in the shell script's filename as shown below:

cd tests/
./build_test.env.sh -> ./build_test_env.sh
pytest .

Also, please fix the missing and erroneous references listed above. I am happy to recommend the paper for publication at this stage, provided the above suggestions are taken into consideration.

Ayuei commented 1 year ago

@editorialbot check references

editorialbot commented 1 year ago

Reference check summary (note 'MISSING' DOIs are suggestions that need verification):

OK DOIs

- 10.1145/3065386 is OK
- 10.1109/CVPR.2015.7298594 is OK
- 10.18653/v1/N19-1423 is OK
- 10.18653/v1/D19-1387 is OK
- 10.1093/bioinformatics/btz682 is OK
- 10.1145/3331184.3331340 is OK
- 10.18653/v1/D19-1410 is OK
- 10.1016/j.jbi.2022.104005 is OK
- 10.1145/3336191.3371864 is OK
- 10.1162/coli_r_00468 is OK

MISSING DOIs

- None

INVALID DOIs

- None

Ayuei commented 1 year ago

@Ayuei That's helpful for running the tests. However, in the central instructions file (README.md) on the main branch, you should fix the typo in the shell script's filename as shown below:

cd tests/
./build_test.env.sh -> ./build_test_env.sh
pytest .

Also, please fix the missing and erroneous references listed above. I am happy to recommend the paper for publication at this stage, provided the above suggestions are taken into consideration.

@amitkumarj441 Thanks for catching this!

I just fixed this in the main and paper branches. I also fixed the missing and erroneous DOIs (commit https://github.com/Ayuei/DeBEIR/commit/8a9714aed769431aefdd12d1d4eb929cc9682318).

amitkumarj441 commented 1 year ago

@editorialbot check references

editorialbot commented 1 year ago

Reference check summary (note 'MISSING' DOIs are suggestions that need verification):

OK DOIs

- 10.1145/3065386 is OK
- 10.1109/CVPR.2015.7298594 is OK
- 10.18653/v1/N19-1423 is OK
- 10.18653/v1/D19-1387 is OK
- 10.1093/bioinformatics/btz682 is OK
- 10.1145/3331184.3331340 is OK
- 10.18653/v1/D19-1410 is OK
- 10.1016/j.jbi.2022.104005 is OK
- 10.1145/3336191.3371864 is OK
- 10.1162/coli_r_00468 is OK

MISSING DOIs

- None

INVALID DOIs

- None

amitkumarj441 commented 1 year ago

@arfon I think it's much improved, and I'm happy to sign off on the full review.

arfon commented 1 year ago

@Ayuei – At this point could you make a new release of this software that includes the changes that have resulted from this review? Then, please make an archive of the software in Zenodo/figshare/other service and update this thread with the DOI of the archive. For the Zenodo/figshare archive, please make sure that:

  • The title of the archive is the same as the JOSS paper title
  • That the authors of the archive are the same as the JOSS paper authors
  • Finally, please make sure you give the paper a final proofread yourself before I do.

I can then move forward with accepting the submission.

Ayuei commented 1 year ago

@Ayuei – At this point could you make a new release of this software that includes the changes that have resulted from this review? Then, please make an archive of the software in Zenodo/figshare/other service and update this thread with the DOI of the archive. For the Zenodo/figshare archive, please make sure that:

  • The title of the archive is the same as the JOSS paper title
  • That the authors of the archive are the same as the JOSS paper authors
  • Finally, please make sure you give the paper a final proofread yourself before I do.

I can then move forward with accepting the submission.

Hi @arfon,

Thank you for the detailed instructions.

If there is anything else I need to do, please let me know.

arfon commented 1 year ago

@editorialbot set 10.5281/zenodo.8103783 as archive

editorialbot commented 1 year ago

Done! archive is now 10.5281/zenodo.8103783

arfon commented 1 year ago

@editorialbot recommend-accept

editorialbot commented 1 year ago

Attempting dry run of processing paper acceptance...

editorialbot commented 1 year ago

Reference check summary (note 'MISSING' DOIs are suggestions that need verification):

OK DOIs

- 10.1145/3065386 is OK
- 10.1109/CVPR.2015.7298594 is OK
- 10.18653/v1/N19-1423 is OK
- 10.18653/v1/D19-1387 is OK
- 10.1093/bioinformatics/btz682 is OK
- 10.1145/3331184.3331340 is OK
- 10.18653/v1/D19-1410 is OK
- 10.1016/j.jbi.2022.104005 is OK
- 10.1145/3336191.3371864 is OK
- 10.1162/coli_r_00468 is OK

MISSING DOIs

- None

INVALID DOIs

- None

editorialbot commented 1 year ago

:wave: @openjournals/dsais-eics, this paper is ready to be accepted and published.

Check final proof :point_right::page_facing_up: Download article

If the paper PDF and the deposit XML files look good in https://github.com/openjournals/joss-papers/pull/4374, then you can now move forward with accepting the submission by compiling again with the command @editorialbot accept

arfon commented 1 year ago

@editorialbot accept

editorialbot commented 1 year ago

Doing it live! Attempting automated processing of paper acceptance...

editorialbot commented 1 year ago

Ensure proper citation by uploading a plain text CITATION.cff file to the default branch of your repository.

If using GitHub, a Cite this repository menu will appear in the About section, containing both APA and BibTeX formats. When exported to Zotero using a browser plugin, Zotero will automatically create an entry using the information contained in the .cff file.

You can copy the contents for your CITATION.cff file here:

CITATION.cff

```
cff-version: "1.2.0"
authors:
- family-names: Nguyen
  given-names: Vincent
  orcid: "https://orcid.org/0000-0003-1787-8090"
- family-names: Karimi
  given-names: Sarvnaz
  orcid: "https://orcid.org/0000-0002-4927-3937"
- family-names: Xing
  given-names: Zhenchang
  orcid: "https://orcid.org/0000-0001-7663-1421"
doi: 10.5281/zenodo.8103783
message: If you use this software, please cite our article in the Journal of
  Open Source Software.
preferred-citation:
  authors:
  - family-names: Nguyen
    given-names: Vincent
    orcid: "https://orcid.org/0000-0003-1787-8090"
  - family-names: Karimi
    given-names: Sarvnaz
    orcid: "https://orcid.org/0000-0002-4927-3937"
  - family-names: Xing
    given-names: Zhenchang
    orcid: "https://orcid.org/0000-0001-7663-1421"
  date-published: 2023-07-05
  doi: 10.21105/joss.05017
  issn: 2475-9066
  issue: 87
  journal: Journal of Open Source Software
  publisher:
    name: Open Journals
  start: 5017
  title: "DeBEIR: A Python Package for Dense Bi-Encoder Information Retrieval"
  type: article
  url: "https://joss.theoj.org/papers/10.21105/joss.05017"
  volume: 8
title: "DeBEIR: A Python Package for Dense Bi-Encoder Information Retrieval"
```

If the repository is not hosted on GitHub, a .cff file can still be uploaded to set your preferred citation. Users will be able to manually copy and paste the citation.

Find more information on .cff files here and here.

editorialbot commented 1 year ago

🐘🐘🐘 👉 Toot for this paper 👈 🐘🐘🐘

editorialbot commented 1 year ago

🚨🚨🚨 THIS IS NOT A DRILL, YOU HAVE JUST ACCEPTED A PAPER INTO JOSS! 🚨🚨🚨

Here's what you must now do:

  1. Check final PDF and Crossref metadata that was deposited :point_right: https://github.com/openjournals/joss-papers/pull/4379
  2. Wait a couple of minutes, then verify that the paper DOI resolves https://doi.org/10.21105/joss.05017
  3. If everything looks good, then close this review issue.
  4. Party like you just published a paper! 🎉🌈🦄💃👻🤘

Any issues? Notify your editorial technical team...

arfon commented 1 year ago

@KonradHoeffner, @amitkumarj441 – many thanks for your reviews here! JOSS relies upon the volunteer effort of people like you and we simply wouldn't be able to do this without you ✨

@Ayuei – your paper is now accepted and published in JOSS :zap::rocket::boom:

editorialbot commented 1 year ago

:tada::tada::tada: Congratulations on your paper acceptance! :tada::tada::tada:

If you would like to include a link to your paper from your README use the following code snippets:

Markdown:
[![DOI](https://joss.theoj.org/papers/10.21105/joss.05017/status.svg)](https://doi.org/10.21105/joss.05017)

HTML:
<a style="border-width:0" href="https://doi.org/10.21105/joss.05017">
  <img src="https://joss.theoj.org/papers/10.21105/joss.05017/status.svg" alt="DOI badge" >
</a>

reStructuredText:
.. image:: https://joss.theoj.org/papers/10.21105/joss.05017/status.svg
   :target: https://doi.org/10.21105/joss.05017

This is how it will look in your documentation:

[DOI badge]

We need your help!

The Journal of Open Source Software is a community-run journal and relies upon volunteer effort. If you'd like to support us, please consider doing either one (or both) of the following: