[REVIEW]: anndata: Access and store annotated data matrices

editorialbot commented 2 years ago

Submitting author: !--author-handle-->@falexwolf@luizirber<!--end-editor-- Reviewers: @nlhepler, @rcannood Archive: 10.5281/zenodo.13643180

Status

Status badge code:

HTML: <a href="https://joss.theoj.org/papers/8b1f66b60f7af653cc09ff32fee77d83"><img src="https://joss.theoj.org/papers/8b1f66b60f7af653cc09ff32fee77d83/status.svg"></a>
Markdown: [![status](https://joss.theoj.org/papers/8b1f66b60f7af653cc09ff32fee77d83/status.svg)](https://joss.theoj.org/papers/8b1f66b60f7af653cc09ff32fee77d83)

Reviewers and authors:

Please avoid lengthy details of difficulties in the review thread. Instead, please create a new issue in the target repository and link to those issues (especially acceptance-blockers) by leaving comments in the review thread below. (For completists: if the target issue tracker is also on GitHub, linking the review thread in the issue or vice versa will create corresponding breadcrumb trails in the link target.)

Reviewer instructions & questions

@nlhepler & @rcannood, your review will be checklist based. Each of you will have a separate checklist that you should update when carrying out your review. First of all you need to run this command in a separate comment to create the checklist:

@editorialbot generate my checklist

The reviewer guidelines are available here: https://joss.readthedocs.io/en/latest/reviewer_guidelines.html. Any questions/concerns please let @luizirber know.

✨ Please start on your review when you are able, and be sure to complete your review in the next six weeks, at the very latest ✨

Checklists

📝 Checklist for @nlhepler

📝 Checklist for @rcannood

editorialbot commented 2 years ago

Hello humans, I'm @editorialbot, a robot that can help you with some common editorial tasks.

For a list of things I can do to help you, just type:

@editorialbot commands

For example, to regenerate the paper pdf after making changes in the paper's md or bib files, type:

@editorialbot generate pdf

editorialbot commented 2 years ago

Checking the BibTeX entries failed with the following error:

No paper file path

editorialbot commented 2 years ago

Software report:

github.com/AlDanial/cloc v 1.88  T=0.17 s (629.4 files/s, 102055.1 lines/s)
-------------------------------------------------------------------------------
Language                     files          blank        comment           code
-------------------------------------------------------------------------------
Python                          65           2795           2286          10220
reStructuredText                29            271            228            574
YAML                             5             19              2            129
TOML                             1              8              4            110
make                             1              5              6             14
Jupyter Notebook                 1              0            171             12
Markdown                         1              3              0              6
SVG                              1              0              0              1
-------------------------------------------------------------------------------
SUM:                           104           3101           2697          11066
-------------------------------------------------------------------------------

gitinspector failed to run statistical information for the repository

editorialbot commented 2 years ago

Failed to discover a Statement of need section in paper

editorialbot commented 2 years ago

:warning: An error happened when generating the pdf. Paper file not found.

nlhepler commented 2 years ago

Review checklist for @nlhepler

Conflict of interest

[x] I confirm that I have read the JOSS conflict of interest (COI) policy and that: I have no COIs with reviewing this work or that any perceived COIs have been waived by JOSS for the purpose of this review.

Code of Conduct

[x] I confirm that I read and will adhere to the JOSS code of conduct.

General checks

[x] Repository: Is the source code for this software available at the https://github.com/theislab/anndata?
[x] License: Does the repository contain a plain-text LICENSE file with the contents of an OSI approved software license?
[x] Contribution and authorship: Has the submitting author (@falexwolf) made major contributions to the software? Does the full list of paper authors seem appropriate and complete?
[x] Substantial scholarly effort: Does this submission meet the scope eligibility described in the JOSS guidelines

Functionality

[x] Installation: Does installation proceed as outlined in the documentation?
[x] Functionality: Have the functional claims of the software been confirmed?
[x] Performance: If there are any performance claims of the software, have they been confirmed? (If there are no claims, please check off this item.)

Documentation

[x] A statement of need: Do the authors clearly state what problems the software is designed to solve and who the target audience is?
[x] Installation instructions: Is there a clearly-stated list of dependencies? Ideally these should be handled with an automated package management solution.
[x] Example usage: Do the authors include examples of how to use the software (ideally to solve real-world analysis problems).
[x] Functionality documentation: Is the core functionality of the software documented to a satisfactory level (e.g., API method documentation)?
[x] Automated tests: Are there automated tests or manual steps described so that the functionality of the software can be verified?
[x] Community guidelines: Are there clear guidelines for third parties wishing to 1) Contribute to the software 2) Report issues or problems with the software 3) Seek support

Software paper

[x] Summary: Has a clear description of the high-level functionality and purpose of the software for a diverse, non-specialist audience been provided?
[x] A statement of need: Does the paper have a section titled 'Statement of Need' that clearly states what problems the software is designed to solve and who the target audience is?
[x] State of the field: Do the authors describe how this software compares to other commonly-used packages?
[x] Quality of writing: Is the paper well written (i.e., it does not require editing for structure, language, or writing quality)?
[x] References: Is the list of references complete, and is everything cited appropriately that should be cited (e.g., papers, datasets, software)? Do references in the text use the proper citation syntax?

rcannood commented 2 years ago

Review checklist for @rcannood

Conflict of interest

[x] I confirm that I have read the JOSS conflict of interest (COI) policy and that: I have no COIs with reviewing this work or that any perceived COIs have been waived by JOSS for the purpose of this review.

Code of Conduct

[x] I confirm that I read and will adhere to the JOSS code of conduct.

General checks

[x] Repository: Is the source code for this software available at the https://github.com/theislab/anndata?
[x] License: Does the repository contain a plain-text LICENSE file with the contents of an OSI approved software license?
[x] Contribution and authorship: Has the submitting author (@falexwolf) made major contributions to the software? Does the full list of paper authors seem appropriate and complete?
[x] Substantial scholarly effort: Does this submission meet the scope eligibility described in the JOSS guidelines

Functionality

[x] Installation: Does installation proceed as outlined in the documentation?
[x] Functionality: Have the functional claims of the software been confirmed?
[x] Performance: If there are any performance claims of the software, have they been confirmed? (If there are no claims, please check off this item.)

Documentation

[x] A statement of need: Do the authors clearly state what problems the software is designed to solve and who the target audience is?
[x] Installation instructions: Is there a clearly-stated list of dependencies? Ideally these should be handled with an automated package management solution.
[x] Example usage: Do the authors include examples of how to use the software (ideally to solve real-world analysis problems).
[x] Functionality documentation: Is the core functionality of the software documented to a satisfactory level (e.g., API method documentation)?
[x] Automated tests: Are there automated tests or manual steps described so that the functionality of the software can be verified?
[x] Community guidelines: Are there clear guidelines for third parties wishing to 1) Contribute to the software 2) Report issues or problems with the software 3) Seek support

Software paper

[x] Summary: Has a clear description of the high-level functionality and purpose of the software for a diverse, non-specialist audience been provided?
[x] A statement of need: Does the paper have a section titled 'Statement of Need' that clearly states what problems the software is designed to solve and who the target audience is?
[x] State of the field: Do the authors describe how this software compares to other commonly-used packages?
[x] Quality of writing: Is the paper well written (i.e., it does not require editing for structure, language, or writing quality)?
[x] References: Is the list of references complete, and is everything cited appropriately that should be cited (e.g., papers, datasets, software)? Do references in the text use the proper citation syntax?

luizirber commented 2 years ago

@editorialbot set paper as branch

editorialbot commented 2 years ago

Done! branch is now paper

luizirber commented 2 years ago

@editorialbot generate pdf

editorialbot commented 2 years ago

:point_right::page_facing_up: Download article proof :page_facing_up: View article proof on GitHub :page_facing_up: :point_left:

arfon commented 2 years ago

:wave: @nlhepler – how are you getting along with your review here?

falexwolf commented 2 years ago

@arfon We moved the repository from theislab/anndata to https://github.com/scverse/anndata. Could you update the metadata here?

nlhepler commented 2 years ago

@arfon apologies for the delay. I aim to have it done today, certainly no later than tomorrow.

falexwolf commented 2 years ago

Let me try to generate an up-to-date version that includes our response to @rcannood:

@editorialbot generate pdf

danielskatz commented 2 years ago

editorialbot commands need to be the first entry in a new comment.

falexwolf commented 2 years ago

@editorialbot generate pdf

editorialbot commented 2 years ago

:point_right::page_facing_up: Download article proof :page_facing_up: View article proof on GitHub :page_facing_up: :point_left:

nlhepler commented 2 years ago

@falexwolf, I see that @rcannood has opened scverse/anndata#769, would you prefer I piggy-back my comments there or open a separate issue?

falexwolf commented 2 years ago

Thanks for checking! It'd be great if you added your comments to the existing thread! It's more efficient if we address both of your comments in ideally one additional pass on the paper (or as many it takes until your happy)! 😅

arfon commented 2 years ago

:wave: folks. Just checking how things are going here? @nlhepler, @rcannood – it looks like you're well on your way to completing your reviews, have you had a chance to provide all of your feedback to @falexwolf?

nlhepler commented 2 years ago

:wave: Hi @arfon, the conversation is mostly happening in scverse/anndata#769 -- we're still going back and forth

falexwolf commented 2 years ago

Yes, all reviews are there! Isaac was on a several-week-long conference and holiday trip.

He just got back last week and we want to get back to reviews this Monday.

arfon commented 1 year ago

:wave: folks. Just checking in again here – @nlhepler, @rcannood – are your reviews mostly complete?

rcannood commented 1 year ago

I made my suggestions in https://github.com/scverse/anndata/issues/769, waiting for the authors to get back to us.

arfon commented 1 year ago

Thanks @rcannood.

@falexwolf – when do you think you might be able to respond to @rcannood's feedback?

falexwolf commented 1 year ago

@arfon, we responded to most of it.

@ivirshup, when do you think you can address the outstanding items?

Kevin-Mattheus-Moerman commented 1 year ago

@luizirber @falexwolf @ivirshup can you provide any updates on the above ☝️

falexwolf commented 1 year ago

My problem is that the remaining edits can only be performed by the first author. @ivirshup, could you provide an estimate?

arfon commented 1 year ago

Quick update, I emailed @ivirshup earlier today to see when they might be able to provide updates.

falexwolf commented 1 year ago

Did @ivirshup get back to you?

rcannood commented 1 year ago

Alternatively, we could omit statements regarding performance from the manuscript. WDYT?

falexwolf commented 1 year ago

I'd be OK with that! But I still hope this gets finalized in the form we foresaw after all the work that got put in by many people.

Kevin-Mattheus-Moerman commented 1 year ago

@falexwolf @arfon did you hear back from @ivirshup ? It would be good to move on here.

falexwolf commented 1 year ago

I couldn't get through. Just tried to ping @ivirshup through the repo now.

Kevin-Mattheus-Moerman commented 1 year ago

@falexwolf @ivirshup any updates to report? This review is taking very long due to inaction on the part of the authors. If this remains the case we will have no other option than to reject/retract this submission. Please can you respond in full to all reviewer comments and post a detailed update here at your earliest convenience. I will put a deadline of August 18th on this. If the review can continue and the authors are onboard to help make edits in a timely matter we can continue. If not we will proceed to reject.

falexwolf commented 1 year ago

It's very understandable that you'd retract the submission. I'm sorry for the effort we caused. At least, it seems the basic idea of having a paper for anndata wasn't wrong and the preprint is cited with 75 citations according to https://badge.dimensions.ai/details/id/pub.1144066001.

The blocker consists in a few technical points that only the first author (@ivirshup) can answer.

All other points have a response in the thread we have with the reviewers (https://github.com/scverse/anndata/issues/769). I'll happily move them here should @ivirshup make the response complete.

rcannood commented 1 year ago

Though I think having the joint collaboration of the authors to tackle the remaining issues in scverse/anndata#769 would be the best outcome, I don't think it's the only solution.

In an attempt to get this submission finally accepted, let me provide a recap of the only remaining blocking item in my review at scverse/anndata#769:

Recap of blocking item related to "Performance"

* [JOSS] **Performance**: If there are any performance claims of the software, have they been confirmed? * [@rcannood] There are various claims throughout the paper that anndata takes into account scRNAseq's dataset sizes in terms of cpu and memory usage. For example: - "Due to the increasing scale of data, we emphasized efficient operations with low memory and runtime overhead." - "In particular, AnnData takes great pains to support efficient operations with sparse data." * [@rcannood] I totally agree with the authors on this matter. However, I don't see any benchmarks in the paper or the documentation where anndata is compared to SCE, Seurat and/or loom in terms of memory usage, disk usage and/or execution time. * [@falexwolf] Thank you for pointing out that we didn’t substantiate performance claims in the paper! We’re glad to read that you agree with us on anndata’s good performance in comparison to many other data structures in the field. * [@falexwolf] Within the docs, we have this page that serves as a landing page for benchmarks: https://anndata.readthedocs.io/en/latest/benchmarks.html. To date, it has two elements: - It makes a comparison with the loom on-disk format here: https://anndata.readthedocs.io/en/latest/benchmark-read-write.html - It links to https://github.com/ivirshup/anndata-benchmarks, an entire devoted to benchmarking anndata over its entire lifetime using [airspeed velocity](https://asv.readthedocs.io/en/stable/). * [@falexwolf] We will add a more comprehensive comparative benchmarks in the docs, link it from the landing page and maintain it. * [@rcannood] The read/write benchmark is a good start! This only covers a very specific aspect of AnnData and only compares to loom. I'm not sure I'm wholly satisfied that this is sufficient to substantiate your claims regarding performance. The manuscript states: "Due to the increasing scale of data, we emphasized efficient operations with low memory and runtime overhead." - "Increasing scale of data" → the dataset used in this benchmark only contains 2300 cells - "efficient operations" → the same paragraph lists these operations: * out of core conversions between dense and sparse data * lazy subsetting * per-element operations for low total memory usage * in-place subsetting * combining `AnnData` objects with various merge strategies * lazy concatenation * batching * backed out-of-memory mode - "with low memory [...] overhead" → the benchmark currently doesn't cover memory overhead. - "low [...] runtime overhead" → the current benchmark compares against loom to demonstrate runtime efficiency. I wonder, how does it stack up against SCE when performing similar operations?

IMO, I think these are the following options for resolving the matter:

Option 1: Resolve the issue by publishing more extensive benchmarks to back up the claims made in the manuscript.
Option 2: If that is not possible, the performance claims made in the paper could be removed or watered down, as the paper is still valuable even without making any claims related to performance.
Option 3: If the performance claims in the manuscript are essential but extensive benchmarks will take too long to develop, perhaps the JOSS editors can make an editorial decision on whether such benchmarks are needed to back up the performance claims made in the manuscript.

Hopefully this helps in figuring out a solution to the remaining issues.

falexwolf commented 1 year ago

Thank you, @rcannood - your contributions are beyond anything that could be expected from any reviewer.

I'm happy to implement Option 2 if this is something reviewers and editors can accept. Option 3, I guess, equates to an acceptance of the current revision. Option 1 is what I can't do because it's highly technical and @ivirshup has been working on this for years (and is still working on it with the team, I believe).

rcannood commented 12 months ago

Hey @falexwolf ! Sorry for not responding earlier, it wasn't clear to me that you were posing a question.

Options 1 to 3 are all viable solutions for me :) :+1:

Kevin-Mattheus-Moerman commented 11 months ago

@luizirber can you check in with this review to ensure we keep up the pace, thanks.

rcannood commented 11 months ago

@Kevin-Mattheus-Moerman @luizirber Can I just check the [ ] Performance checkbox and quietly get this over with?

While I think the benchmarks themselves would be super interesting; at this stage I feel like I'm doing the scientific community a disservice for not allowing the AnnData paper to be published, since the performance of AnnData is not really the main point of the package or the accompanying manuscript.

Kevin-Mattheus-Moerman commented 10 months ago

@rcannood given what you describe that may be acceptable. I'll let @luizirber decide as he is the handling editor.

rcannood commented 9 months ago

Sounds good! :+1: Any updates on this?

Kevin-Mattheus-Moerman commented 9 months ago

@luizirber this thread needs your attention. 👋

rcannood commented 9 months ago

@Kevin-Mattheus-Moerman: @luizirber seems to be unresponsive. Do you have other ways of reaching out to @luizirber ?

luizirber commented 9 months ago

sorry for the unresponsiveness :disappointed:

@Kevin-Mattheus-Moerman @luizirber Can I just check the [ ] Performance checkbox and quietly get this over with?

While I think the benchmarks themselves would be super interesting; at this stage I feel like I'm doing the scientific community a disservice for not allowing the AnnData paper to be published, since the performance of AnnData is not really the main point of the package or the accompanying manuscript.

I agree. Not that performance is not important, but option 2 ("If that is not possible, the performance claims made in the paper could be removed or watered down, as the paper is still valuable even without making any claims related to performance.") seems the best to proceed because the main focus is the usability and the use cases enabled by the library.

rcannood commented 9 months ago

Alright, in that case I'll leave it in the hands of @falexwolf and colleagues to make changes to the paper.

@luizirber Could you run @editorialbot set https://github.com/scverse/anndata as repository ?

falexwolf commented 8 months ago

Will do it this week! :)

luizirber commented 8 months ago

@editorialbot set https://github.com/scverse/anndata as repository

openjournals / joss-reviews