[JOSS] Comments - Githubissues

rcannood commented 2 years ago

Hi all!

I'm currently reviewing anndata's submission to JOSS (openjournals/joss-reviews#4371). I think the paper is very well written and is worthy of publication at JOSS. However, in going over the reviewer checklist provided by JOSS, there are some minor outstanding issue which I cannot check off at the moment.

Below is an overview of all my relatively minor comments. I'm happy to discuss them in this thread or in a separate issue, if need be.

Robrecht

General issues

[x] Repository It seems the repository has been renamed from theislab/anndata to scverse/anndata. Please fix this in JOSS.

Functionality

[x] Performance "If there are any performance claims of the software, have they been confirmed?" There are various claims throughout the paper that anndata takes into account scRNAseq's dataset sizes in terms of cpu and memory usage. For example: "Due to the increasing scale of data, we emphasized efficient operations with low memory and runtime overhead." "In particular, AnnData takes great pains to support efficient operations with sparse data." I totally agree with the authors on this matter. However, I don't see any benchmarks in the paper or the documentation where anndata is compared to SCE, Seurat and/or loom in terms of memory usage, disk usage and/or execution time.

Documentation

[x] Installation instructions I don't see a "clearly-stated list of dependencies" in the documentation, as required by JOSS. I think the installation instructions could be expanded upon (e.g. https://scanpy.readthedocs.io/en/stable/installation.html).
[x] Example usage "Do the authors include examples of how to use the software (ideally to solve real-world analysis problems)." There are examples on concatenation and the on-disk format, but I don't see any usage examples on related to the core API of anndata.
[x] Community guidelines The contributing guidelines on readthedocs says "To be expanded."

Software paper

[x] State of the field The paper mentions "... where advances in single-cell RNA sequencing (scRNA-seq) gave rise to new classes of analysis problems with a stronger adoption of Python over the traditional R ecosystem." Please provide a citation for this claim. https://www.scrna-tools.org/analysis (Zappia et al. 2018) seems to show the opposite.
[x] Quality of writing The text in Figure 1 is too small to read when printed out on paper.
[x] Quality of writing anndata is sometimes written as AnnData. I assume a distinction is made between anndata the package and AnnData the object, but in the text AnnData sometimes seems to refer to the object as well. Please clarify.
[x] Quality of writing In the sentence "In the context of how (Wickham, 2014) recommends ...", I feel you should use an inline citation - that is @Wickham2014 instead of [@Wickham2014].
[x] State of the field "For analysis of scRNA-seq data in R this has been further simplified by anndata2ri, which allows conversion to SingleCellExperiment and Seurat’s data format." I think it's also worth mentioning the 'anndata for R' on CRAN, which provides a reticulate wrapper to the anndata Python package. I believe this package has a more intuitive interface modifying anndata files when only small modifications need to be made or for people with anndata but not with SCE or Seurat.
[x] State of the field If you want additional examples of anndata's application domain, or, if you think mention of the following is an improvement to the manuscript: anndata is heavily used in OpenProblems, a community driven platform for benchmarking tasks in single-cell analysis (openproblems.bio). For example, in a NeurIPS 2021 competition (Luecken et al 2021, https://openreview.net/forum?id=gN35BGa1Rt) use it as the common data format throughout the whole benchmarking pipeline.

falexwolf commented 2 years ago

Thank you very much for the review, @rcannood! We'll get back to you!

falexwolf commented 2 years ago

I'm sorry it takes us so long.

We need a bit more time to compile the benchmarks we have. We'll get back in a week.

rcannood commented 2 years ago

Hi Alex et al.! How are things going? Let me know if anything is unclear or a misinterpretation from my end. I'd be happy to discuss! :)

falexwolf commented 2 years ago

Robrecht!

Apologies for the slow response!

Thank you very much for the in-depth review. It definitely helped us improve the paper!

General issues

We fixed the repository link! https://github.com/scverse/anndata/pull/779/commits/d409c48ec8ca26707f149fcd23cffbdf970d9f5b

Functionality → performance → benchmarks

Thank you for pointing out that we didn’t substantiate performance claims in the paper! We’re glad to read that you agree with us on anndata’s good performance in comparison to many other data structures in the field.
Within the docs, we have this page that serves as a landing page for benchmarks: https://anndata.readthedocs.io/en/latest/benchmarks.html. To date, it has two elements:
- It makes a comparison with the loom on-disk format here: https://anndata.readthedocs.io/en/latest/benchmark-read-write.html
- It links to https://github.com/ivirshup/anndata-benchmarks, an entire devoted to benchmarking anndata over its entire lifetime using airspeed velocity.
We now mention these benchmarking efforts in the manuscript! https://github.com/scverse/anndata/pull/779/commits/c2d73094daeefb71293e425532d90b45106cf525
We will add a more comprehensive comparative benchmarks in the docs, link it from the landing page and maintain it.

Documentation

Installation
- We list the dependencies in pyproject.toml: https://github.com/scverse/anndata/blob/a3af650d40fd804cd866d55a674a7eeaa20fdec0/pyproject.toml#L43-L53
- We agree with you that a more comprehensive installation page in the docs could be beneficial! We will add this page similar to what we set up for Scanpy.
Example usage
- We recently added a tutorial that should address your point: https://anndata-tutorials.readthedocs.io/en/latest/getting-started.html
- We agree that before adding this tutorial, the lack of real-world use cases was a missing element of the anndata docs site.
Community guidelines
- We updated the page (https://anndata.readthedocs.io/en/latest/contributing.html) and explicitly stated that we follow the Scanpy guidelines. In the future, we will consolidate and make more references to scverse community guidelines rather than trying to maintain a standalone anndata reference. https://github.com/scverse/anndata/pull/778

Software paper

State of field
- We realize how the current statement about the adoption of Python was confusing to read! Thank you for pointing this out!
- We clarified it and explicitly referenced the following Figure 2a from Zappia21, which reviews the same database (scrna-tools.org) that you're citing! https://github.com/scverse/anndata/pull/779/commits/c0a2d97cc51fe9f3d04686eedf08e8dc02a0c89e
Quality of writing: figure text in figure 1
- Agreed on this! We will remake the figure. Would you think the figure is acceptable if we increase the fontsize of “obs_names” and “var_names”? Or do you think we have to increase the fontsize everywhere?
Quality of writing: anndata is sometimes written as AnnData.
- Indeed, AnnData denotes the class (data structure), whereas anndata denotes the software package (python module).
- We clarified this in the manuscript! https://github.com/scverse/anndata/pull/779/commits/ed59ed0e9a5e9740590618f85a83dc16311f924a
Quality of writing: inline citation
- Fixed! https://github.com/scverse/anndata/pull/779/commits/9b7bd8a172b4a7915595f5e8e663eaf7eebc8b95
State of the field:
- You're absolutely right! Apologies for the omission. We added a reference to the CRAN anndata package! https://github.com/scverse/anndata/pull/779/commits/67741a11e19c60b599040809d51948852e60a9d0
- We now also mention zellkonverter: https://github.com/scverse/anndata/pull/779/commits/93ce15ae73e6f9b4334f5ee5a916c7f2834060a3
State of the field: examples
- We added a reference to the benchmarking paper! https://github.com/scverse/anndata/pull/779/commits/b4684e3093505b9d13fd4bb09e93004f7f36efcb

All changes discussed in this response can be viewed together in this PR: https://github.com/scverse/anndata/pull/779. The up-to-date version of the paper is in the paper branch https://github.com/scverse/anndata/pull/666, as required by JOSS.

We'd be grateful for guidance on the question regarding Figure 1! We're also happy to further adapt the paper to your suggestions!

rcannood commented 2 years ago

Thanks for your hard work @falexwolf and colleagues! I ticked off the check boxes for which I have no further questions or comments. I also updated the progress of my review checklist at https://github.com/openjournals/joss-reviews/issues/4371#issuecomment-1117684175 accordingly :)

My responses to your comments:

General issues → Repository

We fixed the repository link! https://github.com/scverse/anndata/commit/d409c48ec8ca26707f149fcd23cffbdf970d9f5b

The first issue at https://github.com/openjournals/joss-reviews/issues/4371 still lists github.com/theislab/anndata instead of scverse/anndata. I think it's important that this is fixed, because the url in the JOSS article itself also still points to the theislab organisation.
Functionality → Performance

It makes a comparison with the loom on-disk format here: https://anndata.readthedocs.io/en/latest/benchmark-read-write.html

The read/write benchmark is a good start! This only covers a very specific aspect of AnnData and only compares to loom. I'm not sure I'm wholly satisfied that this is sufficient to substantiate your claims regarding performance. The manuscript states: "Due to the increasing scale of data, we emphasized efficient operations with low memory and runtime overhead."
- "Increasing scale of data" → the dataset used in this benchmark only contains 2300 cells
- "efficient operations" → the same paragraph lists these operations:
  - out of core conversions between dense and sparse data
  - lazy subsetting
  - per-element operations for low total memory usage
  - in-place subsetting
  - combining AnnData objects with various merge strategies
  - lazy concatenation
  - batching
  - backed out-of-memory mode
- "with low memory [...] overhead" → the benchmark currently doesn't cover memory overhead.
- "low [...] runtime overhead" → the current benchmark compares against loom to demonstrate runtime efficiency. I wonder, how does it stack up against SCE when performing similar operations?
Functionality → Performance

It links to https://github.com/ivirshup/anndata-benchmarks, an entire devoted to benchmarking anndata over its entire lifetime using airspeed velocity.

It's very cool that you're able to track the performance of anndata in this way! However, I'm having a hard time browsing the results from this workflow. Are the results of the ASV benchmarks the one that are published in readthedocs?
Functionality → Performance

We will add a more comprehensive comparative benchmarks in the docs, link it from the landing page and maintain it.

Let me know when I can take a look at the extended benchmarks. I realise that the above-mentioned comments regarding the benchmarks are quite extensive. Maybe we can ask @luizirber to chime in on to what extent each and every performance-related claim in the paper needs to be backed by benchmarks?
Documentation → Installation

We agree with you that a more comprehensive installation page in the docs could be beneficial! We will add this page similar to what we set up for Scanpy.

Sounds good! Let me know when you'd like to me to take a look at that.
Quality of writing → figure text in figure 1

Agreed on this! We will remake the figure. Would you think the figure is acceptable if we increase the fontsize of “obs_names” and “var_names”? Or do you think we have to increase the fontsize everywhere?

Increasing the size of just the obs_names, var_names should be sufficient :)

falexwolf commented 2 years ago

Thanks a lot for this second round, @rcannood! (Why am I so slow in responding? Sorry!)

I pinged the editor re the repo location and am talking to @ivirshup about the rest!

ivirshup commented 2 years ago

Hey @rcannood, sorry also missed the update here!

"Increasing scale of data" → the dataset used in this benchmark only contains 2300 cells

Looking into making something more up to date here!

It's very cool that you're able to track the performance of anndata in this way! However, I'm having a hard time browsing the results from this workflow. Are the results of the ASV benchmarks the one that are published in readthedocs?

They are not.

At the moment, we just treat this as tooling for development, and don't publish results. We need to get dedicated hardware set up to have meaningful historical results (since performance can be quite variable between machines otherwise).

Additionally, the benchmarks here can be quite specific, and I'm not sure they would be informative by themselves. Here is an example of output: https://www.astropy.org/astropy-benchmarks/

nlhepler commented 2 years ago

Hello all!

Thank you for the very well-written paper and similarly well-built and well-maintained anndata package. I concur with @rcannood that it is worthy of publication, after a few things have been addressed:

It's already been noted, but the package has since moved to scverse/anndata. I presume this will get fixed in time for publication, but it's also not necessarily blocking as an auto-redirect is in place.
I'm not sure if it's because the figures possibly appeared first in poster form, but the figure text is far too small for the 8.5x11" format. In particular Figure 1 and Figure 3 need to have the text resized to be at least 10pt (including/especially the TeX bits). Figure 2 could also use some massaging, but its text is not nearly as egregiously tiny.
The data structure section is missing a description/enumeration of the layers functionality of the format. This is described in the Figure 1 caption but absent from the main text.
The same section is also missing a description of uns, which is also present in Figure 1's caption. It is possibly more important to describe than layers as you refer to uns on line 132 of the main text.
I have a question: why are single- and multi-dimensional annotations separated into var and obs vs varm and obsm respectively? A single-dimensional annotation has an implicit dimension of 1 and can leverage the same datastructure as annotations with dimension >1? It may be worth describing the thought process behind this decision in the paper.
The data analysis workflow section, line 74: "kept track of~f~ by adding"
The on-disk format section, line 108: it would be clearer to say "HDF5 AnnData format" instead of .h5ad, or to parenthetically define .h5ad and .zarr on line 101.
Another question: line 108: you say the schema is versioned and stored in an internal registry. Is this registry internal to every file, as in a self-describing format, or internal to the anndata project? Either answer is acceptable, it may help to clarify in the paper.

A couple remarks I think warrant some interaction/debate:

I think it is maybe inaccurate to imply AnnData is a datastructure library (The AnnData object section). It seems to leverage existing datastructure libraries (for integration purposes, like numpy, scipy.sparse, and pandas), and provides a layer of APIs on top of these datastructures that render experimental data of this form easy to programmatically manipulate, integrate with downstream analyses, and serialize to and from disk.
Related to the above, Between the Introduction and the first sentence of The AnnData object section there's an indirect statement that AnnData is fulfilling an important niche: providing ExpressionSet-like functionality for Python language users, and integrating with the relevant packages in the Python ecosystem to provide access to e.g. modern machine learning and other (scientific) software unique to the Python ecosystem. It may be better to directly state this.

I also have a few other comments that I would like to share that (I do not think block publication but) might be helpful if you would like to further polish this submission or if you intend to submit a follow-up paper in the future:

AnnData is relatively mature, having developed originally as part of the scanpy project. As such, I think the Outlook section (or another) could contain a discussion of lessons learned and/or resolved, in-scope for future development efforts, or out-of-scope but something to think about in any rewrite/successor project. Comparisons with prior versions could also be a source of benchmarks.
Most papers describing new software would have benchmarks showcasing absolute performance claims. The paper as-is describes its performance qualitatively, rather than in quantitative terms, which is less useful/informative, but does indicate that the authors are aware how important time/space efficiency are to practical use of the project. Unfortunately, AnnData does not have any serious competitors in the Python ecosystem, and the R-based competitors are ... well it can often be a totally different set of trade-offs. So "advancing state of the art" here would be admittedly difficult to showcase. Instead it might be more appropriate to quantify the amount of boilerplate saved by using AnnData over rolling one's own data loaders/accessors, and to extol more assertively the benefits of standardized on-disk formats.
Related to the above, I would appreciate any future papers on AnnData to engage in self-comparative benchmarks to showcase the advantages of any new features/APIs/developments/etc.
As someone who has worked on similar software in a corporate setting for highly efficient storage and high-performance analysis of large datasets (100K+ cells) on commodity PC hardware, I know how important it is to provide low-level APIs that can be leveraged to write highly efficient applications. This is especially true in the context of lazy subsetting where you really do not want to fully instantiate a sub-matrix and possibly double your memory usage and do lots of unnecessary copying. I would really appreciate a detailed treatment of any such APIs in this or any future paper. In fact, I would argue that such work is what has made numpy such an important and successful project, as low-level high-performance APIs made it feasible to build many downstream packages (e.g. scipy, sklearn, etc) and integrations (e.g. Cython).

Thank you all again for the opportunity to review this paper and render remarks!

Best,

Lance Hepler

falexwolf commented 2 years ago

Thank you for this in-depth review! It provides very valuable feedback!

We'll thoroughly respond, but it's likely gonna take a little time.

rcannood commented 2 years ago

Hi @falexwolf ! Any updates on this issue? (I feel Lance should have created a separate issues, so the topics can be tackled separately instead of being intermingled, but ok.) Many of my comments are relatively minor, so it'd be a pity of them to hold up the submission of AnnData to JOSS.

falexwolf commented 2 years ago

Hi @rcannood, @ivirshup and I made a list and we worked out many of them (e.g., benchmarking). We'll definitely get back.

I'm sorry that I on my end got much busier with @laminlabs in the past months, but we'll definitely not leave this unfinished.

I'll ping Isaac to have another meeting to finalize this.

falexwolf commented 2 years ago

@ivirshup & I are scheduled to meet today for the first time after 2 months. We split up the work. I'll add my responses here for now and very much hope we can finalize with another comment in this thread later today.

I'll address Lance's points in the same order and grouping as he provided them.

Regarding the text:

"repo location": Fixed in all places where we could.
"figure text": @ivirshup, next comment.
"layers": https://github.com/scverse/anndata/pull/825/commits/15160220279e9d297824e24a2ebea8937521c437
"uns": https://github.com/scverse/anndata/pull/825/commits/e9a5b02ff5b69f13ef8007d9f6391f24206dc9e2
"Why are one- and multi-dimensional annotations separated in two fields?": Two reasons are a. Usage patterns: One-dimensional annotations are used differently than multi-dimensional annotations. They always serve as labels with some semantic content (even if numeric), while multi-dimensional annotations store representations of the data that need additional algorithms for interpretation (assigning semantic content). Having two slots for grouping things that are used for similar things seemed like a good UX (making a few canonical things easy, like, "benchmark these alternative latent representations", "combine these labels into one", "fit a model on these conditions"). b. One-dimensional annotations have a simple data type and a canonical storage format for grouping them (a DataFrame). But back in 2017, even this couldn't easily be stored persistently in an HDF5. So, when writing the on-disk format, I had to write a persistent serialization of a DataFrame and of multi-dimensional annotations. Both was feasible without too much work when done separately as each of these had simple data types, but I wouldn't have wanted to mix it (allowing for multi-dimensional annotations with mixed data types). @ivirshup implemented the latter years later.
Typo: https://github.com/scverse/anndata/pull/825/commits/24895a857a5335801146654115998a3a9825eca0
"h5ad": https://github.com/scverse/anndata/pull/825/commits/66962c58c43de10b09d1dbfc03c3a2aeee993482
"schema tracking": @ivirshup, next comment.

Debate:

"I think it is maybe inaccurate to imply AnnData is a data structure library (The AnnData object section).": @ivirshup & I discussed this remark in early August and thought we'd prefer to keep the term "data structure". While we agree that AnnData is based on existing data structures, it composes these to define a new data structure. We think this is a bit similar to how pandas stores data internally in numpy arrays, but composes them to give rise to a much richer structure, the DataFrame. Like pandas and other "data structure libraries", anndata offers APIs to conveniently interact with the new data structure.
"Clearer statement about filling a gap". We agree: https://github.com/scverse/anndata/pull/825/commits/4bbf466bbb14916a1714b3ac70465c690e9e2a39

Outlook:

"Lessons learned & benchmarks". @ivirshup will respond re benchmarking as he has spent much over the past years. I'm not sure whether there are very specific "lessons learned" for anndata. Clearly, today, a lot could be done differently. But back then, certain requirements made the adoption of say xarray hard to impossible. Isaac may have more things to say.
"Benchmarking": @ivirshup will address this.
"Self-comparative benchmarks": We agree!
"Low-level API": @ivirshup will address this.

Let us follow up with another post to address all outstanding remarks.

All changes discussed in this response can be viewed together in this PR: https://github.com/scverse/anndata/pull/825. They follow on the changes made in response to @rcannood (https://github.com/scverse/anndata/pull/779 / https://github.com/scverse/anndata/issues/769#issuecomment-1144614058).

We'll also address @rcannood's outstanding points beyond https://github.com/scverse/anndata/issues/769#issuecomment-1160600641.

The up-to-date version of the paper is in the paper branch https://github.com/scverse/anndata/pull/666, as required by JOSS.

nlhepler commented 2 years ago

Thank you for the responses! A remaining nitpick.

Text:

I presume I'm waiting on @ivirshup?
"which allows to store" is awkward phrasing --> "which allows one to store" or "which allows storing", etc.
Thank you for this explanation! It is probably too much detail to put into the main text, I wish there was a way to include supplemental information like this.

Debate:

I will concede. It is likely more sensible than any other description.

rcannood commented 2 years ago

@nlhepler Next time, please create a separate issue for your own comments. It's becoming hard to track what still needs to be done :)

rcannood commented 2 years ago

To recap, I think my remaining issues are:

General issues

[ ] Repository It seems the repository has been renamed from theislab/anndata to scverse/anndata. Please fix this in JOSS.

Functionality

[ ] Performance "If there are any performance claims of the software, have they been confirmed?" Would it be possible to include an atlas-sized dataset (e.g. >100'000 cells) in the benchmarks included on the website / somewhere?

Documentation

[ ] Installation instructions I don't see a "clearly-stated list of dependencies" in the documentation, as required by JOSS. I think the installation instructions could be expanded upon (e.g. https://scanpy.readthedocs.io/en/stable/installation.html).

Let's make one more push towards getting AnnData published in JOSS. Let's go @falexwolf @ivirshup !

falexwolf commented 2 years ago

I'm fully onboard, @rcannood - I think the remaining points are what you point out + the font sizes of the figures. The repo location can't be fixed within the paper itself, there is no metadata section. So, I'll clarify that with the editors. Unfortunately, I can't do the other things independently but need @ivirshup for this.

@ivirshup, let's go! We've formally worked on this for more than 2 years (https://github.com/ivirshup/anndata-paper/graphs/contributors) and had prepared it for even longer. 😅 It'd be a shame if we didn't finish it after so much time.

falexwolf commented 1 year ago

@ivirshup - Trying to ping you here to get this back on your radar. 😅 Let me know if I can help in any way!

rcannood commented 1 year ago

I met up with @ivirshup during the scverse hackathon. He mentioned that there are benchmarks on ±100'000 cells performed by a CI. I'm eager to see the results :)

falexwolf commented 1 year ago

Great that you caught @ivirshup, @rcannood! I'm eager to see the results, too!

rcannood commented 1 year ago

Hi @ivirshup! Do you have any updates on the availability of the benchmarks on decently sized datasets?

rcannood commented 1 year ago

Having passed the 1 year anniversary for the AnnData JOSS submission, I'd really like to have this submission wrapped up and dealt with. AnnData is a great resource to researchers all over the world, and I think it'd be a pity if this work doesn't get published at JOSS.

@falexwolf The paper mentions:

Furthermore, anndata is systematically benchmarked for performance using airspeed velocity [@Droettboom13], with the results linked from the docs.

@falexwolf Can you remind me where these results are published? The Benchmarks page only links to a simple benchmark and the ivirshup/anndata-benchmarks repository, but I don't see any actual results. Could you help me find them?

falexwolf commented 1 year ago

All benchmarking referenced here is by @ivirshup, I'm not in the loop on the details. @ivirshup, could you help out?

falexwolf commented 10 months ago

Downtuned performance claims as requested by the editor: https://github.com/scverse/anndata/pull/1267

See discussion in the JOSS review: https://github.com/openjournals/joss-reviews/issues/4371

falexwolf commented 10 months ago

This completes a sequence of 3 PRs to the paper branch

For reference, the paper branch is here:

https://github.com/scverse/anndata/pull/666

rcannood commented 10 months ago

Thanks @falexwolf! This resolves all of my outstanding comments :) Closing this issue.

scverse / anndata

[JOSS] Comments #769