Closed: rcannood closed this issue 10 months ago
Thank you very much for the review, @rcannood! We'll get back to you!
I'm sorry it takes us so long.
We need a bit more time to compile the benchmarks we have. We'll get back in a week.
Hi Alex et al.! How are things going? Let me know if anything is unclear or a misinterpretation from my end. I'd be happy to discuss! :)
Robrecht!
Apologies for the slow response!
Thank you very much for the in-depth review. It definitely helped us improve the paper!
General issues
Functionality → performance → benchmarks
Documentation
Software paper
State of field
Quality of writing: figure text in figure 1
Quality of writing: anndata is sometimes written as AnnData.
AnnData denotes the class (data structure), whereas anndata denotes the software package (Python module).
Quality of writing: inline citation
State of the field:
State of the field: examples
All changes discussed in this response can be viewed together in this PR: https://github.com/scverse/anndata/pull/779. The up-to-date version of the paper is in the paper branch https://github.com/scverse/anndata/pull/666, as required by JOSS.
We'd be grateful for guidance on the question regarding Figure 1! We're also happy to further adapt the paper to your suggestions!
Thanks for your hard work @falexwolf and colleagues! I ticked off the check boxes for which I have no further questions or comments. I also updated the progress of my review checklist at https://github.com/openjournals/joss-reviews/issues/4371#issuecomment-1117684175 accordingly :)
My responses to your comments:
General issues → Repository
We fixed the repository link! https://github.com/scverse/anndata/commit/d409c48ec8ca26707f149fcd23cffbdf970d9f5b
The first issue at https://github.com/openjournals/joss-reviews/issues/4371 still lists github.com/theislab/anndata instead of scverse/anndata. I think it's important that this is fixed, because the url in the JOSS article itself also still points to the theislab organisation.
Functionality → Performance
It makes a comparison with the loom on-disk format here: https://anndata.readthedocs.io/en/latest/benchmark-read-write.html
The read/write benchmark is a good start! This only covers a very specific aspect of AnnData and only compares to loom. I'm not sure I'm wholly satisfied that this is sufficient to substantiate your claims regarding performance. The manuscript states: "Due to the increasing scale of data, we emphasized efficient operations with low memory and runtime overhead."
AnnData objects with various merge strategies
Functionality → Performance
It links to https://github.com/ivirshup/anndata-benchmarks, an entire repository devoted to benchmarking anndata over its entire lifetime using airspeed velocity.
It's very cool that you're able to track the performance of anndata in this way! However, I'm having a hard time browsing the results from this workflow. Are the results of the ASV benchmarks the ones that are published on readthedocs?
Functionality → Performance
We will add more comprehensive comparative benchmarks to the docs, link them from the landing page, and maintain them.
Let me know when I can take a look at the extended benchmarks. I realise that the above-mentioned comments regarding the benchmarks are quite extensive. Maybe we can ask @luizirber to chime in on the extent to which each performance-related claim in the paper needs to be backed by benchmarks?
Documentation → Installation
We agree with you that a more comprehensive installation page in the docs could be beneficial! We will add this page similar to what we set up for Scanpy.
Sounds good! Let me know when you'd like me to take a look at that.
Quality of writing → figure text in figure 1
Agreed on this! We will remake the figure. Would you think the figure is acceptable if we increase the fontsize of “obs_names” and “var_names”? Or do you think we have to increase the fontsize everywhere?
Increasing the size of just the obs_names and var_names should be sufficient :)
Thanks a lot for this second round, @rcannood! (Why am I so slow in responding? Sorry!)
I pinged the editor re the repo location and am talking to @ivirshup about the rest!
Hey @rcannood, sorry also missed the update here!
"Increasing scale of data" → the dataset used in this benchmark only contains 2300 cells
Looking into making something more up to date here!
It's very cool that you're able to track the performance of anndata in this way! However, I'm having a hard time browsing the results from this workflow. Are the results of the ASV benchmarks the ones that are published on readthedocs?
They are not.
At the moment, we just treat this as tooling for development, and don't publish results. We need to get dedicated hardware set up to have meaningful historical results (since performance can be quite variable between machines otherwise).
Additionally, the benchmarks here can be quite specific, and I'm not sure they would be informative by themselves. Here is an example of output: https://www.astropy.org/astropy-benchmarks/
Hello all!
Thank you for the very well-written paper and similarly well-built and well-maintained anndata package. I concur with @rcannood that it is worthy of publication, after a few things have been addressed:
uns, which is also present in Figure 1's caption. It is possibly more important to describe than layers, as you refer to uns on line 132 of the main text.
var and obs vs varm and obsm, respectively? A single-dimensional annotation has an implicit dimension of 1 and can leverage the same data structure as annotations with dimension >1? It may be worth describing the thought process behind this decision in the paper.
A couple of remarks I think warrant some interaction/debate:
I also have a few other comments I would like to share that I do not think block publication, but might be helpful if you would like to further polish this submission or intend to submit a follow-up paper in the future:
Thank you all again for the opportunity to review this paper and render remarks!
Best,
Lance Hepler
Thank you for this in-depth review! It provides very valuable feedback!
We'll thoroughly respond, but it's likely gonna take a little time.
Hi @falexwolf! Any updates on this issue? (I feel Lance should have created a separate issue so the topics could be tackled separately instead of being intermingled, but ok.) Many of my comments are relatively minor, so it'd be a pity for them to hold up the submission of AnnData to JOSS.
Hi @rcannood, @ivirshup and I made a list and we worked out many of them (e.g., benchmarking). We'll definitely get back.
I'm sorry that I on my end got much busier with @laminlabs in the past months, but we'll definitely not leave this unfinished.
I'll ping Isaac to have another meeting to finalize this.
@ivirshup & I are scheduled to meet today for the first time after 2 months. We split up the work. I'll add my responses here for now and very much hope we can finalize with another comment in this thread later today.
I'll address Lance's points in the same order and grouping as he provided them.
Regarding the text:
Debate:
Outlook:
Let us follow up with another post to address all outstanding remarks.
All changes discussed in this response can be viewed together in this PR: https://github.com/scverse/anndata/pull/825. They follow on the changes made in response to @rcannood (https://github.com/scverse/anndata/pull/779 / https://github.com/scverse/anndata/issues/769#issuecomment-1144614058).
We'll also address @rcannood's outstanding points beyond https://github.com/scverse/anndata/issues/769#issuecomment-1160600641.
The up-to-date version of the paper is in the paper branch https://github.com/scverse/anndata/pull/666, as required by JOSS.
Thank you for the responses! A remaining nitpick.
Text:
Debate:
@nlhepler Next time, please create a separate issue for your own comments. It's becoming hard to track what still needs to be done :)
To recap, I think my remaining issues are:
General issues
Functionality
Documentation
Let's make one more push towards getting AnnData published in JOSS. Let's go @falexwolf @ivirshup !
I'm fully on board, @rcannood - I think the remaining points are what you point out, plus the font sizes of the figures. The repo location can't be fixed within the paper itself, as there is no metadata section, so I'll clarify that with the editors. Unfortunately, I can't do the other things independently; I need @ivirshup for this.
@ivirshup, let's go! We've formally worked on this for more than 2 years (https://github.com/ivirshup/anndata-paper/graphs/contributors) and had prepared it for even longer. 😅 It'd be a shame if we didn't finish it after so much time.
@ivirshup - Trying to ping you here to get this back on your radar. 😅 Let me know if I can help in any way!
I met up with @ivirshup during the scverse hackathon. He mentioned that there are benchmarks on ±100'000 cells performed by a CI. I'm eager to see the results :)
Great that you caught @ivirshup, @rcannood! I'm eager to see the results, too!
Hi @ivirshup! Do you have any updates on the availability of the benchmarks on decently sized datasets?
Having passed the 1 year anniversary for the AnnData JOSS submission, I'd really like to have this submission wrapped up and dealt with. AnnData is a great resource to researchers all over the world, and I think it'd be a pity if this work doesn't get published at JOSS.
@falexwolf The paper mentions:
Furthermore, anndata is systematically benchmarked for performance using airspeed velocity [@Droettboom13], with the results linked from the docs.
@falexwolf Can you remind me where these results are published? The Benchmarks page only links to a simple benchmark and the ivirshup/anndata-benchmarks repository, but I don't see any actual results. Could you help me find them?
All benchmarking referenced here is by @ivirshup, I'm not in the loop on the details. @ivirshup, could you help out?
Toned down the performance claims as requested by the editor: https://github.com/scverse/anndata/pull/1267
See discussion in the JOSS review: https://github.com/openjournals/joss-reviews/issues/4371
This completes a sequence of 3 PRs to the paper branch
For reference, the paper branch is here:
Thanks @falexwolf! This resolves all of my outstanding comments :) Closing this issue.
Hi all!
I'm currently reviewing anndata's submission to JOSS (openjournals/joss-reviews#4371). I think the paper is very well written and is worthy of publication at JOSS. However, in going over the reviewer checklist provided by JOSS, there are some minor outstanding issues which I cannot check off at the moment.
Below is an overview of all my relatively minor comments. I'm happy to discuss them in this thread or in a separate issue, if need be.
Robrecht
General issues
Functionality
Documentation
Software paper
@Wickham2014 instead of [@Wickham2014].