openjournals / joss-reviews

Reviews for the Journal of Open Source Software

[REVIEW]: cellanneal: A user-friendly deconvolution software for omics data #5610

Closed by editorialbot 10 months ago

editorialbot commented 1 year ago

Submitting author: @libuchauer (Lisa Buchauer)
Repository: https://github.com/LiBuchauer/cellanneal
Branch with paper.md (empty if default branch):
Version: v1.1.0
Editor: @jmschrei
Reviewers: @ritika-giri, @ManavalanG
Archive: 10.5281/zenodo.10405043

Status

Status badge code:

HTML: <a href="https://joss.theoj.org/papers/a54beec3c28bc06c976a471bb049748a"><img src="https://joss.theoj.org/papers/a54beec3c28bc06c976a471bb049748a/status.svg"></a>
Markdown: [![status](https://joss.theoj.org/papers/a54beec3c28bc06c976a471bb049748a/status.svg)](https://joss.theoj.org/papers/a54beec3c28bc06c976a471bb049748a)

Reviewers and authors:

Please avoid lengthy details of difficulties in the review thread. Instead, please create a new issue in the target repository and link to those issues (especially acceptance-blockers) by leaving comments in the review thread below. (For completists: if the target issue tracker is also on GitHub, linking the review thread in the issue or vice versa will create corresponding breadcrumb trails in the link target.)

Reviewer instructions & questions

@ritika-giri & @ManavalanG, your review will be checklist based. Each of you will have a separate checklist that you should update when carrying out your review. First of all you need to run this command in a separate comment to create the checklist:

@editorialbot generate my checklist

The reviewer guidelines are available here: https://joss.readthedocs.io/en/latest/reviewer_guidelines.html. Any questions/concerns please let @jmschrei know.

Please start on your review when you are able, and be sure to complete your review in the next six weeks, at the very latest.

Checklists

📝 Checklist for @ritika-giri

📝 Checklist for @ManavalanG

editorialbot commented 1 year ago

Hello humans, I'm @editorialbot, a robot that can help you with some common editorial tasks.

For a list of things I can do to help you, just type:

@editorialbot commands

For example, to regenerate the paper pdf after making changes in the paper's md or bib files, type:

@editorialbot generate pdf

editorialbot commented 1 year ago

Software report:

github.com/AlDanial/cloc v 1.88  T=0.04 s (355.8 files/s, 85784.4 lines/s)
-------------------------------------------------------------------------------
Language                     files          blank        comment           code
-------------------------------------------------------------------------------
Python                           8            278            598           1591
Markdown                         4            167              0            368
TeX                              1             20              0            236
Jupyter Notebook                 1              0            315             21
YAML                             1              1              4             18
-------------------------------------------------------------------------------
SUM:                            15            466            917           2234
-------------------------------------------------------------------------------

gitinspector failed to run statistical information for the repository

editorialbot commented 1 year ago

Wordcount for paper.md is 1296

editorialbot commented 1 year ago

Reference check summary (note 'MISSING' DOIs are suggestions that need verification):

OK DOIs

- 10.1093/bioinformatics/btz363 is OK
- 10.1038/s41592-019-0686-2 is OK
- 10.1038/s41586-020-2649-2 is OK
- 10.5281/zenodo.3509134 is OK
- 10.21105/joss.03021 is OK
- 10.1109/MCSE.2007.55 is OK

MISSING DOIs

- 10.1038/s41467-020-19015-1 may be a valid DOI for title: Benchmarking of cell type deconvolution pipelines for transcriptomics data
- 10.1101/354944 may be a valid DOI for title: Bulk tissue cell type deconvolution with multi-subject single-cell expression reference
- 10.1101/2020.10.01.322867 may be a valid DOI for title: Likelihood-based deconvolution of bulk gene expression data using single-cell references
- 10.1371/journal.pcbi.1006976 may be a valid DOI for title: Fast and robust deconvolution of tumor infiltrating lymphocyte from expression profiles using least trimmed squares
- 10.1038/nmeth.3337 may be a valid DOI for title: Robust enumeration of cell subsets from tissue expression profiles
- 10.1038/s41587-019-0114-2 may be a valid DOI for title: Determining cell type abundance and expression from bulk tissues with digital cytometry
- 10.1101/2020.02.21.940650 may be a valid DOI for title: AutoGeneS: Automatic gene selection using multi-objective optimization for RNA-seq deconvolution
- 10.1101/2022.11.11.516138 may be a valid DOI for title: Terminal differentiation of villus-tip enterocytes is governed by distinct members of Tgfβsuperfamily

INVALID DOIs

- None

editorialbot commented 1 year ago

👉📄 Download article proof 📄 View article proof on GitHub 📄 👈

jmschrei commented 1 year ago

Howdy @ritika-giri and @ManavalanG

Thanks for agreeing to review this submission.

The process for conducting a review is outlined above. Please run the command shown above to have @editorialbot generate your checklist, which will give a step-by-step process for conducting your review. Please check the boxes during your review to keep track, as well as make comments in this thread or open issues in the repository itself to point out issues you encounter. Keep in mind that our aim is to improve the submission to the point where it is of high enough quality to be accepted, rather than to provide a yes/no decision, and so having a conversation with the authors is encouraged rather than providing a single review post at the end of the process.

Here are the review guidelines: https://joss.readthedocs.io/en/latest/reviewer_guidelines.html
And here is a checklist, similar to above: https://joss.readthedocs.io/en/latest/review_checklist.html

Please let me know if you encounter any issues or need any help during the review process, and thanks for contributing your time to JOSS and the open-source community!

jmschrei commented 1 year ago

@LiBuchauer would you mind looking at those missing DOIs?

ritika-giri commented 1 year ago

Review checklist for @ritika-giri

Conflict of interest

Code of Conduct

General checks

Functionality

Documentation

Software paper

LiBuchauer commented 1 year ago

Hi, I added DOIs wherever possible. Thanks for agreeing to review @ritika-giri and @ManavalanG!

LiBuchauer commented 1 year ago

@editorialbot generate pdf

editorialbot commented 1 year ago

👉📄 Download article proof 📄 View article proof on GitHub 📄 👈

ManavalanG commented 1 year ago

Review checklist for @ManavalanG

Conflict of interest

Code of Conduct

General checks

Functionality

Documentation

Software paper

jmschrei commented 1 year ago

@ritika-giri and @ManavalanG, can you please provide updates as to how your reviews are going?

ManavalanG commented 1 year ago

@jmschrei I will get it completed this week :)

ritika-giri commented 1 year ago

@jmschrei thanks for checking in - I will be done by Aug 20!

ManavalanG commented 1 year ago

Apologies for the delay! I am working on the review and I will submit my review in the next few days.

ManavalanG commented 1 year ago

@LiBuchauer In the interest of time, I have included my comments based on the tool installation and testing so far. I will follow up in the next few days with my further comments on the manuscript.

Installation

Documentation

Functionality

LiBuchauer commented 1 year ago

Thanks @ManavalanG, excellent points. I have started working on it as documented below.

Installation

  • [x] Installation requires several dependencies such as numpy, etc., but their versions are not pinned. I would highly recommend pinning them (e.g. numpy>=1.24) to eliminate version-related errors. --> addressed in 3b8081e

  • [x] Minor suggestion. It would be a good idea to specify the dependencies in a requirements.txt file for pip-based installation and in an environment.yml file for conda-based installation. These files would also make it easier to specify the dependency versions. --> addressed in 3b8081e
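
To illustrate the pinning suggestion above, here is a minimal sketch of how minimum versions could be checked at runtime with the standard library. The `MINIMUM_VERSIONS` table and the helper names are hypothetical and not part of cellanneal; the only version taken from this thread is the `numpy>=1.24` example, and the real pins live in the repository's own requirement files.

```python
from importlib import metadata

# Hypothetical minimums for illustration only; not cellanneal's actual pins.
MINIMUM_VERSIONS = {"numpy": "1.24"}


def parse_version(text):
    """Turn '1.24.3' into (1, 24, 3); non-numeric suffixes are ignored."""
    parts = []
    for piece in text.split("."):
        digits = ""
        for char in piece:
            if not char.isdigit():
                break
            digits += char
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def is_older(found, minimum):
    """True if version string `found` is strictly older than `minimum`."""
    a, b = parse_version(found), parse_version(minimum)
    width = max(len(a), len(b))
    # Pad with zeros so that '1.24' and '1.24.0' compare as equal.
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a < b


def check_minimums(minimums, installed_version=None):
    """Return {package: problem} for each package that is missing or too old."""
    lookup = installed_version or metadata.version
    problems = {}
    for name, minimum in minimums.items():
        try:
            found = lookup(name)
        except metadata.PackageNotFoundError:
            problems[name] = "not installed"
            continue
        if is_older(found, minimum):
            problems[name] = f"found {found}, need >= {minimum}"
    return problems
```

Calling `check_minimums(MINIMUM_VERSIONS)` returns an empty dict when every listed package satisfies its minimum. The version parser is deliberately simple and does not implement the full Python version specification.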

Documentation

  • [x] Minor suggestion. pip install . needs to be executed irrespective of whether dependencies were installed via conda or pip. However, it was unclear to me that this step was needed when I chose the conda route. A minor reorganization would help. --> addressed in 285b24c
  • [x] A snippet in README.md - "The repository contains example data from a publication on liver cancer microenvironments at examples/example_data/". Please cite the publication here. It is cited in the file examples/cellanneal_quickstart.ipynb, but mentioning it in the readme doc would greatly help the users. --> added in 857bfa5
  • [x] Fulfill this requirement from JOSS's checklist on documentation: "Are there clear guidelines for third parties wishing to 1) Contribute to the software 2) Report issues or problems with the software 3) Seek support". This is already accomplished by the included file CONTRIBUTING.md, but most users would likely not know about this file. So, linking it in the readme doc would make it easier for folks to come across the contributing and support guidelines. --> addressed in a944bc8

Functionality

  • [x] Regarding performance, the manuscript states that "Its typical processing time for one mixture sample is below one minute on a desktop machine". However, when I tested the tool on a MacBook Pro 2019 (2.4GHz 8-core processor, 32GB memory), it took ~2 mins. Command I used: time cellanneal examples/example_data/mixture_data_liver_tumor.csv examples/example_data/signature_data_human_liver.csv output_directory. Time output on my Mac: "674.31s user 179.48s system 738% cpu 1:55.67 total". I ran the command a total of 3 times, and it took ~2 mins every time. Please complete the following:

    • [x] Mention the specs of the desktop machine where testing was performed --> addressed in 9275cca
    • [x] Confirm the time taken to run the tool. --> I took the liberty of checking this off, because the mixture file contains 5 columns, i.e. 5 mixture samples, so the statement of less than 1 minute per mixture sample holds on your machine
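
The per-sample figure can be worked out from the timing line quoted above. This is only a back-of-the-envelope sketch: it assumes the five mixture samples take roughly equal time and ignores start-up overhead, and the variable names are made up for the example.

```python
def wall_seconds(total):
    """Convert a zsh-style 'M:SS.ss' total into seconds."""
    minutes, seconds = total.split(":")
    return int(minutes) * 60 + float(seconds)


# Values copied from the reported `time` output.
USER_S, SYSTEM_S, TOTAL = 674.31, 179.48, "1:55.67"
N_SAMPLES = 5  # columns (mixture samples) in the example mixture file

wall = wall_seconds(TOTAL)
per_sample = wall / N_SAMPLES  # assumes equal time per sample
cpu_percent = 100 * (USER_S + SYSTEM_S) / wall

print(f"wall time: {wall:.2f} s")                 # wall time: 115.67 s
print(f"per mixture sample: {per_sample:.1f} s")  # per mixture sample: 23.1 s
print(f"CPU utilisation: {cpu_percent:.0f}%")     # CPU utilisation: 738%
```

The computed CPU utilisation matches the 738% in the reported output, and about 23 s per sample is consistent with the "below one minute" statement.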

ManavalanG commented 1 year ago

Here are my further comments, following my earlier feedback/comments.

Manuscript

Tool installation


Note: In my initial feedback provided on Aug 21, I had tagged my comment about tool dependency versions and the conda environment definition as a minor suggestion, but upon further reflection, I removed that tag. Having tool versions defined would help with reliable installation and reproducibility. Please let me know if you have any questions :)

jmschrei commented 1 year ago

@ritika-giri how is your review coming?

@LiBuchauer have you had a chance to look at the comments raised by @ManavalanG?

jmschrei commented 1 year ago

@ritika-giri can you please provide an update? @ManavalanG how is your review coming?

ManavalanG commented 1 year ago

@jmschrei My initial review is complete. I will resume once I hear back from the authors.

jmschrei commented 1 year ago

Thanks for the update. @LiBuchauer have you had a chance to look at the comments?

Kevin-Mattheus-Moerman commented 1 year ago

@LiBuchauer are you able to work on the above ☝️? To avoid general delays, and to avoid losing track of the reviewers, we recommend that you respond to reviewer comments/issues in a timely manner.

LiBuchauer commented 1 year ago

Hi all, very sorry for the delay, I am working on the last three points now. Will update.

LiBuchauer commented 1 year ago

Hi @ManavalanG, I addressed the remaining 3 points about the manuscript as detailed below, thank you for your input. Please let me know if anything else is lacking.

Manuscript

  • [X] The title is really broad considering the content of the manuscript. It discusses application in the field of transcriptomics, whereas the title says software for omics data. Justification of how it applies to other types of omics data needs to be included, or the title needs to be modified to reflect the manuscript's content (i.e. transcriptomics data). --> I changed it as requested.
  • [X] The end of the second paragraph in the summary section mentions two major optimization algorithms (least squares regression and support vector regression) used to solve the problem, and then the third paragraph discusses the challenges of using least squares regression. However, such a description for support vector regression is missing, and adding a note on it would guide the reader. Also, mentioning how the algorithm used in cellanneal (Spearman’s rank correlation coefficient) fits in between the above two algorithms would be helpful. --> I added one sentence about SVR method problems, and also outlined what cellanneal does differently.
  • [X] Minor. Mentioning the dataset used to obtain the figures in the manuscript could be helpful. --> Done

Tool installation

  • [ ] Documentation is missing info on how to install a particular version of cellanneal. For example, how to install v1.0.0? [Update Oct 2] This is minor and recommended but does not need to be included. --> I chose not to do this in the interest of time and also because there are currently not multiple versions :) hope that's okay

Note: In my initial feedback provided on Aug 21, I had tagged my comment about tool dependency versions and the conda environment definition as a minor suggestion, but upon further reflection, I removed that tag. Having tool versions defined would help with reliable installation and reproducibility. Please let me know if you have any questions :) --> I had already addressed this

LiBuchauer commented 1 year ago

@jmschrei @Kevin-Mattheus-Moerman sorry again for the delay, I believe I have addressed all the points @ManavalanG has raised. Will wait for his feedback. Thanks everyone for your time!!

jmschrei commented 1 year ago

Thanks @LiBuchauer. @ManavalanG let me know what you think.

ManavalanG commented 1 year ago

@editorialbot generate pdf

editorialbot commented 1 year ago

👉📄 Download article proof 📄 View article proof on GitHub 📄 👈

ManavalanG commented 1 year ago

@LiBuchauer Thanks for making the requested changes.

@jmschrei I have now completed the review. Please let me know if there are any questions :)

jmschrei commented 1 year ago

@ritika-giri do you have any other concerns about the paper? If not, please check off the remaining items in the list.

jmschrei commented 1 year ago

@ritika-giri checking in on this again

ritika-giri commented 1 year ago

Sorry for the delay @jmschrei - I have 2 major comments:

  1. Authors do not mention other publications, or packages in R / Python that implement simulated annealing algorithms for cell mixture deconvolution. These should be clearly referenced in order to establish the broader context of the field and how their work fits into it.
  2. I suspect that this package does not fulfill the criteria of substantial scholarly effort, although I am more than happy to be corrected by the authors or editor. As I understand it, the files dual_annealing.py and general.py contain the algorithms. Of these, dual_annealing.py is from scipy's codebase with no modifications. The file general.py has two major functions: finding highly variable genes and deconvolution. Of these, the code for finding variable genes is from scanpy's implementation of the same, while the deconvolve functions call the dual annealing code (originally in scipy). I would like the authors to clearly state the algorithmic modifications that are novel in their package, or their justifications for substantial scholarly effort.

jmschrei commented 1 year ago

Thanks @ritika-giri. @LiBuchauer can you respond to these concerns when you get a chance?

LiBuchauer commented 1 year ago

Hi good morning @ritika-giri,

Regarding the first point, to the best of my knowledge there are no other packages that use simulated annealing as an optimisation procedure for optimising Spearman's R between the experimental and the computational mixture, and I consider this an original idea we had. Obviously I will cite and discuss any such paper that you can name. Many of the published methods rely on parametric distance metrics because fast optimization algorithms are available for them. In many omics contexts, however, non-parametric methods have proven to be more stable, and thus, in cellanneal, we implemented a solution with Spearman’s rank correlation coefficient as a distance function and simulated annealing as an optimization procedure.

This also brings us to the second point. dual_annealing.py, as you note, is from scipy, which is clearly cited and marked. The reason I copied the code over is that an important part of cellanneal is the GUI, which comes as a single executable, and in order to keep this slim and reduce its start-up time, I wanted to remove scipy from the dependencies. Around this simulated annealing implementation, we provide a pipeline for importing bulk and signature data, extracting highly variable genes (yes, via the scanpy implementation, as is clearly marked), running the optimization and returning relevant plots as well as tabular results. The functionality is accessible via Python, CLI or GUI. We find that our method performs well and runs fast.
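
As a rough illustration of the idea described above (Spearman's correlation as the objective, simulated annealing as the optimizer), here is a toy sketch in plain Python. It is not cellanneal's code: cellanneal uses scipy's dual annealing, whereas this uses a basic Metropolis-style annealer with linear cooling, and all function names, parameters and the synthetic data are assumptions made for the example.

```python
import math
import random


def ranks(values):
    """Average ranks (1-based), with ties sharing their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            result[order[k]] = mean_rank
        start = end + 1
    return result


def spearman(x, y):
    """Spearman's rank correlation: Pearson correlation of the ranks."""
    rx, ry = ranks(x), ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy) if sx and sy else 0.0


def mix(signature, fractions):
    """Computational mixture: per-gene weighted sum over cell types."""
    return [sum(f * g for f, g in zip(fractions, gene)) for gene in signature]


def normalise(weights):
    """Scale weights so that they sum to one."""
    total = sum(weights)
    return [w / total for w in weights]


def anneal(signature, bulk, steps=5000, seed=0):
    """Toy simulated annealing over cell-type fractions; cost is 1 - Spearman."""
    rng = random.Random(seed)
    n_types = len(signature[0])
    current = [1.0 / n_types] * n_types
    current_cost = 1.0 - spearman(mix(signature, current), bulk)
    best, best_cost = current, current_cost
    for step in range(steps):
        temperature = 0.1 * (1.0 - step / steps) + 1e-6  # linear cooling
        # Propose: perturb every fraction, keep it positive, renormalise.
        proposal = normalise([max(1e-9, f + rng.gauss(0.0, 0.05)) for f in current])
        cost = 1.0 - spearman(mix(signature, proposal), bulk)
        # Metropolis rule: always accept improvements, sometimes accept worse.
        if cost <= current_cost or rng.random() < math.exp((current_cost - cost) / temperature):
            current, current_cost = proposal, cost
            if cost < best_cost:
                best, best_cost = proposal, cost
    return best, 1.0 - best_cost


if __name__ == "__main__":
    # Synthetic data: 30 genes x 3 cell types, mixed with known fractions.
    data_rng = random.Random(1)
    signature = [[data_rng.random() * 100 for _ in range(3)] for _ in range(30)]
    bulk = mix(signature, [0.6, 0.3, 0.1])
    fractions, rho = anneal(signature, bulk)
    print("estimated fractions:", [round(f, 3) for f in fractions])
    print("Spearman correlation:", round(rho, 4))
```

Because the objective depends only on ranks, different fraction vectors can reach the same correlation, so this toy should not be expected to recover the true fractions exactly.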

Overall, in my view, the most important points are 1) we present a new idea for performing cell type deconvolution based on a non-parametric distance function and 2) (maybe more relevant for JOSS) we provide this as easy-to-use software which can also be employed by non-coding scientists and runs locally. So far, cellanneal has been cited by 3 peer-reviewed publications (https://scholar.google.com/scholar?cites=10240263254458724390&as_sdt=2005&sciodt=0,5&hl=en), including two wholly unrelated to the cellanneal authors, and I know from people seeking support via email that it has also found its way into some biotech companies. It’s probably not worth anything, but the work that went into this is certainly (much) more than 3 months full time, though of course not all of it was spent on the code; some went into conceiving and testing the method together with experimental biologists.

@ritika-giri hope this clarifies it a bit & @jmschrei hope you can make a call based on this explanation.

jmschrei commented 1 year ago

In general, I think it's better to import functions from other packages (even big ones like scipy) so that (1) they get the credit they deserve for writing the original algorithm and (2) any upstream improvements in performance can make their way into your package without any effort on your part. That being said, it's not a strict requirement and the authors do seem to clearly state where the code is from.

I personally believe that the substantial scholarly effort criteria have been met. Remember that substantial effort at JOSS focuses more on the development of the code rather than algorithmic novelty -- though it does need to fill a niche. @ritika-giri if you know of any specific publications using simulated annealing you think they should mention, I agree that they should include them, even if they aren't explicitly optimizing Spearman R.

jmschrei commented 1 year ago

@ritika-giri what do you think of the above responses?

ritika-giri commented 1 year ago

Thank you for the clarifications and thoughtful inputs @jmschrei and @LiBuchauer. Happy to sign off on the review. I will provide some references for cell deconvolution using SA in a couple days.

jmschrei commented 1 year ago

Thank you @ritika-giri. I understand that you're busy, but keep in mind that this paper has been under review since June 30th. Any speed on your end would be greatly appreciated.

Kevin-Mattheus-Moerman commented 12 months ago

@ritika-giri :wave: please can you get back to @jmschrei ?

jmschrei commented 11 months ago

Okay, I'm going to go ahead and move forward with this without @ritika-giri's comments.

jmschrei commented 11 months ago

Post-Review Checklist for Editor and Authors

Editor Tasks Prior to Acceptance

jmschrei commented 11 months ago

@editorialbot generate pdf

jmschrei commented 11 months ago

@editorialbot check references

editorialbot commented 11 months ago

Reference check summary (note 'MISSING' DOIs are suggestions that need verification):

OK DOIs

- 10.1038/s41467-020-19015-1 is OK
- 10.1093/bioinformatics/btz363 is OK
- 10.7554/eLife.26476 is OK
- 10.1038/s41467-018-08023-x is OK
- 10.1101/gr.272344.120 is OK
- 10.1038/s41467-020-15816-6 is OK
- 10.1371/journal.pcbi.1006976 is OK
- 10.1038/nmeth.3337 is OK
- 10.1038/s41587-019-0114-2 is OK
- 10.1016/j.cels.2021.05.006 is OK
- 10.1186/s13059-016-1028-7 is OK
- 10.1126/science.220.4598.671 is OK
- 10.1038/s41592-019-0686-2 is OK
- 10.1038/s41586-020-2649-2 is OK
- 10.5281/zenodo.3509134 is OK
- 10.21105/joss.03021 is OK
- 10.1109/MCSE.2007.55 is OK
- 10.1371/journal.pbio.3002124 is OK
- 10.1101/2022.11.11.516138 is OK

MISSING DOIs

- None

INVALID DOIs

- Globalquantificationofmammaliangeneexpressioncontrol is INVALID

jmschrei commented 11 months ago

@LiBuchauer can you provide the DOI for the paper (e.g., from Zenodo) and the version of the code associated with this submission? And can you check out that invalid DOI?

editorialbot commented 11 months ago

👉📄 Download article proof 📄 View article proof on GitHub 📄 👈

LiBuchauer commented 11 months ago

@editorialbot check references

editorialbot commented 11 months ago

Reference check summary (note 'MISSING' DOIs are suggestions that need verification):

OK DOIs

- 10.1038/s41467-020-19015-1 is OK
- 10.1093/bioinformatics/btz363 is OK
- 10.7554/eLife.26476 is OK
- 10.1038/s41467-018-08023-x is OK
- 10.1101/gr.272344.120 is OK
- 10.1038/s41467-020-15816-6 is OK
- 10.1371/journal.pcbi.1006976 is OK
- 10.1038/nmeth.3337 is OK
- 10.1038/s41587-019-0114-2 is OK
- 10.1016/j.cels.2021.05.006 is OK
- 10.1186/s13059-016-1028-7 is OK
- 10.1038/nature10098 is OK
- 10.1126/science.220.4598.671 is OK
- 10.1038/s41592-019-0686-2 is OK
- 10.1038/s41586-020-2649-2 is OK
- 10.5281/zenodo.3509134 is OK
- 10.21105/joss.03021 is OK
- 10.1109/MCSE.2007.55 is OK
- 10.1371/journal.pbio.3002124 is OK
- 10.1101/2022.11.11.516138 is OK

MISSING DOIs

- None

INVALID DOIs

- None