jenzopr closed this issue 4 years ago.
I agree with the issues @jenzopr raised.
In addition, I would like to see improvements for the following parts:
scTree suggests a (small) set of genes that separate populations. Why not state this right at the beginning?
Based on the previous points, the "summary" part of the introduction could be simplified to something like this:
Single-cell RNA sequencing (scRNA-seq) is now a commonly used technique to measure the transcriptomes of many individual cells. Clusters of these transcriptomes identify cell populations (ref). There are multiple methods available to identify "marker" genes that separate these populations (refs). However, these lists usually contain too many genes to directly suggest an experimental follow-up strategy for isolating the populations from a bulk sample (e.g. via FACS (ref)). Here we present scTree, a tool that aims to provide a minimal set of genes that can be used to separate the populations identified by scRNA-seq in a follow-up experiment.
ranger package: only one is the package citation; are the others relevant for this software paper? If I, as a reader, am particularly interested in how ranger implements its RF, I'd look at the citations in that paper.
scTree use RF? Why is it listed with 6 different ML methods? (Are those available, with RF as the default?) Where do the t-test and the Wilcoxon test come in? This is not clear from the software paper.
scTree to underline this point?
Thank you @jenzopr and @mschubert for your comments! I have addressed the major and minor issues.
From my point of view, the paper gained substantially from your edits! Nice work :+1:
This issue is part of the JOSS review.
Software paper review
In their paper, Paez et al. identify unmet needs in translating findings from computational scRNA-seq analysis to downstream wet-lab methodologies. Especially for marker gene detection, a critical step in the characterization of cellular (sub)populations, there seems to be no applicable method that transfers knowledge from in silico analysis to the bench. They present scTree, an R package for marker gene detection that employs random forests for variable selection and a classification tree that resembles FACS gating strategies, thereby enhancing the interpretability and applicability of detected marker genes in downstream wet-lab experiments. Overall, the manuscript is well written and does not require major editing for structure or language. They benchmark the quality of their method quite elegantly using recall statistics on test data that was held out during training.
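To make the gating analogy and the recall benchmark concrete, here is a minimal Python sketch. It is not scTree's implementation (scTree is an R package built on ranger). The gating function `gate`, the marker genes CD3E/CD19, their cutoffs, and the toy cells are all hypothetical, chosen only for illustration. Each tree node is a threshold on one marker, the same way a FACS gate works. Recall for a population is computed as (held-out cells correctly assigned to it) / (all held-out cells truly in it).

```python
from collections import Counter

def gate(cell):
    """Hypothetical two-level gating tree on marker-gene expression.

    Each node thresholds a single marker, like a FACS gate.
    Gene names and cutoffs are illustrative assumptions, not scTree output.
    """
    if cell["CD3E"] > 1.0:
        return "T cell"
    if cell["CD19"] > 1.0:
        return "B cell"
    return "other"

def recall_per_population(true_labels, predicted_labels):
    """Per-population recall: correctly assigned cells / cells truly in that population."""
    totals = Counter(true_labels)
    hits = Counter(t for t, p in zip(true_labels, predicted_labels) if t == p)
    return {pop: hits[pop] / n for pop, n in totals.items()}

# Toy held-out cells (made-up values, not data from the paper).
test_cells = [
    {"CD3E": 2.5, "CD19": 0.1},
    {"CD3E": 0.2, "CD19": 3.0},
    {"CD3E": 0.9, "CD19": 0.0},
    {"CD3E": 1.8, "CD19": 0.4},
]
true_labels = ["T cell", "B cell", "T cell", "T cell"]
predicted = [gate(c) for c in test_cells]
print(recall_per_population(true_labels, predicted))
```

Here the third cell falls just below the hypothetical CD3E cutoff and is missed, so T-cell recall drops to 2/3 while B-cell recall stays at 1.0. This shows how recall on held-out cells exposes gates that are set too strictly.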
Major issues
Minor issues