Closed denvercal1234GitHub closed 10 months ago
The test data does not need to be log-transformed, but it is fine if it is. "Technically speaking, the test dataset does not need log-expression values but we compute them anyway for convenience."
could you point to which tutorial you are referring to?
This guide: https://stemangiola.github.io/tidySingleCellExperiment/articles/introduction.html
@william-hutchison could you please help with this?
Regarding point 1, I think the tutorial material is okay as is. The test data does not have to be log-transformed, but it does not have to be raw either:
"For the test data, the assay data need not be log-transformed or even (scale) normalized. This is because SingleR() computes Spearman correlations within each cell, which is unaffected by monotonic transformations like cell-specific scaling or log-transformation." https://bioconductor.org/books/release/SingleRBook/classic-mode.html
I assume this is why you closed the issue @denvercal1234GitHub ? Please let us know if you have any further concerns though.
I could add a note on this topic to the tutorial, although given that any assay data is fine, this information may be unnecessary for the user.
The thing I would like to understand is: is providing log-transformed data an error?
Either SingleR has a method to detect whether the data is logged, or we have a problem, because applying a statistic designed for non-logged data to logged data (or the other way around) is almost never a good idea.
@susansjy22 is this the case for HPCell?
For HPCell, logNormCounts() is used to transform the test data prior to annotation with SingleR(). This should be fine, as @william-hutchison has stated. SingleR() uses Spearman correlation, which relies on the rank order of values rather than their actual magnitudes, so monotonic transformations like log-transformation or scaling would not affect the analysis.
Yes, I can confirm. I just tested SingleR with both logcounts and counts, and the output was identical.
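For anyone who wants to check the invariance outside of SingleR, here is a minimal Python sketch using only the standard library. It is not SingleR's implementation: the counts, reference profile, and size factor are made-up values, and the log2(count / size_factor + 1) formula is an assumption meant to mirror what logNormCounts() does by default. It only shows that a per-cell monotonic transformation leaves the ranks, and therefore the Spearman correlation, unchanged, while a magnitude-based statistic such as Pearson correlation does change.

```python
import math


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the value at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def spearman(x, y):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical raw counts for one test cell across eight genes.
cell_counts = [0, 3, 12, 1, 0, 7, 25, 2]
# Hypothetical reference expression profile for the same genes.
reference_profile = [0.1, 1.8, 3.2, 0.9, 0.0, 2.1, 4.0, 2.5]

# Assumed log-normalization: divide by a per-cell size factor, add a
# pseudocount of 1, take log2. The size factor here is arbitrary.
size_factor = 1.7
log_norm = [math.log2(c / size_factor + 1) for c in cell_counts]

rho_raw = spearman(cell_counts, reference_profile)
rho_log = spearman(log_norm, reference_profile)
r_raw = pearson(cell_counts, reference_profile)
r_log = pearson(log_norm, reference_profile)

print("Spearman, raw counts:     ", rho_raw)
print("Spearman, log-normalized: ", rho_log)
print("Pearson, raw counts:      ", r_raw)
print("Pearson, log-normalized:  ", r_log)
```

The two Spearman values agree because the ranks of the cell's values are the same before and after the transformation; the two Pearson values differ, which is the situation the question about logged versus non-logged statistics was worried about.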
Very well explained! OK, in this case, @susansjy22 let's omit the transformation for SingleR execution, and @william-hutchison we can omit it in the tidySCE documentation.
Please link the pull requests from the respective repositories to this issue (under the Development menu on the right of the pull request).
Hi there,
Q1: In the tutorial, logcounts were used, but the SingleR documentation strongly advises against using any transformed data and prefers raw counts. Would you mind clarifying if there is a reason you guys use logcounts here?
Q2: If my sce object is flow data, I will simply set Matrix::Matrix(sparse = F)?
Thank you.
From SingleR (https://bioconductor.org/books/release/SingleRBook/classic-mode.html): "For the test data, the assay data need not be log-transformed or even (scale) normalized. This is because SingleR() computes Spearman correlations within each cell, which is unaffected by monotonic transformations like cell-specific scaling or log-transformation. It is perfectly satisfactory to provide the raw counts for the test dataset to SingleR(), which is the reason for setting assay.type.test=1 in our previous SingleR() call for the Grun dataset."