Closed Mia1037 closed 1 year ago
Hi,
This is fine. We analyzed several such data sets in the paper (e.g., from blood, brain, or liver). However, you may have less detection power because scDRS implicitly compares a cell to other cells in the data set, and cells from the right tissue are more likely to be causal. Please see more details in discussion point 2 in the paper, copied below.
In the paper, we had sufficient power in the brain and liver data sets, but less power in the T cell data sets. For the former, we can do both individual cell-level and aggregate analyses. For the latter, we only performed aggregate analyses (associating diseases with fine-grained cell subtype annotations).
Second, the fact that scDRS assesses the statistical significance of an individual cell’s association to disease by implicitly comparing it to other cells via matched control genes may reduce power if most cells in the data are truly causal. For example, association with IBD in a data set containing only Treg cells (one of the causal cell types for IBD) will likely yield largely nonsignificant results. This limitation did not impact our main analyses, because the TMS data include a broad set of cell types; in more specialized data sets (which may be preferred in some settings due to the more comprehensive profiling of the focal cell population), this limitation can potentially be addressed by selecting matched control genes based on a broad cell atlas (for example, the TMS or TS data).
Best, Martin
Hi,
.h5ad
file?I used python via the following steps:
.csv
or .tsv
format) along with the gene-level and cell-level covariate files (.csv
or .tsv
format). Since the file formats may vary, you may need extra preprocessing steps to get the .csv/.tsv
files. .csv/.tsv
files into python using pandas.read_csv
anndata
object using anndata.AnnData
. Count matrix goes to the X
argument, cell-level covariates go to the obs
argument, and gene-level covariates go to the var
argument. anndata
object in .h5ad
format using anndata.AnnData.write
Seurat
in R also offers functionalities for conversion between Seurat
and AnnData .h5ad
files. However, I haven't personally tested these functions.
We want to provide the best support for our users. May I ask if these pointers are sufficient? Would it be helpful to add a Jupyter notebook tutorial? Would it be helpful to add an scDRS CLI function, e.g., scdrs get-h5ad
, to convert .csv/.tsv
files to .h5ad
format? Are there other functions you would like to have?
Best, Martin
Thank you very much for your detailed explanation !
Sometimes, it is difficult to replicate the results for some large scRNA-seq data, but they provide meta data of cells and the tSNE coordinate of each cell (e.g., (-6.250967, 38.407021), so can I just use the raw count matrix (X) and the meta data of the cells (obs) and gene-level covariates (var) and the tSNE coordinate of each cell (obsm)**** to generate .h5ad file instead of normalization, dimensional reduction and other standardized single-cell analysis processes.
If the above method is feasible, it is embarrassing that I am not sure how to add tSNE to the.h5ad file.
Hi,
This is indeed feasible. When you use anndata.AnnData
to construct the anndata
object, just add the argument obsm={'X_tnse':mat_tsne}
, where mat_tsne
is a numpy.ndarray
of shape (n_cell, 2)
containing the two TSNE coordinates of all cells. Since that's exactly what scanpy.tl.tsne
outputs to the anndata
object, you should be able to perform the analysis as if you computed the tSNE embeddings on your own.
Of course, after you have the anndata
object, you can then use anndata.AnnData.write
to get the .h5ad
file.
Let me know if you have more questions.
Best, Martin
Hi ,
I use the raw count matrix (X) and the meta data of the cells (obs) and gene-level covariates (var) and the tSNE coordinate of each cell (obsm) to generate .h5ad instead of normalization, dimensional reduction and other standardized single-cell analysis processes, so my annData Object has no obsp('distances', 'connectivities'). When I run scdrs perform-downstream to obtain the group-level statistics, the following information is displayed:
Performing scDRS group-analysis
connectivities
not found in adata.obsp
; run sc.pp.neighbors
first
Finish group-analysis for SCZ (sys_time=32.6s)
When I ran the sample data(Zeisel & Muñoz-Manchado), I found that whether the anndata object has obsp('distances ', 'connectivities') or not has no effect on the cell score and assoc_mcp, but hetero_mcp results are different.
For my data , I would like to ask whether the effect of obsp('distances', 'connectivities') on hetero_mcp can be ignored?
Calculation of hetero_mcp directly depend on the connectivities
information. The displayed information indicated that scDRS found that there are no connectivities
information in your adata, and is computing that from scratch.
So I think results of hetero_mcp should also make sense. And it is correct that cell score and assoc_mcp do not depend on connectivities
Hi,
I have two questions that I would like to ask for your guidance.
Any advice would be greatly appreciated!