keyalone / EnDecon

Preprocessing of cell-type fractions #1

Open canergen opened 1 year ago

canergen commented 1 year ago

I like the idea of pooling information across methods. Several deconvolution methods yield results with different interpretations: Cell2location outputs expected cell numbers, while destVI outputs the likelihood that a count is associated with a specific cell type. Cell2location also applies additional normalization steps (using the expected number of molecules in a specific cell type and a per-spot detection efficiency hyperparameter). I don't find these outputs directly comparable. How do you make the results of the individual methods comparable so that information can be used across them? What interpretation do the fractions have after EnDecon?

keyalone commented 1 year ago

Good question, and thank you for your interest in EnDecon. First question: the comparability of the results of Cell2location and destVI. Answer: There are definitely differences between the results of Cell2location and destVI. However, EnDecon aims to ensemble the deconvolution results of the individual methods. For Cell2location and destVI, we only need the cell type abundance matrix, where rows represent spots and columns represent the cell types pre-defined in the reference scRNA-seq data. The deconvolution result matrices of Cell2location and destVI are comparable.

Second question: the comparability of Cell2location results with and without normalization. Answer: In our application, we did not compare Cell2location results with normalization against results without it. However, as far as I know, several benchmarking studies discuss the effect of data processing on deconvolution results: "Benchmarking spatial and single-cell transcriptomics integration methods for transcript distribution prediction and cell type deconvolution", "Benchmarking and integration of methods for deconvoluting spatial transcriptomic data" and "A comprehensive comparison on cell-type composition inference for spatial transcriptomics data". You may want to check these papers for the differences.

Third question: what interpretation do the fractions have after EnDecon? Answer: Are you asking about the meaning of EnDecon's output? Because of the low resolution of spatial transcriptomics data, each spot contains multiple cells, and these cells come from different cell types. EnDecon infers the proportion of each cell type in each spot. Its main output is the cell type abundance matrix, where rows represent spots and columns represent cell types. Each element of the matrix is the proportion of the corresponding cell type in the corresponding spot. The fractions you mention probably correspond to these cell type proportions in spots.

If you still have questions, please don't hesitate to discuss them with us. Thanks.

canergen commented 1 year ago

Thanks for your extensive and quick response. I understand the idea of deconvolution. As a coauthor of destVI who also works with Cell2location, I can clearly say that the outputs are not exactly comparable. Library size (destVI) and detection efficiency (Cell2location) in single-cell data are handled differently. This leads to a normalization of library size in destVI, whereas the number of transcripts is a cell-type-dependent factor in Cell2location. Let's assume you have hepatocytes and T cells in a spot. We know that hepatocytes contain more transcripts (largely due to cell size). DestVI will report a higher fraction of hepatocytes in a spot than Cell2location, because the outputs have different meanings: the likelihood that a random single transcript comes from a hepatocyte (destVI) vs the estimated number of hepatocytes in a specific spot (Cell2location). I haven't compared all the different tools, but I have also read the implementation of stereoscope in scvi-tools, which reports a similar result to destVI. Do you have other evidence that the two results are directly comparable? PCC is not affected by this, while MSE is affected (the scaling is per cell type).
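
To make the hepatocyte and T cell example concrete, here is a minimal sketch with made-up numbers. It is not the output of destVI, Cell2location or EnDecon: the cell counts, the transcripts-per-cell values and all helper names are assumptions chosen for illustration. It contrasts a fraction of cells with a fraction of transcripts in the same spots, and then checks that a pure per-cell-type rescaling leaves the per-cell-type PCC across spots unchanged while the MSE is not zero.

```python
from math import sqrt

# Hypothetical numbers, chosen only for illustration.
# Rows are spots; columns are (hepatocyte, T cell).
cell_counts = [
    [2.0, 8.0],
    [5.0, 5.0],
    [8.0, 2.0],
]
# Assumed mean transcripts per cell; hepatocytes are the larger cells.
transcripts_per_cell = [10000.0, 2000.0]


def row_normalize(matrix):
    """Scale each row (spot) so that its entries sum to 1."""
    return [[v / sum(row) for v in row] for row in matrix]


def column(matrix, k):
    """Return column k (one cell type across all spots)."""
    return [row[k] for row in matrix]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def mse(x, y):
    """Mean squared error between two equal-length sequences."""
    return sum((a - b) ** 2 for a, b in zip(x, y)) / len(x)


# Interpretation 1: fraction of cells in each spot.
cell_fraction = row_normalize(cell_counts)

# Interpretation 2: fraction of transcripts in each spot.
transcript_abundance = [
    [n * t for n, t in zip(row, transcripts_per_cell)] for row in cell_counts
]
transcript_fraction = row_normalize(transcript_abundance)

hep_cells = column(cell_fraction, 0)
hep_transcripts = column(transcript_fraction, 0)
for spot, (c, t) in enumerate(zip(hep_cells, hep_transcripts)):
    print(f"spot {spot}: hepatocytes by cells={c:.3f}, by transcripts={t:.3f}")

# Pure per-cell-type scaling: each column is multiplied by its own constant.
mean_t = sum(transcripts_per_cell) / len(transcripts_per_cell)
scale = [t / mean_t for t in transcripts_per_cell]
scaled_counts = [[n * s for n, s in zip(row, scale)] for row in cell_counts]

pcc_per_type = [
    pearson(column(cell_counts, k), column(scaled_counts, k)) for k in range(2)
]
mse_per_type = [
    mse(column(cell_counts, k), column(scaled_counts, k)) for k in range(2)
]
print("PCC per cell type:", pcc_per_type)
print("MSE per cell type:", mse_per_type)
```

With these numbers, the first spot is 20% hepatocytes by cell count but about 56% hepatocytes by transcript share, so the two readings of "fraction" differ even though they describe the same spot.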