ventolab / CellphoneDB

CellPhoneDB can be used to search for a particular ligand/receptor, or interrogate your own HUMAN single-cell transcriptomics data.
https://www.cellphonedb.org/
MIT License

Normalization method with Seurat Object. #115

Closed: ChoiJi-Hye closed this issue 1 year ago

ChoiJi-Hye commented 1 year ago

Hi,

Thank you for creating such a great tool! I am trying to use it with a Seurat object containing 130,000 cells, and I have a question about normalization for this tool.

I noticed that previous versions used a normalization method other than log normalization (https://github.com/Teichlab/cellphonedb/issues/279). In your bioRxiv preprint, I saw that for CellPhoneDB v2.0 with a Seurat object, you normalized by dividing the expression of each gene in a cell by the total expression of that cell and multiplying by 10,000. However, your 0_prepare_your_data_from_Seurat.ipynb notebook performs log normalization on the Seurat object.

I thought the previous versions recommended a normalization method other than log normalization, but the latest version seems to use log normalization. Is it okay to use log normalization with the latest versions?

Even though it is an old issue, I saw that when one user used log-normalized data, only 18 results were returned, whereas with raw counts 1,800 results were obtained (https://github.com/Teichlab/cellphonedb/issues/12). I think this shows that normalization matters.

I tried a version of CellPhoneDB that includes DEG analysis, but it wasn't the latest version. With log normalization I only got four results, which is too few. I wondered whether this was due to log normalization, but the latest version of CellPhoneDB seems to be agnostic to the type of normalization.

As stated in this issue, methods 1 and 2 depend on mean values, so normalization is required, but the recommended normalization is described only as a preference (https://github.com/ventolab/CellphoneDB/issues/84).
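For reference, the two normalizations compared above can be sketched in NumPy. This is a minimal illustration, assuming a dense genes x cells matrix (one column per cell, as in a Seurat counts matrix) and Seurat's LogNormalize definition (natural log1p of the values scaled to 10,000 per cell). The function names and the toy matrix are hypothetical, not part of CellPhoneDB.

```python
import numpy as np


def counts_per_10k(counts: np.ndarray, scale: float = 1e4) -> np.ndarray:
    """Divide each cell's counts by that cell's total and multiply by `scale`.

    `counts` is a genes x cells matrix (one column per cell).
    """
    totals = counts.sum(axis=0, keepdims=True)  # total counts per cell, shape (1, n_cells)
    totals = np.where(totals == 0, 1.0, totals)  # avoid dividing by zero for empty cells
    return counts / totals * scale


def log_normalize(counts: np.ndarray, scale: float = 1e4) -> np.ndarray:
    """LogNormalize-style values: natural log1p of the per-10k values."""
    return np.log1p(counts_per_10k(counts, scale))


# Toy genes x cells matrix: 3 genes, 2 cells (made-up numbers).
counts = np.array([[10.0, 0.0],
                   [30.0, 5.0],
                   [60.0, 5.0]])

cp10k = counts_per_10k(counts)  # column 0 -> 1000, 3000, 6000; column 1 -> 0, 5000, 5000
lognorm = log_normalize(counts)

# Methods 1 and 2 use mean expression values (per the thread). The mean of log
# values is not the log of the mean, so the two inputs yield different gene means.
print("per-10k gene means:", cp10k.mean(axis=1))
print("log-normalized gene means:", lognorm.mean(axis=1))
```

Both versions share the same library-size scaling step; they differ only in whether log1p is applied afterwards, which changes the scale on which means are computed.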

  1. Does this mean that any normalization, including log normalization, is acceptable?
  2. Or is there a normalization method you recommend? I want to improve my output because there are too few results. Should I use the method from CellPhoneDB v2.0 (count/sum(count) * 10000)?

Also, since there are 130,000 cells, the data is too large to run the tool from a txt file, so I am trying to convert it to h5ad format.

  3. Can I use sceasy for this (https://github.com/Teichlab/cellphonedb/issues/321)? I used the R function SaveH5Seurat to convert it because I am not familiar with Scanpy. Can the conversion process affect the results? Do you still recommend using sceasy to convert Seurat to h5ad?

Sorry for the long question. I want to use your tool, but I have many questions because it doesn't work as I expected, and I've been searching for solutions.

Thank you so much for creating such a great tool!

Best regards,
Ji-Hye Choi

ktroule commented 1 year ago

Hi.

You can use any normalization method. If you use the subsampling strategy, you must indicate whether your data is log-transformed or not with the subsampling-log option.
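Because the subsampling-log setting has to match the data, it can help to sanity-check what the matrix actually contains before running. The sketch below is a rough heuristic, not part of CellphoneDB: the helper name is hypothetical, and it assumes Seurat-style LogNormalize with the default scale factor of 10,000. Your own record of the preprocessing steps remains the authoritative answer.

```python
import numpy as np


def looks_log_transformed(matrix, scale: float = 1e4) -> bool:
    """Rough heuristic (an assumption, not a CellphoneDB rule).

    Data normalized as log1p(count / total * scale) lies in [0, log1p(scale)]
    and is mostly non-integer, whereas raw counts are integers and per-10k values
    usually contain entries above log1p(scale). Other transformations can fool it.
    """
    values = np.asarray(matrix, dtype=float)
    in_log_range = values.min() >= 0 and values.max() <= np.log1p(scale) + 1e-9
    all_integers = np.allclose(values, np.round(values))
    return bool(in_log_range and not all_integers)


raw = np.array([[10, 0], [30, 5], [60, 5]], dtype=float)  # toy genes x cells counts
lognorm = np.log1p(raw / raw.sum(axis=0) * 1e4)
print(looks_log_transformed(raw))      # False: integer counts above the log range
print(looks_log_transformed(lognorm))  # True
```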

I would suggest reading your data with Seurat and saving it as an h5ad file. Both methods (sceasy and SaveH5Seurat) should convert the object while keeping the exact values.
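To confirm this on your own data, one option is to load the h5ad written by each route (for example with anndata.read_h5ad) and compare the expression matrices cell by cell. The helper below is hypothetical and assumes cells are rows, barcodes are unique, and genes are in the same column order in both files; it is a sanity check, not part of CellphoneDB.

```python
import numpy as np


def to_dense(matrix) -> np.ndarray:
    """Return a dense float array; scipy sparse matrices provide .toarray()."""
    if hasattr(matrix, "toarray"):
        matrix = matrix.toarray()
    return np.asarray(matrix, dtype=float)


def values_match(reference, ref_cells, converted, conv_cells, atol: float = 1e-6) -> bool:
    """Hypothetical helper: do two cells x genes matrices hold the same values?

    Assumes unique cell barcodes and identical gene (column) order in both files.
    Rows of `converted` are reordered by barcode before comparing.
    """
    ref = to_dense(reference)
    conv = to_dense(converted)
    if ref.shape != conv.shape or sorted(ref_cells) != sorted(conv_cells):
        return False
    position = {cell: i for i, cell in enumerate(conv_cells)}
    conv = conv[[position[cell] for cell in ref_cells]]  # align rows to reference order
    return bool(np.allclose(ref, conv, rtol=0.0, atol=atol))


# Toy example: same values, cells stored in a different order.
a = np.array([[1.0, 2.0], [3.0, 4.0]])
b = a[::-1]
print(values_match(a, ["c1", "c2"], b, ["c2", "c1"]))  # True
```

With AnnData objects, the inputs would be each object's .X and list(.obs_names); gene order can be checked first with list(a.var_names) == list(b.var_names).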

Kind regards