construct_ligand_target_matrix with additional ligand-receptor pairs seems computationally expensive

adesalegn commented 3 years ago

Thank you for developing such an amazing tool! I was wondering whether NicheNet would be suitable for studying the communication between two closely interacting tissues. To make it clear, I am studying tissue A and another tissue located on tissue A (e.g. liver hepatocytes and adipocytes on the liver). I have bulk RNA-seq data for both and would like to see whether they interact through their gene products. I ran NicheNet following the recommendations for bulk RNA-seq. However:

1. I want to add the ligand-receptor pairs I found in CellTalkDB and construct the ligand-target matrix. In the argument "ligands", I gave the list of all ligands (ligands = list(lr$from)) and never even got the matrix. It just keeps running; it looks computationally expensive. Do you think I should wait until it finishes, maybe overnight?
2. In the prediction of ligand activity, the Pearson correlation of the top 20 ligands is between 0.01 and 0.03, and the AUROC values are 0.5. Do you think this is interpretable for bulk RNA-seq?

Thank you very much.

browaeysrobin commented 3 years ago

Hi @adesalegn,

construct_ligand_target_matrix is indeed computationally expensive and takes some time to run. But in your case, the input for ligands is not entirely correct. It should be a list with one element (sublist) per ligand, not a list with a single element containing the names of all the ligands. Also make sure to pass only unique ligand names. To fix all this you could do: ligands = lr$from %>% unique() %>% as.list(). To check whether the ligand-target matrix construction works, you can always test quickly on a small subset of ligands, e.g. by doing this: ligands = lr$from %>% unique() %>% head() %>% as.list().
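
To make the difference between the two forms concrete, here is a minimal base-R sketch. The `lr` table below is a made-up stand-in for the ligand-receptor table from the question (only the `from` column matters here), and the pipes are written as nested calls so that no package needs to be loaded.

```r
# Toy stand-in for the ligand-receptor table `lr` (illustrative pairs only).
lr <- data.frame(
  from = c("TGFB1", "TGFB1", "IL6", "TNF", "IL6"),
  to   = c("TGFBR1", "TGFBR2", "IL6R", "TNFRSF1A", "IL6ST"),
  stringsAsFactors = FALSE
)

# Form used in the question: a list with ONE element that holds every
# ligand name, duplicates included.
ligands_wrong <- list(lr$from)
length(ligands_wrong)        # 1
length(ligands_wrong[[1]])   # 5

# Suggested form: one list element per unique ligand.
# Equivalent to lr$from %>% unique() %>% as.list()
ligands_ok <- as.list(unique(lr$from))
length(ligands_ok)           # 3

# Small subset for a quick test run before using the full list.
ligands_test <- as.list(head(unique(lr$from), 2))
```

Checking `length(ligands)` before calling construct_ligand_target_matrix is a cheap way to confirm that the list has one entry per ligand rather than a single entry holding all of them.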

For the second issue: interpretation of bulk and single-cell data should be similar; you could check this answer on our FAQ page about interpreting the ligand activities: https://github.com/saeyslab/nichenetr/blob/master/vignettes/faq.md#how-should-i-interpret-the-pearson-correlation-values-of-the-ligand-activity-what-does-it-mean-when-i-have-negative-values

In your case, it seems that there is not really any enrichment of the target genes of ligands in your gene set of interest compared to the background. This can be true, but it is also possible that your definition of target genes and/or background could be improved.
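
As a rough illustration of why an AUROC near 0.5 and a Pearson correlation near 0 point to "no enrichment", here is a toy base-R sketch. It is not NicheNet's implementation: the scores, the gene set, and the `auroc` helper are all hypothetical, and the only assumption is that ligand activity compares a ligand's target scores with gene-set membership across the background genes.

```r
# Hypothetical target scores of one ligand for 100 background genes.
n_genes <- 100
scores <- seq_len(n_genes) / n_genes

# Hypothetical gene set of interest: every second gene, i.e. membership
# is essentially unrelated to the scores (no enrichment).
in_set <- rep(c(FALSE, TRUE), n_genes / 2)

# AUROC via the rank-sum (Mann-Whitney) identity: the probability that a
# gene-set member has a higher score than a non-member.
auroc <- function(score, label) {
  r <- rank(score)
  n_pos <- sum(label)
  n_neg <- sum(!label)
  (sum(r[label]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

pearson_null <- cor(scores, as.numeric(in_set))  # close to 0
auc_null <- auroc(scores, in_set)                # close to 0.5

# Same genes, but gene-set members now get higher scores (enrichment).
scores_enriched <- scores + 0.3 * in_set
pearson_enriched <- cor(scores_enriched, as.numeric(in_set))
auc_enriched <- auroc(scores_enriched, in_set)   # clearly above 0.5
```

In this toy setup both metrics stay at their chance level when membership is unrelated to the scores, and both move up once gene-set members score higher. Real ligand activities are usually much less clear-cut, so the FAQ entry remains the reference for interpretation.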

adesalegn commented 3 years ago

Hi browaeysrobin,

Thank you so much for your explanation. My target genes were the genes predicting fasting glucose (685 genes) in the receiver tissue, and the background was all expressed genes of the receiver (12,500 genes).

```r
target <- hT2D %>% pull(Variables) %>% .[. %in% rownames(ligand_target_matrix)]  # 630 genes
background <- rownames(expressed_genes_receiver) %>% .[. %in% rownames(ligand_target_matrix)]  # 10243 genes
```

Best,
Amare

browaeysrobin commented 3 years ago

Hi @adesalegn,

Your analysis seems fine, so the most likely reason for the low ligand activities is that the enrichment of predicted target genes of ligands in your gene set is low. Nevertheless, your top-ranked ligands could still be valuable predictions.

Do note that we are in the process of updating our prior knowledge, so maybe you will find stronger enrichment with the updated version in the near future :)

saeyslab / nichenetr

construct_ligand_target_matrix with additional ligand-receptor pairs seems computationally expensive #76