livnatje / DIALOGUE

DIALOGUE is a dimensionality reduction method that uses cross-cell-type associations to identify multicellular programs (MCPs) and map the cell transcriptome as a function of its environment.

Issue with get.abundant function in DIALOGUE1 with abn.c=15 #10

Closed DavidB-XI closed 2 years ago

DavidB-XI commented 2 years ago

Hi Livnat, Looking through the code I notice that you use the function:

b<-get.abundant(r@samples,abn.c = 15,boolean.flag = T)

with the input parameter abn.c set to 15. Is this too high?

On the smaller dataset I am using, I believe this causes the following error:

Error in `contrasts<-`(`*tmp*`, value = contr.funs[1 + isOF[nn]]) : contrasts can be applied only to factors with 2 or more levels

But when I reduce abn.c to 2, it runs fine:

b<-get.abundant(r@samples,abn.c = 2,boolean.flag = T)

It would be good to get some comments on this.
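For context, base R raises this error whenever a model formula contains a factor with only one level. The sketch below reproduces it under an assumption that is not confirmed from the DIALOGUE source: that the strict cutoff left a single sample in the filtered data. The data frame and column names are made up for illustration.

```r
# Toy data: every cell comes from the same sample, as could happen
# after a strict abundance filter on a small dataset (assumption).
set.seed(1)
cells <- data.frame(
  expr   = rnorm(20),
  sample = factor(rep("S1", 20))  # factor with a single level
)

# Fitting a model on a one-level factor triggers the contrasts error;
# tryCatch captures the message instead of stopping the script.
msg <- tryCatch(
  {
    lm(expr ~ sample, data = cells)
    "no error"
  },
  error = function(e) conditionMessage(e)
)
print(msg)

# Counting the levels before modelling makes the problem visible early.
n_levels <- nlevels(droplevels(cells$sample))
print(n_levels)
```

Checking the number of remaining samples after filtering is a quick way to tell whether the cutoff, rather than the model itself, is the cause.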

livnatje commented 2 years ago

abn.c is the cell abundance constraint that defines the minimal number of cells a sample should have (per cell type) to be considered for further MCP identification. For single-cell data it is recommended not to go below 15, and to aim for ~100 cells per sample (per cell type) to get robust statistics and avoid effects from non-uniform sampling. For spatial data you can certainly use niches that have very few cells, even just one per cell type (or alternatively just set spatial.flag = TRUE, see below).

abn.c is now a parameter the user can tune (with 15 being the default). In addition, spatial.flag, which is FALSE by default, can be set to TRUE when working with spatial data with small niches (for single-cell data or spatial data with large niches, it's recommended to keep spatial.flag = FALSE). Setting spatial.flag = TRUE bypasses the abn.c constraint and also skips the ANOVA test for feature removal.
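To make the effect of the cutoff concrete, here is a minimal sketch of an abundance filter for a single cell type. The function abundant_samples is a hypothetical re-implementation written for this illustration, not the package's get.abundant, and treating the cutoff as inclusive (>=) is an assumption.

```r
# Hypothetical helper for illustration; not DIALOGUE's own code.
# `samples` holds the sample label of each cell of one cell type.
abundant_samples <- function(samples, abn.c = 15) {
  counts <- table(samples)                  # number of cells per sample
  keep   <- names(counts)[counts >= abn.c]  # samples meeting the cutoff
  samples %in% keep                         # TRUE for cells in kept samples
}

# Toy labels: S1 has 20 cells, S2 has 16, S3 has only 4.
samples <- c(rep("S1", 20), rep("S2", 16), rep("S3", 4))

b15 <- abundant_samples(samples, abn.c = 15)
b2  <- abundant_samples(samples, abn.c = 2)

unique(samples[b15])  # samples retained with abn.c = 15
unique(samples[b2])   # samples retained with abn.c = 2
```

On a small dataset a strict cutoff can leave very few samples, which would be consistent with the error reported above.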

DavidB-XI commented 2 years ago

Great, 100 cells makes sense and is what I would expect to have. The only exception would be rare cell types, which could have fewer than 100 cells? And yes, running without feature removal should make sense on occasion.

Also, have you thought of putting the configuration input parameters into a list to make things cleaner? It would look less daunting, and building the list in a function like make.cell.types would leave room for an explanation in the documentation. It is more work to set up, but it would give users more explanation.

livnatje commented 2 years ago

The default cutoff is at least 15 cells (of each cell type) per sample, which should probably work in most settings even with rare cells. The paper also includes simulations showing that MCPs can be identified even with extremely rare cells. There is a procedure of permutation tests and empirical values used to avoid overfitting, but it's recommended to also have external unseen data to examine generalizability.
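As a generic illustration of the permutation idea, the sketch below computes an empirical p-value for a correlation on toy data. It is not DIALOGUE's own procedure; the variables, the test statistic, and the number of permutations are all assumptions chosen for the example.

```r
set.seed(42)

# Toy paired values, e.g. one score per sample in two cell types (assumption).
n <- 30
x <- rnorm(n)
y <- 0.6 * x + rnorm(n)

# Observed statistic: absolute Pearson correlation.
observed <- abs(cor(x, y))

# Null distribution: shuffle y to break the pairing between x and y.
n_perm <- 1000
null_stats <- replicate(n_perm, abs(cor(x, sample(y))))

# Empirical p-value with a +1 correction so it is never exactly zero.
p_emp <- (sum(null_stats >= observed) + 1) / (n_perm + 1)
p_emp
```

A small empirical p-value indicates that an association this strong rarely arises once the pairing is broken. That only guards against chance findings in the data at hand, which is why external unseen data remains useful.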

As for the input parameters, do you mean just providing those as a list instead of separate components, or having a separate function to prepare an input param object?

DavidB-XI commented 2 years ago

Thanks for the advice on rare cell types, and the statistical approaches to validate it.

For the parameters, it could be both: pass the parameters as a list, and also provide a separate function that creates a "default list" the user can edit. In my opinion, that would make the implementation neater and more approachable.
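A minimal sketch of that suggestion, using only base R. The function name default.params is hypothetical and not taken from the package; the two entries and their defaults are the ones mentioned in this thread.

```r
# Hypothetical helper; the name is an assumption made for illustration.
default.params <- function() {
  list(
    abn.c        = 15,     # minimal number of cells per sample (per cell type)
    spatial.flag = FALSE   # TRUE bypasses abn.c and the ANOVA feature removal
  )
}

# Start from the defaults and override only what differs.
params <- modifyList(default.params(), list(abn.c = 100))
str(params)
```

Keeping the defaults in one documented function gives a single place to explain each parameter, while modifyList leaves every entry the user did not override untouched.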

livnatje commented 2 years ago

I assume the issue with the get.abundant function in DIALOGUE1 was resolved, so I am closing this. We will make improvements to the UI in the upcoming releases or sooner. Thanks!