theislab / scCODA

A Bayesian model for compositional single-cell data analysis
BSD 3-Clause "New" or "Revised" License

how to select cell reference #27

Closed FADHLyemen closed 3 years ago

FADHLyemen commented 3 years ago

I am having a difficult time selecting a reference cell type. The dispersion plot shows one cell type, but it has a small count. Do you have any recommendation on the minimum number of cells a cell type should have to be considered as a reference?

johannesostner commented 3 years ago

Hi! The actual counts of the reference cell type are not as important, as long as they are nonzero in most samples. So I'd go with a cell type that has a rather low dispersion, but high presence regardless of its mean count.

Currently, there is no function in scCODA that outputs a DataFrame to help with reference finding, but you might find the code we use in the rel_abundance_dispersion_plot function helpful:

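import numpy as np
import pandas as pd

# data: the scCODA data object (samples as rows, cell types as columns)
abundant_threshold = 0.9  # presence threshold; 0.9 is the plot's default

# relative abundance of each cell type within each sample
rel_abun = data.X / np.sum(data.X, axis=1, keepdims=True)

# fraction of samples in which each cell type has zero counts
percent_zero = np.sum(data.X == 0, axis=0) / data.X.shape[0]
nonrare_ct = np.where(percent_zero < 1-abundant_threshold)[0]

# select reference: dispersion (variance / mean) of the relative abundance
cell_type_disp = np.var(rel_abun, axis=0) / np.mean(rel_abun, axis=0)

is_abundant = [x in nonrare_ct for x in range(data.X.shape[1])]

plot_df = pd.DataFrame({
    "Total dispersion": cell_type_disp,
    "Cell type": data.var.index,
    "Presence": 1-percent_zero,
    "Is abundant": is_abundant
})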

The plot_df should give you all the info on dispersion and presence of the cell types.
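As a minimal sketch of how such a table could be used to shortlist a reference, the snippet below rebuilds plot_df on a toy count matrix and sorts the abundant cell types by dispersion. The counts, the cell type names "A", "B", "C", and the `candidates` and `reference` variables are made up for illustration and are not part of scCODA; the 0.9 threshold is the plot's default.

```python
import numpy as np
import pandas as pd

# Hypothetical toy counts: 4 samples (rows) x 3 cell types (columns).
counts = np.array([
    [100, 50, 0],
    [110, 40, 0],
    [90, 60, 5],
    [100, 50, 0],
])
cell_types = ["A", "B", "C"]  # made-up names
abundant_threshold = 0.9

# Same quantities as in the snippet above, computed on the toy matrix.
rel_abun = counts / counts.sum(axis=1, keepdims=True)
percent_zero = (counts == 0).sum(axis=0) / counts.shape[0]
presence = 1 - percent_zero
dispersion = rel_abun.var(axis=0) / rel_abun.mean(axis=0)

plot_df = pd.DataFrame({
    "Total dispersion": dispersion,
    "Cell type": cell_types,
    "Presence": presence,
    "Is abundant": presence > abundant_threshold,
})

# Keep cell types present in most samples, lowest dispersion first.
candidates = plot_df[plot_df["Is abundant"]].sort_values("Total dispersion")
reference = candidates.iloc[0]["Cell type"]
print(candidates)
print("Suggested reference:", reference)
```

Here cell type "C" is dropped because it is absent from three of the four samples, and the remaining cell types are ranked by dispersion. Taking the first row is only one possible heuristic; the final choice of reference should still be checked against the data.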

FADHLyemen commented 3 years ago

Thank you. This sentence confused me: "Cell types that have a higher presence than a certain threshold (default 0.9) are suitable candidates for the reference and thus colored". If the presence is 0, why are most of them colored? https://sccoda.readthedocs.io/en/latest/Data_import_and_visualization.html

I think it should say cell types that have a higher abundance, not presence. What do you think?

johannesostner commented 3 years ago

Oh, I see what you mean! That is obviously a bug: the x-axis in the plot depicts the absence (1-presence) instead of the presence!

I am very sorry for that. I will fix it as soon as possible.

johannesostner commented 3 years ago

Fixed on main branch!

FADHLyemen commented 3 years ago

Thank you.