FADHLyemen closed this issue 3 years ago.
Hi! The actual counts of the reference cell type are not that important, as long as they are nonzero in most samples. So I'd go with a cell type that has a rather low dispersion but a high presence (i.e. it is nonzero in most samples), regardless of its mean count.
Currently, there is no function in scCODA that outputs a DataFrame to help with finding a reference, but you might find the code we use in the rel_abundance_dispersion_plot function helpful:
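import numpy as np
import pandas as pd

# data: scCODA AnnData object, samples as rows, cell types as columns, counts in data.X
abundant_threshold = 0.9  # minimum presence (fraction of samples with a nonzero count)

# relative abundance of each cell type within each sample
rel_abun = data.X / np.sum(data.X, axis=1, keepdims=True)
# fraction of samples in which each cell type has a zero count
percent_zero = np.sum(data.X == 0, axis=0) / data.X.shape[0]
# indices of cell types present in more than abundant_threshold of the samples
nonrare_ct = np.where(percent_zero < 1 - abundant_threshold)[0]

# dispersion (variance / mean) of each cell type's relative abundance
cell_type_disp = np.var(rel_abun, axis=0) / np.mean(rel_abun, axis=0)
is_abundant = [x in nonrare_ct for x in range(data.X.shape[1])]

plot_df = pd.DataFrame({
    "Total dispersion": cell_type_disp,
    "Cell type": data.var.index,
    "Presence": 1 - percent_zero,
    "Is abundant": is_abundant,
})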
The plot_df DataFrame should give you all the information on the dispersion and presence of each cell type.
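As a rough sketch of the selection rule described above (keep only cell types whose presence exceeds the threshold, then take the one with the lowest dispersion), the snippet below wraps the same computation in a helper and runs it on made-up counts. It assumes a plain NumPy count matrix instead of a scCODA AnnData object; the helper name choose_reference, the toy data, and breaking ties by sort order are my own assumptions, not part of scCODA's API.

```python
import numpy as np
import pandas as pd


def choose_reference(counts, cell_types, abundant_threshold=0.9):
    """Hypothetical helper (not a scCODA function): return the least-dispersed
    cell type among those present in more than `abundant_threshold` of samples,
    plus the full summary table."""
    counts = np.asarray(counts, dtype=float)
    # relative abundance per sample (each row sums to 1)
    rel_abun = counts / counts.sum(axis=1, keepdims=True)
    # presence = fraction of samples with a nonzero count
    presence = (counts > 0).mean(axis=0)
    # dispersion = variance / mean of the relative abundance
    disp = rel_abun.var(axis=0) / rel_abun.mean(axis=0)
    df = pd.DataFrame({
        "Cell type": cell_types,
        "Total dispersion": disp,
        "Presence": presence,
        "Is abundant": presence > abundant_threshold,
    })
    candidates = df[df["Is abundant"]].sort_values("Total dispersion")
    if candidates.empty:
        raise ValueError("no cell type exceeds the presence threshold")
    return candidates["Cell type"].iloc[0], df


# Made-up counts: 6 samples (rows) x 4 cell types A, B, C, D (columns)
counts = np.array([
    [5000, 4898, 100, 2],
    [3000, 6880, 120, 0],
    [8000, 1918, 80, 2],
    [4500, 5390, 110, 0],
    [6500, 3408, 90, 2],
    [3500, 6400, 100, 0],
])
ref, table = choose_reference(counts, ["A", "B", "C", "D"])
print(ref)  # -> C
```

In this toy example, D has the lowest dispersion overall but is absent from half of the samples, so the presence filter excludes it, and C is chosen even though its mean count is far below that of A and B.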
Thank you. One thing confused me: the documentation says "Cell types that have a higher presence than a certain threshold (default 0.9) are suitable candidates for the reference and thus colored". So if the presence is 0, why are most of them colored? https://sccoda.readthedocs.io/en/latest/Data_import_and_visualization.html
I think it should say cell types with a higher abundance, not presence. What do you think?
Oh, I see what you mean! That is obviously a bug: the x-axis in the plot depicts the absence (1 - presence) instead of the presence!
I'm very sorry about that; I will fix it as soon as possible.
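To see why the mix-up was confusing, here is a toy illustration with made-up counts (not the actual scCODA plotting code): the coloring rule is based on presence, but an x-axis built from percent_zero places exactly the highly present, colored cell types at 0.

```python
import numpy as np

# Made-up counts: 4 samples (rows) x 3 cell types (columns)
counts = np.array([
    [10, 5, 0],
    [12, 0, 0],
    [9, 7, 3],
    [11, 6, 0],
])
abundant_threshold = 0.9

# Absence: fraction of samples with a zero count (what the buggy x-axis showed)
percent_zero = (counts == 0).mean(axis=0)  # 0, 0.25, 0.75
# Presence: fraction of samples with a nonzero count (what the x-axis should show)
presence = 1 - percent_zero  # 1, 0.75, 0.25
# Coloring rule quoted in the docs: presence above the threshold
colored = presence > abundant_threshold  # only the first cell type

# The only colored cell type sits at x = 0 when absence is plotted
print(percent_zero[colored])
```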
Fixed on main branch!
Thank you.
I'm having a difficult time selecting reference cell types. The dispersion plot suggests one cell type, but it has a small count. Do you have any recommendation for the minimum number of cells a cell type should have to be considered as a reference?