scverse / scanpy

Single-cell analysis in Python. Scales to >1M cells.
https://scanpy.readthedocs.io
BSD 3-Clause "New" or "Revised" License

correlation between celltypes and age #1845

Open FADHLyemen opened 3 years ago

FADHLyemen commented 3 years ago

...

How can I compute the correlation between celltypes and age in scanpy?

FADHLyemen commented 3 years ago

@giovp I want to make a correlation plot between cell types and the continuous variables stored in .obs

Koncopd commented 3 years ago

I would say this is not a scanpy question. It is not clear what you mean by the correlation of a categorical variable with multiple categories and a continuous variable. If you have a binary categorical variable, you can calculate the point-biserial correlation, but for a multi-category variable you would have to discretize your continuous variable and run a chi-squared test. You can also try ANOVA. If you think you know which variables are dependent and which are independent, you can use logistic regression and look at its coefficients, or try ANCOVA. Some additional information with examples: https://datascience.stackexchange.com/questions/893/how-to-get-correlation-between-two-categorical-variable-and-a-categorical-variab
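A minimal sketch of the three tests named above, using SciPy on made-up per-cell data. The arrays `celltype` and `age`, the group means, and the age bin edges are all hypothetical stand-ins and do not come from this thread.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Hypothetical per-cell data: a cell-type label and a donor age for each cell.
celltype = np.repeat(["B", "T", "NK"], 50)
age = np.concatenate([
    rng.normal(30, 5, 50),
    rng.normal(45, 5, 50),
    rng.normal(60, 5, 50),
])
labels = np.unique(celltype)

# Multi-category label vs. continuous variable: one-way ANOVA.
groups = [age[celltype == ct] for ct in labels]
f_stat, p_anova = stats.f_oneway(*groups)

# Binary label vs. continuous variable: point-biserial correlation.
is_t = (celltype == "T").astype(int)
r_pb, p_pb = stats.pointbiserialr(is_t, age)

# Multi-category label vs. discretized continuous variable: chi-squared test.
age_bin = np.digitize(age, bins=[40, 55])  # three assumed age bins: 0, 1, 2
table = np.array(
    [[np.sum((celltype == ct) & (age_bin == b)) for b in range(3)] for ct in labels]
)
chi2, p_chi2, dof, expected = stats.chi2_contingency(table)
```

Which test is appropriate depends on the question being asked. Note also that cells from the same donor share one age value, so per-cell tests like these treat non-independent observations as independent.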

FADHLyemen commented 3 years ago

@Koncopd It is a correlation between two continuous variables, as celltypes are continuous and age is also continuous. How can I correlate X with continuous variables stored in .obs?

Koncopd commented 3 years ago

Are celltypes really continuous? What does this variable look like? For continuous variables you can do:

    from scipy.stats import pearsonr

    r, _ = pearsonr(adata.obs["celltypes"], adata.obs["age"])
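For the other half of the question (correlating X with a continuous .obs column), here is a sketch that computes Pearson's r for every column of a matrix at once. It uses NumPy arrays as stand-ins: `X` plays the role of a dense `adata.X` and `age` the role of `adata.obs["age"]`. The data are simulated, and a sparse `adata.X` would need to be densified first.

```python
import numpy as np

rng = np.random.default_rng(0)
n_cells, n_genes = 200, 5

age = rng.uniform(20, 80, n_cells)        # stand-in for adata.obs["age"]
X = rng.normal(size=(n_cells, n_genes))   # stand-in for a dense adata.X
X[:, 0] += 0.1 * age                      # make column 0 depend on age (simulated)

# Center both sides, then take the normalized dot product per column.
Xc = X - X.mean(axis=0)
ac = age - age.mean()
r = (Xc.T @ ac) / (np.linalg.norm(Xc, axis=0) * np.linalg.norm(ac))
```

Each entry of `r` matches what `pearsonr` would return for that column, without looping over genes.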

FADHLyemen commented 3 years ago

@Koncopd It is the number of celltypes per cohort, or the relative frequencies per group: [image]

Is this something researchers look for? Or do you think this is not a good approach, since the cell counts depend on how many cells there are per sample?
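One way to address the dependence on per-sample cell counts is to work with within-sample fractions and correlate those with age, using one data point per sample. The sketch below assumes a per-cell table shaped like `adata.obs` with columns named `sample`, `celltype`, and `age`. Those column names and all values are hypothetical and simulated.

```python
import numpy as np
import pandas as pd
from scipy.stats import spearmanr

rng = np.random.default_rng(0)

# Hypothetical stand-in for adata.obs: one row per cell.
samples = [f"s{i}" for i in range(8)]
sample_age = dict(zip(samples, [22, 31, 40, 48, 55, 63, 70, 78]))
rows = []
for s in samples:
    n = int(rng.integers(200, 1000))       # samples differ in total cell count
    p_t = 0.6 - 0.005 * sample_age[s]      # simulated: T-cell fraction falls with age
    ct = rng.choice(["T", "B"], size=n, p=[p_t, 1 - p_t])
    rows.append(pd.DataFrame({"sample": s, "celltype": ct, "age": sample_age[s]}))
obs = pd.concat(rows, ignore_index=True)

# Fraction of each cell type within each sample (each row sums to 1).
freq = pd.crosstab(obs["sample"], obs["celltype"], normalize="index")

# One age per sample, aligned to the rows of freq.
age = obs.groupby("sample")["age"].first().loc[freq.index]

# Rank correlation between the T-cell fraction and age across samples.
rho, p = spearmanr(freq["T"], age)
```

Normalizing per sample removes the effect of total cell number, but the fractions within a sample sum to one, so a rise in one cell type forces a drop in the others. With only a handful of samples, the resulting p-value should be read cautiously.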