theislab / scCODA

A Bayesian model for compositional single-cell data analysis
BSD 3-Clause "New" or "Revised" License

Multiple comparisons #97

Open mainharryHR opened 1 month ago

mainharryHR commented 1 month ago

Dear Sir or Madam,

Thanks for the great package. Multiple comparisons have been discussed several times, but I am still not sure how to apply them to my study. In your published paper, it seems you subset the samples. I feel that is a labour-intensive strategy, and it does not account for the effects of multiple comparisons. Please advise! Is there a way to handle this by changing formula="conditions"?

1: 3 conditions: Control (healthy control), Carrier (mutant without cancer), CRC (mutant with cancer). I want to compare Control vs. Carrier, Control vs. CRC, and Carrier vs. CRC. How should I do that? I am thinking of formula="Control + Carrier + CRC". Am I right?

Here are my current results:

print(sccoda_data["coda"].varm.keys())
KeysView(AxisArrays with keys: intercept_df, effect_df_Cancer[T.No], effect_df_Cancer[T.CRC])

2: I have another group of samples with 4 conditions: Healthy, Inflammation, Healthy-treatment, Inflammation-treatment. I mainly want to compare Healthy vs. Inflammation, and no treatment vs. after treatment. What strategy should I use?

I really appreciate your kind help!

Best, Harry

johannesostner commented 1 month ago

Hi @mainharryHR, for both cases, you can combine all samples into one dataset with a column in .obs that denotes the condition (e.g. column "Condition" with values "Control"/"Carrier"/"CRC") and then run scCODA with formula "Condition". This will give you results on how Carrier and CRC differ from the control group.
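One thing worth checking with formula "Condition" is which group actually ends up as the baseline. The formula is patsy-style (hence result keys like effect_df_Cancer[T.No]), and, as far as I understand patsy's default treatment coding, string levels are sorted and the first one becomes the reference. The pure-Python sketch below only mimics that sorting rule; the helper `reference_and_effects` is hypothetical and not part of scCODA or patsy. It shows that "CRC" sorts before "Carrier" and "Control", so Control may not be the default baseline.

```python
def reference_and_effects(levels, reference=None):
    """Mimic treatment coding: return (reference level, levels that get an effect).

    Assumption: with no explicit reference, the alphabetically first
    level is used as the baseline (patsy's default for string columns).
    """
    ordered = sorted(set(levels))
    ref = ordered[0] if reference is None else reference
    if ref not in ordered:
        raise ValueError(f"unknown reference level: {ref!r}")
    return ref, [level for level in ordered if level != ref]


conditions = ["Control", "Carrier", "CRC"]

# Default: uppercase "R" sorts before lowercase "a" and "o", so "CRC" comes first.
default = reference_and_effects(conditions)

# Baseline pinned explicitly to the healthy controls.
pinned = reference_and_effects(conditions, reference="Control")

print("default baseline:", default[0], "effects for:", default[1])
print("pinned baseline:", pinned[0], "effects for:", pinned[1])
```

If the default turns out not to be Control in your data, a patsy term such as "C(Condition, Treatment('Control'))" in place of "Condition" should pin the baseline, assuming the column and level names match yours.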

For the pairwise group tests that you described, you will have to subset the data and run scCODA separately for each pair. This shouldn't be too labour-intensive though: just subset the data and run scCODA 2 or 3 times.
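A minimal sketch of that subsetting loop, in plain Python so it runs without scCODA installed. The sample IDs and the helper `pairwise_subsets` are made up for illustration; with a real AnnData object, the equivalent subset would be something like data[data.obs["Condition"].isin([a, b])], followed by a scCODA run with formula "Condition" on each subset.

```python
from itertools import combinations

# Hypothetical sample table: one record per sample, as in .obs.
samples = [
    {"sample": "s1", "Condition": "Control"},
    {"sample": "s2", "Condition": "Control"},
    {"sample": "s3", "Condition": "Carrier"},
    {"sample": "s4", "Condition": "Carrier"},
    {"sample": "s5", "Condition": "CRC"},
    {"sample": "s6", "Condition": "CRC"},
]


def pairwise_subsets(records, column="Condition"):
    """Yield ((level_a, level_b), subset) for every pair of condition levels."""
    levels = sorted({record[column] for record in records})
    for a, b in combinations(levels, 2):
        yield (a, b), [r for r in records if r[column] in (a, b)]


subsets = dict(pairwise_subsets(samples))

for (a, b), subset in subsets.items():
    # This is where scCODA would be run on the two-group subset.
    ids = [record["sample"] for record in subset]
    print(f"{a} vs {b}: {len(subset)} samples {ids}")
```

Each pair is an independent scCODA run, so nothing in this loop adjusts for the fact that three comparisons are being made.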

mainharryHR commented 1 month ago

Dear Johan

Thank you very much for the helpful comments.

Best, Harry