Open feanaros opened 4 months ago
Hi @feanaros,
the ALR-t model performs a t-test on additive log-ratio (ALR) transformed data. For this transformation, you need a reference, as in the scCODA model. If you want to compare both approaches, you should use the same reference. In your example, we took the 5th cell type in the dataset as a reference.
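To make the role of the reference concrete, here is a minimal pure-Python sketch of an ALR transform followed by a per-cell-type t statistic. It is not scCODA's implementation: the counts, the pseudocount of 0.5, and the choice of Welch's t statistic are assumptions for illustration only.

```python
import math
from statistics import mean, variance


def alr_transform(counts, reference_index, pseudocount=0.5):
    """Additive log-ratio: log of each component divided by the reference component."""
    # The pseudocount avoids log(0) for empty cell types (its value is an assumption).
    shifted = [c + pseudocount for c in counts]
    ref = shifted[reference_index]
    return [math.log(x / ref) for i, x in enumerate(shifted) if i != reference_index]


def welch_t(a, b):
    """Welch's t statistic for two independent samples."""
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / se


# Hypothetical counts: rows are samples, columns are cell types.
group_a = [[100, 50, 25], [120, 55, 30], [90, 48, 20]]
group_b = [[100, 90, 25], [110, 100, 28], [95, 85, 22]]

reference_index = 0  # zero-based, so index 4 would mean the 5th cell type
alr_a = [alr_transform(row, reference_index) for row in group_a]
alr_b = [alr_transform(row, reference_index) for row in group_b]

# One t statistic per non-reference cell type.
t_stats = [
    welch_t([row[j] for row in alr_a], [row[j] for row in alr_b])
    for j in range(len(alr_a[0]))
]
print(t_stats)
```

Every value is expressed relative to the reference, so changing the reference changes every test. That is why both methods should use the same one when you compare them.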
As for your second question, this happens if no cell type in your dataset was credible in more than half of the runs. To get credible effects, you could either raise the FDR level in each run or lower the threshold in the second line of the code snippet you showed.
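The "credible in more than half of the runs" rule can be sketched roughly as below. This is an illustration only: the function name `majority_credible`, the per-run dictionaries, and the example outcomes are hypothetical, and the code is not taken from the scCODA tutorial.

```python
def majority_credible(results_per_run, threshold=0.5):
    """Per cell type: was it credible in more than `threshold` of the runs?"""
    n_runs = len(results_per_run)
    cell_types = results_per_run[0].keys()
    return {
        ct: sum(run[ct] for run in results_per_run) / n_runs > threshold
        for ct in cell_types
    }


# Hypothetical outcomes of three runs (True = credible effect in that run).
runs = [
    {"C0": False, "C1": True, "C2": False},
    {"C0": False, "C1": True, "C2": True},
    {"C0": False, "C1": False, "C2": False},
]

print(majority_credible(runs))       # {'C0': False, 'C1': True, 'C2': False}
print(majority_credible(runs, 0.3))  # {'C0': False, 'C1': True, 'C2': True}
```

Lowering the threshold only helps if at least some runs report a credible effect. If every run returns all False, no threshold above zero will change the outcome.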
@johannesostner thank you. For the first question: if I don't have a reference, how do I choose one? With the second test, or in some other way?
However, in the standard scCODA analysis, print(sim_results.credible_effects()) gives me all False.
Are the differences in proportions in my analysis not significant?
(I set FDR to both 0.1 and 0.4, same result)
Are there any parameters I can change?
I'm not able to run ALDEx2 or the other tests; I get a couple of errors. Maybe I'm doing something wrong.
This is my input:
df = pd.read_csv("/Users/olga/table_sample_cluster.csv")
df
Unnamed: 0 C0 C1 C10 C11 C2 C3 C4 C5 C6 C7 C8 C9 Sample Genotype
0 notch3_1 856 897 10 0 542 223 228 6 62 39 24 19 notch3_1 notch3
1 notch3_2 974 749 14 0 512 180 156 186 52 30 17 14 notch3_2 notch3
2 notch3_3 1401 1320 22 1 942 286 304 63 104 46 42 10 notch3_3 notch3
3 wt_1 725 595 11 7 562 145 147 16 50 61 28 14 wt_1 wt
4 wt_3 1508 1164 14 49 1029 304 263 187 112 99 39 36 wt_3 wt
data_all = dat.from_pandas(df, covariate_columns=["Sample", "Genotype","Unnamed: 0"])
data_all.obs
Sample Genotype Unnamed: 0
0 notch3_1 notch3 notch3_1
1 notch3_2 notch3 notch3_2
2 notch3_3 notch3 notch3_3
3 wt_1 wt wt_1
4 wt_3 wt wt_3
data_all
AnnData object with n_obs × n_vars = 5 × 12
obs: 'Sample', 'Genotype', 'Unnamed: 0'
data_all.obs["Genotype"]
0 notch3
1 notch3
2 notch3
3 wt
4 wt
Name: Genotype, dtype: object
print(data_all)
AnnData object with n_obs × n_vars = 5 × 12
obs: 'Sample', 'Genotype', 'Unnamed: 0'
# Select the notch3 and wt samples
data_day = data_all[data_all.obs["Genotype"].isin(["notch3", "wt"])]
print(data_day.obs)
Sample Genotype Unnamed: 0
0 notch3_1 notch3 notch3_1
1 notch3_2 notch3 notch3_2
2 notch3_3 notch3 notch3_3
3 wt_1 wt wt_1
4 wt_3 wt wt_3
The tests you mention (ALR-t, ALDEx2, ...) are alternatives to scCODA, which we mainly used for the comparison study in our paper (https://www.nature.com/articles/s41467-021-27150-6).
For ALDEx2 and ANCOM-BC, you'll need an R environment with the packages installed, and you'll need to set the correct paths to this environment (r_home and r_path in the tutorial; these are specific to your operating system and installation location).
scCODA has an automatic reference selection (by setting reference="automatic"), which you can use. It will tell you the reference it selected.
Currently, you don't get any credible effects from your data, probably because the proportions do not differ much between the conditions. With your low sample size, an effect has to be considerable to be detected credibly. You might want to visually check your data through a grouped boxplot like in our advanced tutorial. There, you can also find more info on reference selection, etc.
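As a quick numeric complement to the boxplot, the per-sample proportions and group means can be computed directly from the counts posted above. This is a minimal sketch in plain Python, not part of scCODA. The helper `group_mean` is hypothetical, and with only three and two samples per group the means are a rough guide at best.

```python
from statistics import mean

cell_types = ["C0", "C1", "C10", "C11", "C2", "C3", "C4", "C5", "C6", "C7", "C8", "C9"]

# Counts copied from the table posted in this thread.
counts = {
    "notch3_1": [856, 897, 10, 0, 542, 223, 228, 6, 62, 39, 24, 19],
    "notch3_2": [974, 749, 14, 0, 512, 180, 156, 186, 52, 30, 17, 14],
    "notch3_3": [1401, 1320, 22, 1, 942, 286, 304, 63, 104, 46, 42, 10],
    "wt_1": [725, 595, 11, 7, 562, 145, 147, 16, 50, 61, 28, 14],
    "wt_3": [1508, 1164, 14, 49, 1029, 304, 263, 187, 112, 99, 39, 36],
}
genotype = {
    "notch3_1": "notch3", "notch3_2": "notch3", "notch3_3": "notch3",
    "wt_1": "wt", "wt_3": "wt",
}

# Convert each sample's counts to proportions of that sample's total.
proportions = {s: [c / sum(row) for c in row] for s, row in counts.items()}


def group_mean(group):
    """Mean proportion of each cell type across the samples of one genotype."""
    samples = [s for s, g in genotype.items() if g == group]
    return [mean(proportions[s][j] for s in samples) for j in range(len(cell_types))]


mean_notch3 = group_mean("notch3")
mean_wt = group_mean("wt")
for ct, a, b in zip(cell_types, mean_notch3, mean_wt):
    print(f"{ct:>4}  notch3={a:.4f}  wt={b:.4f}")
```

Cell types whose group means are close relative to the spread between samples of the same genotype are unlikely to come out as credible with this few samples.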
Hi, what is meant by alr_t_model.fit_model(reference_cell_type=4) in the frequentist test? I don't know which cell type to use as a reference. Moreover, when I loop to evaluate the best reference cell type, after:
all the results of is_credible are False. Can you give me some advice?