Closed ngvananh2508 closed 1 month ago
It gave the unexpected MuData: obs of coda was 'gender', not 'condition'. condition has 2 categorical variables: 'T2D' and 'non-diabetic. This dataset has four batches: 'batch 1', 'batch 2', 'batch 3', 'batch 4'.
Sorry, I don't quite understand the problem. Did you get a traceback or what is the issue?
Thanks!
In your tutorial, when you assign ['condition'] to covariate_obs, coda object will have 'condition' obs. But when I followed the tutorial, I received a coda object with 'gender' obs instead of 'condition' obs. This is my adata.obs:
Okay, thanks. I'll look into it as soon as possible
My Anndata object only has one value for gender obs: 'male'. I want to compare the cell composition between two groups: T2D and non-diebetic of 'condition' obs.
Okay, thanks. I'll look into it as soon as possible
Thank you very much.
Would it be possible for me to have your dataset or would you be able to create a minimal working example to reproduce the bug, please?
This is the dataset. I ran these code lines: sccoda_model_draft = pt.tl.Sccoda() sccoda_data_draft = sccoda_model_draft.load( adata_1, type="cell_level", generate_sample_level=True, cell_type_identifier="celltype", sample_identifier="batch", covariate_obs=["condition"], ) sccoda_data_draft
MuData object with n_obs × n_vars = 3803 × 4013 2 modalities rna: 3799 x 4000 obs: 'gender', 'age', 'condition', 'batch', 'celltype' layers: 'counts' coda: 4 x 13 obs: 'gender' var: 'n_cells'
I can not upload the data on here. Can I send the data to you via your email?
Yes you can! You can find my email on my Github profile. You can also send me a link to a file hoster there.
And by the way, this dataset composes of only four batches. Do you know the way to increase the statistical significance of each cell type in the dataset. Now I have to set the fdr equaling to 0.5, If I set it to a lower value, I will not get any information about the changes of the cell types. I mean that when I ran: sccoda_model_male.credible_effects(sccoda_data_male_final, modality_key="coda") It return all False values.
I'll check the issue out tomorrow. On my TODO list.
And by the way, this dataset composes of only four batches. Do you know the way to increase the statistical significance of each cell type in the dataset. Now I have to set the fdr equaling to 0.5, If I set it to a lower value, I will not get any information about the changes of the cell types. I mean that when I ran: sccoda_model_male.credible_effects(sccoda_data_male_final, modality_key="coda") It return all False values.
You can consider testing healthy versus one condition separately instead of testing vs all non healthy conditions jointly. If this is not an option for your dataset, I am afraid that you may not have any significant shifts in composition. But maybe @johannesostner wants to chip in?
No, the condition variable of the dataset only comprises two values: non-diabetic and T2D. I know that if the model uses the Gaussian distribution as null model, the dataset must contain many batches to use this assumption according to the central limit theorem.
Regarding significance:
Even though scCODA does not use the Gaussian as a null distribution (it's a Bayesian model, so there is no "Null model", but rather the posterior gets informed by the prior and the observations), the concept of more samples -> higher power still holds. Therefore, there is no way to increase statistical significance without violating some foundational statistical principles.
Regarding significance:
Even though scCODA does not use the Gaussian as a null distribution (it's a Bayesian model, so there is no "Null model", but rather the posterior gets informed by the prior and the observations), the concept of more samples -> higher power still holds. Therefore, there is no way to increase statistical significance without violating some foundational statistical principles.
So scCODA can only be applied for the data which composes of many batches or can we create batches by ourselves (just randomly creating samples for the statistical test)?
This depends on the effect strength. For very strong effects, small sample sizes are sufficient to detect them also at lower FDR levels. For weaker effects, you will need more samples to find them with high credibility.
Randomly creating samples (e.g. through bootstrapping) is a known practice, but must be done correctly to not violate statistical principles. Nevertheless, this will not improve your statistical power if done correctly
If I randomly assign each cell to one of a list from 'batch 1' to 'batch 10', will this technique enhance the statistical power?
I identified the issue and pushed a fix to main. However,
adata.obs.groupby("batch").nunique()
shows that your batches do not only have single conditions but several.
The expectation is:
But your dataset results in:
You identify that by a warning.
Could you please test with main
?
What do you mean "test with main"?
@ngvananh2508 sorry for my brievity
pip install git+https://github.com/theislab/pertpy.git@main
@ngvananh2508 sorry for my brievity
pip install git+https://github.com/theislab/pertpy.git@main
Thank you. I will test it and let you know the result ASAP.
@ngvananh2508 sorry for my brievity
pip install git+https://github.com/theislab/pertpy.git@main
When I install this pertpy on a new env and import it, it raise this error: ModuleNotFoundError: No module named 'arviz' ModuleNotFoundError: No module named 'pydeseq2' ModuleNotFoundError: No module named 'formulaic' ModuleNotFoundError: No module named 'toytree' I can get around these issues by installing them, but I still report in case you want to modify your library
I identified the issue and pushed a fix to main. However,
adata.obs.groupby("batch").nunique()
shows that your batches do not only have single conditions but several.The expectation is:
But your dataset results in:
You identify that by a warning.
Could you please test with
main
?
So you mean that each batch only consists of one condition? And is this the reason that sccoda appears weird behavior when running with my data?
Report
When I ran this snippet code: It gave me the unexpected MuData: obs of coda was 'gender', not 'condition'. condition has 2 categorical variables: 'T2D' and 'non-diabetic. This dataset has four batches: 'batch 1', 'batch 2', 'batch 3', 'batch 4'.
Version information
pertpy 0.8.0