Closed AClab-sgarcia closed 1 year ago
Hi @AClab-sgarcia,
from your description of the error message, I cannot spot the problem. Could you please post the entire message?
Also, which versions of scCODA
, tensorflow
, tensorflow-probability
and pandas
are you using? Just making sure that all versions are up to date and compatible...
Thanks!
Good morning @johannesostner
I can't send you any more information than that, as that's all the error I get. As I said, I followed the "Using all cell types as reference alternatively to reference selection" section of: https://sccoda.readthedocs.io/en/latest/Modeling_options_and_result_analysis.html
The version of the packages:
Thank you!
This seems to be a pandas error, but I cannot see which function caused this error from what you posted.
I was looking for an error traceback like Error in ....: ValueError: cannot reindex on an axis with duplicate labels
to point down which line caused the error, but I guess that it is this one: ´cred_eff.index = cred_eff.index.droplevel(level=0)´.
How many conditions do you have in your Cond
column? If it's more than two, I could see a way how this error is caused
When your object scCODA_data
is an anndata
object, please check if the obs_names
are unique. You may run scCODA_data.obs_names_make_unique()
, which should fix issues with the indexing.
@johannesostner I have 4 samples, with 3 differente conditions in the "Cond" column. Yes, I think it's a pandas' error too, but I haven't been able to figure it out.
@mbuttner
scCODA_data.obs_names
output:
Index(['AC1939', 'AC1941', 'AC1944_B', 'AC1944_T'], dtype='object', name='orig.ident')
Thanks. That looks good, though. The anndata
object has more index objects, though, which may cause the error. Is scCODA_data.var_names
unique? Can you share the output of print(scCODA_data)
, if possible?
Thanks for your quick reply @mbuttner
scCODA_data.var_names
output:
Index(['0', '1', '2', '3', '4', '5', '6', '7', '8', '9', '10', '11', '12', '13', '14', '15', '16', '17', '18', '19', '20'], dtype='object', name='seurat_clusters')
print(scCODA_data)
output:
AnnData object with n_obs × n_vars = 4 × 21 obs: 'Cond' var: 'n_cells'
OK, the indexes look all unique, the anndata
object is fine as far as I can tell.
Have you tried to run scCODA
without a reference cell type? Does that result in the same error?
The problem does likely not come from the initial scCODA data but from the line
cred_eff.index = cred_eff.index.droplevel(level=0)
, which uses the scCODA output DataFrame.
The line drops the covariate indicator level (which is no problem if there is only one covariate, but leaves non-unique names if there are multiple covariates).
I'll see if I can find a workaround
Thanks @johannesostner
@AClab-sgarcia just to make sure that I am not mistaken here: Are you able to run scCODA on your data outside of the loop? i.e.
model_temp = mod.CompositionalAnalysis(scCODA_data, formula="Cond", reference_cell_type="automatic")
temp_results = model_temp.sample_hmc(num_results=20000)
@johannesostner I've been able to run scCODA both with a automatic reference selection and manual reference selection. I was trying to run it inside the loop as I can not really see any clsuter with low dispersion of relative abundance. I just wanted to compare the different results with different clusters as the reference.
Ok, I managed to reproduce your error and find a workaround - the error was indeed caused by the multiple covariates in your data.
You can use this code as an alternative:
cell_types = scCODA_data.var.index
results_cycle = None
for ct in cell_types:
print(f"Reference: {ct}")
# Run inference
model_temp = mod.CompositionalAnalysis(scCODA_data, formula="Cond", reference_cell_type=ct)
temp_results = model_temp.sample_hmc(num_results=20000, num_burnin=5000)
# Select credible effects
cred_eff = temp_results.credible_effects()
cred_eff.index = ['_'.join(i) for i in cred_eff.index.values]
# add up credible effects
if results_cycle is None:
results_cycle = pd.DataFrame(index=cred_eff.index, columns=["times_credible"]).fillna(0)
results_cycle["times_credible"] += cred_eff.astype("int")
What do you mean by multiple covariates in my data? Do you mean that i was using 3 conditions for 4 different samples?
Exactly! The code in the tutorial was only intended for a dataset with one covariate, i.e. a comparison of two conditions. It is totally valid to do the same analysis while comparing multiple conditions (resulting in multiple covariates), as you intend to do. The code you used was simply not intended to deal with this scenario, therefore I created a small workaround for your case that does the same, but can deal with multi-group comparisons
Thank you so much for the help and also for the package!
Good morning, Thank you for developing this very useful package. I am running scCODA with my data. I am trying to use the code available in the section "Using all cell types as reference alternatively to reference selection".
`for ct in cell_types: print(f"Reference: {ct}")
As soon as you use the first cell_type, the following error occurs:
Reference: 0 100%|██████████| 20000/20000 [01:02<00:00, 321.19it/s] MCMC sampling finished. (79.329 sec) Acceptance rate: 59.6% ValueError: cannot reindex on an axis with duplicate labels
I have tried to solve it, but all my solutions result in errors.
Thanks