theislab / scCODA

A Bayesian model for compositional single-cell data analysis
BSD 3-Clause "New" or "Revised" License

Mismatched sample labels #41

Closed mhulke closed 2 years ago

mhulke commented 2 years ago

I have used scCODA in the past without any problems. Now that I am using it with new samples, I noticed that the sample labels on the bar plot don't line up with the data. In the attached graph, I know from UMAPs that sample 10 contains cell type 5, but that specific bar is always located on the far right and given whichever sample label is last in the list B. It doesn't matter whether I change the covariate_df or the order of samples within my AnnData object; the graph always comes out with mismatched labels. Code below. Any help with this glitch would be greatly appreciated.

B = ['DG_4365_WS2','DG_4363_WS3','DG_4367_WS5','DG_4369_WS6','DG_4377_WS10','DG_4365_WR2','DG_4362_WR2','DG_4371_WR6','DG_4376_WR9','DG_4382_WR12','DG_4404_AS7','DG_4408_AS9','DG_4395_AS3','DG_4401_AS5','WR_4396_AS4','DG_4390_AR2','DG_4400_AR4','DG_4405_AR6','DG_4409_AR8','DG_4402_AR5']
cov_df = pd.DataFrame({"Cond": ['1','2','3','4','5','6','7','8','9','10','11','12','13','14','15','16','17','18','19','20']}, index=B)

data_scanpy_1 = dat.from_scanpy(
    adata,
    cell_type_identifier="leiden",
    sample_identifier="batches",
    covariate_df=cov_df
)

viz.stacked_barplot(data_scanpy_1, feature_name="Cond", cmap=umap_cmap)
plt.xticks(rotation='vertical')
plt.title('Sample Composition')

[Image: scCODA_mislabeled_bar]

johannesostner commented 2 years ago

Hi! Thanks for reporting this bug!

From what I can tell from your description, the issue arises due to the indices of data_scanpy_1.X and data_scanpy_1.obs being misaligned. This could happen in dat.from_scanpy because anndata does not check automatically if indices align. I just pushed a fix for this in the branch update_0.1.6. I hope that this solves the problem!
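To illustrate the kind of misalignment described here, the following is a minimal pandas sketch. It is not the actual patch in update_0.1.6: the sample names and counts are made up, and `align_covariates` is a hypothetical helper. It reorders a covariate table so that its index follows the row order of a count table.

```python
import pandas as pd

# Hypothetical per-sample count table; the row order is whatever the
# aggregation step happened to produce.
counts = pd.DataFrame(
    {"type_0": [90, 10, 50], "type_5": [0, 80, 0]},
    index=["S3", "S1", "S2"],
)

# Covariates supplied by the user in a different order.
cov_df = pd.DataFrame({"Cond": ["1", "2", "3"]}, index=["S1", "S2", "S3"])


def align_covariates(counts: pd.DataFrame, cov_df: pd.DataFrame) -> pd.DataFrame:
    """Return cov_df reordered to follow the row order of counts."""
    missing = counts.index.difference(cov_df.index)
    if len(missing) > 0:
        raise KeyError(f"No covariates for samples: {list(missing)}")
    # .loc with a sequence of labels returns the rows in that label order.
    return cov_df.loc[counts.index]


aligned = align_covariates(counts, cov_df)
print(aligned)
```

Without the reordering step, the covariate in row 0 ("1", meant for S1) would be attached to the counts of S3, which is the same symptom as a bar carrying the wrong label.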

mhulke commented 2 years ago

Thanks. I'm working in Terra with scCODA installed via pip. How do I go about pulling the fix?

johannesostner commented 2 years ago

You can install a specific branch from GitHub with pip:

pip install git+https://github.com/theislab/scCODA.git@update_0.1.6

The version number of your scCODA installation should then be 0.1.6. If not, you can try uninstalling scCODA (pip uninstall sccoda) before installing with the same command as above.
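One way to check the installed version from within Python is sketched below. It uses only the standard library; `installed_version` is a hypothetical helper name, and the distribution name "sccoda" is taken from the pip uninstall command mentioned here.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of a pip package, or None if it is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Prints the version string if scCODA is installed, otherwise None.
print("scCODA version:", installed_version("sccoda"))
```

In a notebook environment, a package that was already imported is generally not reloaded after reinstalling, so restarting the kernel before re-running the analysis is usually needed.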

mhulke commented 2 years ago

I was able to download it and I can see version 0.1.6 in my packages folder, but I am still getting the same mismatched output. Could something else be causing the problem?

johannesostner commented 2 years ago

Hmm, then there seems to be another issue causing this problem. Could you please help me pin down the cause by checking whether the rows in data_scanpy_1.X (cell counts) and data_scanpy_1.obs (covariates) are correctly matched? For example, does row 10 in data_scanpy_1.X contain lots of cell type 5, or is it row 20?
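A check like this can be sketched with pandas. The tables below are made-up stand-ins for data_scanpy_1.X and data_scanpy_1.obs (three samples, two cell types), so the names and numbers are assumptions for illustration only.

```python
import pandas as pd

# Made-up stand-ins for data_scanpy_1.X (counts) and data_scanpy_1.obs (covariates).
samples = ["sample_a", "sample_b", "sample_c"]
counts = pd.DataFrame(
    [[120, 0], [95, 3], [10, 240]],
    index=samples,
    columns=["0", "5"],  # cell type labels
)
obs = pd.DataFrame({"Cond": ["1", "2", "3"]}, index=samples)

# Share of each cell type per sample (each row sums to 1).
proportions = counts.div(counts.sum(axis=1), axis=0)

# Which row is dominated by cell type "5", and which covariate does it carry?
top_sample = proportions["5"].idxmax()
print(top_sample, obs.loc[top_sample, "Cond"])
```

With real data, and assuming data_scanpy_1.X is a dense array, the count table could be built as pd.DataFrame(data_scanpy_1.X, columns=data_scanpy_1.var.index) and compared row by row (by position) against data_scanpy_1.obs.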

Also, you could call viz.stacked_barplot(data_scanpy_1, feature_name="samples",cmap=umap_cmap) to plot the data with the index of data_scanpy_1.obs as the x-axis labels, in case you are more familiar with these labels than with the numbers 1-20.

Thanks in advance!

mhulke commented 2 years ago

viz.stacked_barplot(data_scanpy_1, feature_name="samples",cmap=umap_cmap)

[Image: resulting stacked bar plot]

data_scanpy_1.obs: [Image: contents of data_scanpy_1.obs]

Now it seems like the samples are lined up correctly. Sample WR_4396_AS4 is the correct sample containing cell type 5. But now it looks like I can't put the data in the order I assign in cov_df. If I number the samples 1-20 based on the treatment, the plot mixes them up like this:

viz.stacked_barplot(data_scanpy_1, feature_name="Cond",cmap=umap_cmap)

[Image: stacked bar plot with the Cond labels out of order]

johannesostner commented 2 years ago

I'm glad that the samples and metadata are aligned now. This is the bug that I tried to fix with the new version.

By default, the plotting function uses the same order as the data, i.e. the order in data_scanpy_1.X (cov_df was reordered to match the order of samples there). To order the x-axis in the plot, you can pass the desired order via the level_order argument, in your case:

viz.stacked_barplot(data_scanpy_1, feature_name="Cond",cmap=umap_cmap, level_order=['1','2','3','4','5','6','7','8','9','10','11','12','13','14','15','16','17','18','19','20'])
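Typing out twenty labels by hand is easy to get wrong, so the list can also be generated. The sketch below assumes the conditions are stored as the strings '1' to '20', as in cov_df; `levels_in_data_order` is a made-up example of labels arriving in data order.

```python
# Covariate levels as they might appear in data order (strings, as in cov_df).
levels_in_data_order = ["3", "10", "1", "20", "2"]

# Plain sorted() on strings is lexicographic, so "10" sorts before "2".
lexicographic = sorted(levels_in_data_order)

# Sorting by integer value gives the intended numeric order.
numeric = sorted(levels_in_data_order, key=int)

# Full list for the 20 conditions used in this thread.
level_order = [str(i) for i in range(1, 21)]

print(lexicographic)
print(numeric)
print(level_order)
```

The resulting list can then be passed as level_order=level_order in the viz.stacked_barplot call.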

I hope that this gives you the desired plot!

mhulke commented 2 years ago

Awesome. Thank you. Everything is looking good now.