theislab / scarches

Reference mapping for single-cell genomics
https://docs.scarches.org/en/latest/
BSD 3-Clause "New" or "Revised" License
323 stars 50 forks

Potential bugs for label transfer #198

Open shuailinli opened 1 year ago

shuailinli commented 1 year ago

Dear HLCA team,

Thanks for developing the HLCA. I ran into a bug when mapping my query data to the reference using the code in scarches/notebook/hlca_map_classify.ipynb. The problem appears after these two lines:

combined_emb.obs = combined_emb.obs.join(labels)
combined_emb.obs = combined_emb.obs.join(uncert)

After this code runs, the order of the rows in obs changes for some reason. This might be caused by a pandas update combined with the non-unique index of combined_emb. The change scrambles the data, so the transferred cell type assignments end up effectively random. It can be fixed by making the obs names of combined_emb unique and then resetting the index of labels and uncert to the query cells' obs names before joining:

combined_emb.obs_names_make_unique()
labels.index = combined_emb.obs.index[combined_emb.obs["ref_or_query"] == "query"].copy()
uncert.index = combined_emb.obs.index[combined_emb.obs["ref_or_query"] == "query"].copy()
combined_emb.obs = combined_emb.obs.join(labels, sort=False)
combined_emb.obs = combined_emb.obs.join(uncert, sort=False)
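For anyone who wants to see the effect without loading the atlas, here is a minimal sketch in plain pandas. It is not the notebook's code: the barcodes, the labels, and the helper `make_index_unique` are made up for illustration (the helper is a simplified, hypothetical stand-in for AnnData's `obs_names_make_unique()`), and it assumes, as in the fix above, that the rows of `labels` are in the same order as the query cells in `obs`.

```python
import pandas as pd


def make_index_unique(index: pd.Index, sep: str = "-") -> pd.Index:
    """Hypothetical, simplified stand-in for obs_names_make_unique().

    Appends a counter to repeated names. It does not guard against
    collisions with names that already end in such a suffix.
    """
    seen = {}
    out = []
    for name in index:
        count = seen.get(name, 0)
        out.append(name if count == 0 else f"{name}{sep}{count}")
        seen[name] = count + 1
    return pd.Index(out)


# Toy obs table: two reference cells, three query cells, repeated barcodes.
obs = pd.DataFrame(
    {"ref_or_query": ["ref", "ref", "query", "query", "query"]},
    index=["AAAC", "AAAG", "AAAC", "AAAT", "AAAT"],
)

# Toy transferred labels, one row per query cell, in query order.
labels = pd.DataFrame(
    {"cell_type": ["T cell", "B cell", "NK cell"]},
    index=["AAAC", "AAAT", "AAAT"],
)

# Naive join on the non-unique index: duplicated barcodes match several
# rows, so the result has more rows than obs, and the reference cell
# "AAAC" picks up a label that belongs to a query cell.
naive = obs.join(labels)

# Fix: make the index unique, then give labels the query cells' names.
fixed_obs = obs.copy()
fixed_obs.index = make_index_unique(fixed_obs.index)
is_query = fixed_obs["ref_or_query"] == "query"
fixed_labels = labels.copy()
fixed_labels.index = fixed_obs.index[is_query].copy()
fixed = fixed_obs.join(fixed_labels, sort=False)

print(len(obs), len(naive), len(fixed))
print(fixed)
```

A quick check before joining is `combined_emb.obs_names.is_unique`. If it is `False`, a join on the index cannot map labels to cells unambiguously.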

By the way, I ran the code in both Google Colab and AWS and had the same problem.

ramadatta commented 5 months ago

@shuailinli Thanks for the code. It seems like something is messed up.

I followed your code and it helped me get proper label transfer.

Before: [image: before]

Now: [image: after]