Potential bugs for label transfer

Dear HLCA team,

Thanks for developing the HLCA. I ran into a bug when mapping my query data onto the reference with the code in scarches/notebook/hlca_map_classify.ipynb, specifically these two lines:

    combined_emb.obs = combined_emb.obs.join(labels)
    combined_emb.obs = combined_emb.obs.join(uncert)

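After running this code, the rows of combined_emb.obs end up in a different order. This might be caused by a change in recent pandas versions combined with the fact that the index (obs names) of combined_emb is not unique. The effect is severe: the data get scrambled, and the transferred cell-type labels are assigned to cells essentially at random.

It can be fixed by making the obs names of combined_emb unique and then re-indexing labels and uncert with the names of the query cells before joining:

    # Make the obs names unique (duplicated cell names can misalign index-based joins)
    combined_emb.obs_names_make_unique()
    # Give labels/uncert the (now unique) names of the query cells;
    # this assumes their rows are in the same order as the query cells
    labels.index = combined_emb.obs.index[combined_emb.obs["ref_or_query"] == "query"].copy()
    uncert.index = combined_emb.obs.index[combined_emb.obs["ref_or_query"] == "query"].copy()
    # Join without re-sorting so the row order of combined_emb.obs is kept
    combined_emb.obs = combined_emb.obs.join(labels, sort=False)
    combined_emb.obs = combined_emb.obs.join(uncert, sort=False)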
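If you would rather not depend on index alignment at all, the labels can also be copied in by position. The sketch below uses only pandas and NumPy; attach_query_annotations is a hypothetical helper (not part of scArches or the notebook), and the toy barcodes and labels are made up for illustration. Like the fix above, it assumes the rows of labels/uncert are in the same order as the query cells in combined_emb.obs, but it works even when obs names are duplicated.

```python
import numpy as np
import pandas as pd


def attach_query_annotations(obs, is_query, annotations):
    """Return a copy of obs with the columns of annotations filled in for query rows.

    Hypothetical helper, not part of scArches. Values are assigned by position,
    so duplicated cell names in obs.index cannot cause misalignment.
    ASSUMPTION: the rows of annotations are in the same order as the query
    rows of obs. Reference rows receive missing values (NaN).
    """
    is_query = np.asarray(is_query, dtype=bool)
    if len(is_query) != len(obs):
        raise ValueError("is_query must have one entry per row of obs")
    if is_query.sum() != len(annotations):
        raise ValueError("number of query rows does not match number of annotation rows")

    out = obs.copy()  # adding columns never changes the row order
    for col in annotations.columns:
        # Object array so both strings (labels) and floats (uncertainties) fit.
        values = np.full(len(out), np.nan, dtype=object)
        values[is_query] = annotations[col].to_numpy()
        # infer_objects() turns all-numeric columns back into float64;
        # assigning a NumPy array (not a Series) avoids any index alignment.
        out[col] = pd.Series(values).infer_objects().to_numpy()
    return out


# Toy data: the same barcode appears once in the reference and once in the query.
obs = pd.DataFrame(
    {"ref_or_query": ["reference", "query", "reference", "query"]},
    index=["AAAC-1", "AAAC-1", "TTTG-1", "TTTG-1"],
)
labels = pd.DataFrame({"label": ["T cell", "B cell"]})
uncert = pd.DataFrame({"uncertainty": [0.1, 0.4]})

is_query = obs["ref_or_query"] == "query"
obs = attach_query_annotations(obs, is_query, labels)
obs = attach_query_annotations(obs, is_query, uncert)
print(obs)
```

In the notebook this would replace each join with something like combined_emb.obs = attach_query_annotations(combined_emb.obs, combined_emb.obs["ref_or_query"] == "query", labels), and likewise for uncert; I have only tried it on toy data like the above, not on the full notebook.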

By the way, I ran the code on both Google Colab and AWS and hit the same problem.

theislab / scarches

Potential bugs for label transfer #198