Dear HLCA team,
Thanks for developing the HLCA. I ran into a bug when mapping my query data onto the reference with the code in scarches/notebook/hlca_map_classify.ipynb. The problem appears after these lines:
combined_emb.obs = combined_emb.obs.join(labels)
combined_emb.obs = combined_emb.obs.join(uncert)
After these lines, the rows of obs ended up in a different order. I suspect this comes from a change in recent pandas versions combined with the non-unique index (obs names) of combined_emb. The reordering scrambles the data, so the cell-type assignments become effectively random.
This can be fixed by making the obs names of combined_emb unique and then re-indexing labels/uncert with the names of the query cells before joining (this assumes the rows of labels/uncert are in the same order as the query cells in combined_emb):
combined_emb.obs_names_make_unique()
query_names = combined_emb.obs.index[combined_emb.obs["ref_or_query"] == "query"]
labels.index = query_names
uncert.index = query_names
combined_emb.obs = combined_emb.obs.join(labels, sort=False)
combined_emb.obs = combined_emb.obs.join(uncert, sort=False)
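To make the failure mode concrete, here is a minimal pandas-only sketch with toy data (not taken from the notebook). The barcodes, the `label` column and the `attach_query_columns` helper are all hypothetical; only the `ref_or_query` column comes from the notebook. It shows one way duplicated obs names can corrupt an index-based `join` (a reference cell that shares a barcode with a query cell also receives that query cell's label). It also sketches an alternative that, like the fix above, assumes the rows to attach are in the same order as the query cells. The helper refuses non-unique obs names and aligns the new columns with `reindex`, so the row order of obs is never changed.

```python
import pandas as pd


def attach_query_columns(obs: pd.DataFrame, query_df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: add the columns of query_df to the query rows of obs.

    Assumes the rows of query_df are in the same order as the rows of obs whose
    "ref_or_query" value is "query". Reference rows get missing values.
    """
    if not obs.index.is_unique:
        raise ValueError("obs names are not unique; make them unique first")
    is_query = (obs["ref_or_query"] == "query").to_numpy()
    if int(is_query.sum()) != len(query_df):
        raise ValueError("number of query cells and number of rows to attach differ")
    # Relabel query_df with the query cells' names, then expand it to all cells
    # in obs order (reindex on a unique index returns exactly that order).
    aligned = query_df.copy()
    aligned.index = obs.index[is_query]
    aligned = aligned.reindex(obs.index)
    out = obs.copy()
    for col in aligned.columns:
        out[col] = aligned[col]
    return out


# Toy data: a reference cell and a query cell share the barcode "AAAC-1".
obs = pd.DataFrame(
    {"ref_or_query": ["ref", "query", "query"]},
    index=["AAAC-1", "AAAC-1", "GGGT-1"],
)
labels = pd.DataFrame({"label": ["T cell", "B cell"]}, index=["AAAC-1", "GGGT-1"])

# Index-based join: the reference row with the shared barcode also gets "T cell".
joined = obs.join(labels)
print(joined)

# Same cells with unique (hypothetical) names; labels are in query-cell order.
safe_obs = obs.copy()
safe_obs.index = ["ref_AAAC-1", "query_AAAC-1", "query_GGGT-1"]
fixed = attach_query_columns(safe_obs, labels)
print(fixed)
```

Because the new columns are attached through `reindex` on a unique index rather than through `join`, the result should not depend on how a particular pandas version orders join output.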
By the way, I ran the code in both Google Colab and AWS and had the same problem.