sakatash opened this issue 1 month ago
Update:
I managed to get confidence thresholding working with this kind of logic:
import numpy as np
import pandas as pd

# Modified methods of scanpy's `Ingest` class.

def _knn_classify(self, labels):
    # ensure it's categorical
    cat_array: pd.Series = self._adata_ref.obs[labels].astype("category")
    values = []
    confidences = []
    for inds in self._indices:
        # majority label among the reference neighbors of this query cell
        mode_value = cat_array.iloc[inds].mode()[0]
        # confidence = fraction of neighbors that share the majority label
        mode_count = (cat_array.iloc[inds] == mode_value).sum()
        confidence = mode_count / len(inds)
        values.append(mode_value)
        confidences.append(confidence)
    # Create a DataFrame for better readability (debugging output)
    classification_df = pd.DataFrame({
        "Mode Values": values,
        "Confidences": confidences
    })
    print(classification_df)
    return pd.Categorical(values=values, categories=cat_array.cat.categories), np.array(confidences)


def map_labels(self, labels, method, confidence_threshold: float = 0.5):
    """\
    Map labels of `adata` to `adata_new`.

    This function infers `labels` for `adata_new.obs`
    from existing labels in `adata.obs`.
    `method` can be only 'knn'.
    Cells whose kNN confidence is below `confidence_threshold`
    are left unlabeled (NaN).
    """
    if method == "knn":
        classified_labels, confidences = self._knn_classify(labels)
        # keep a label only where enough neighbors agree on it
        mask = confidences >= confidence_threshold
        filtered_labels = [
            label if mask[idx] else np.nan
            for idx, label in enumerate(classified_labels)
        ]
        # NaN is not a category, so masked cells become missing values
        classified_labels = pd.Categorical(
            filtered_labels,
            categories=classified_labels.categories
        )
        self._adata_new.obs[labels] = classified_labels
        self._adata_new.obs[labels + '_confidence'] = confidences
    else:
        raise NotImplementedError("Ingest supports knn labeling for now.")
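For experimenting outside of scanpy, the same voting-plus-threshold idea can be written as a standalone function. This is a minimal sketch, assuming the reference labels are a `pd.Series` with no missing values and that `neighbor_indices` is an integer array of shape `(n_query_cells, k)` holding the positions of each query cell's `k` nearest reference cells (the role `self._indices` plays above). The name `knn_label_confidence` is hypothetical and not part of scanpy.

```python
import numpy as np
import pandas as pd


def knn_label_confidence(ref_labels, neighbor_indices, threshold=0.5):
    """Hypothetical helper: majority-vote kNN labels plus the fraction of
    neighbors agreeing with the vote; labels below `threshold` become NaN."""
    cat = pd.Series(ref_labels).astype("category")
    codes = cat.cat.codes.to_numpy()  # integer code per reference cell (assumes no NaN)
    n_categories = len(cat.cat.categories)

    # Category code of every neighbor, shape (n_query, k).
    neighbor_codes = codes[np.asarray(neighbor_indices)]
    k = neighbor_codes.shape[1]

    # counts[i, c] = number of neighbors of query cell i with category c.
    counts = (neighbor_codes[:, :, None] == np.arange(n_categories)).sum(axis=1)

    # argmax returns the first category on ties, like Series.mode()[0].
    best = counts.argmax(axis=1)
    confidences = counts.max(axis=1) / k

    # Code -1 is pandas' missing-value code, so these cells become NaN.
    best = np.where(confidences >= threshold, best, -1)
    labels = pd.Categorical.from_codes(best, categories=cat.cat.categories)
    return labels, confidences
```

On a toy reference `["B", "B", "T", "T", "NK"]` with neighbor rows `[0, 1, 2]`, `[2, 3, 4]` and `[0, 2, 4]`, the first two query cells get `B` and `T` with confidence 2/3, while the third has one neighbor of each label (confidence 1/3) and is left unlabeled at `threshold=0.5`.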
I would love to get input on whether this makes sense.
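One way to read the threshold: with $k$ neighbors, the confidence computed in `_knn_classify` is the largest per-label neighbor count divided by $k$, so a threshold $\tau$ amounts to requiring a minimum number of agreeing neighbors. In the notation below (introduced here, not taken from scanpy), $N(i)$ is the neighbor set of query cell $i$, $y_j$ is the label of reference cell $j$, and $L$ is the number of labels:

```latex
c_i = \frac{1}{k}\max_{\ell}\,\bigl|\{\, j \in N(i) : y_j = \ell \,\}\bigr|,
\qquad
c_i \ge \tau \iff \max_{\ell}\,\bigl|\{\, j \in N(i) : y_j = \ell \,\}\bigr| \ge \lceil \tau k \rceil,
\qquad
c_i \ge \frac{\lceil k / L \rceil}{k} \ge \frac{1}{L}.
```

For example, with $k = 15$ and $\tau = 0.5$, at least $\lceil 7.5 \rceil = 8$ of the 15 neighbors must agree. The last bound (pigeonhole) means a threshold at or below $1/L$ never removes a label.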
Hi! I don’t know if it makes sense statistically, but having a metric like this would be nice.
@Koncopd could you please take a look?
What kind of feature would you like to request?
Additional function parameters / changed functionality / changed defaults?
Please describe your wishes
First, thank you for the amazing tool you have developed!
I am currently trying to use the `ingest` function to map cluster identities from a single-cell dataset onto a spatial dataset. The single-cell dataset contains only a subset of the cell types, while the spatial dataset contains all of them. The single-cell dataset has many genes, while the spatial one has only a few. I extract the genes common to both datasets and use those to run `ingest`, which works quite well!
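Restricting both datasets to their shared genes can be sketched with a `pandas.Index` intersection. This is a minimal sketch on toy cells × genes `DataFrame`s with made-up gene names; with AnnData objects the same intersection would be applied to `var_names`.

```python
import pandas as pd

# Toy expression tables (cells x genes); gene names are made up.
sc_ref = pd.DataFrame(
    [[1.0, 0.0, 3.0, 2.0], [0.5, 1.0, 0.0, 4.0]],
    index=["cell1", "cell2"],
    columns=["GeneA", "GeneB", "GeneC", "GeneD"],
)
spatial = pd.DataFrame(
    [[2.0, 1.0], [0.0, 3.0], [1.0, 1.0]],
    index=["spot1", "spot2", "spot3"],
    columns=["GeneD", "GeneB"],
)

# Keep only genes measured in both datasets, selected in one shared order,
# so feature columns line up when the reference model is applied to the query.
shared = sc_ref.columns.intersection(spatial.columns)
sc_ref_shared = sc_ref[shared]
spatial_shared = spatial[shared]
```

The scanpy ingest tutorial follows a similar pattern with AnnData: `var_names = adata_ref.var_names.intersection(adata.var_names)`, subsetting both objects with `[:, var_names]`, and recomputing PCA and neighbors on the reduced reference before calling `sc.tl.ingest`.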
The issue I am running into is that `ingest` assigns an identity to every cell, even when the confidence in that identity is probably very low.
I am partly working around this by subsetting the spatial dataset, but it would be terrific if I could use a confidence parameter to control which cells get an identity at all.
I was playing around with the ingest scripts and was thinking of something like this:
Perhaps I am misunderstanding the tool, or am unaware of another tool that exists for this purpose; I would love input and help.
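Before settling on a value for a confidence parameter like the one requested, it may help to see how many cells would keep a label at each candidate threshold. A minimal sketch, assuming a per-cell confidence array such as the one `_knn_classify` returns; the helper name `labeled_counts` is hypothetical and the values are made up.

```python
import numpy as np


def labeled_counts(confidences, thresholds):
    """Hypothetical helper: number of cells that keep a label at each threshold."""
    confidences = np.asarray(confidences)
    return {t: int((confidences >= t).sum()) for t in thresholds}


# Made-up per-cell confidences (fraction of neighbors agreeing with the label).
confidences = np.array([1.0, 0.8, 0.6, 0.53, 0.4, 0.33, 0.93, 0.2])

for t, n in labeled_counts(confidences, (0.3, 0.5, 0.7, 0.9)).items():
    print(f"threshold={t:.1f}: {n} of {confidences.size} cells labeled")
```

With these made-up values, thresholds 0.3, 0.5, 0.7 and 0.9 keep 7, 5, 3 and 2 of the 8 cells, respectively.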