ctandrewtran closed 1 month ago
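Closing with an answer to help others:
import numpy as np
top_idx = np.argpartition(predict_proba_output, -wantedclassoutputs)[-wantedclassoutputs:]  # indices of the top N, unordered
sorted_top_idx = top_idx[np.argsort(-predict_proba_output[top_idx])]  # ordered by probability, descending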
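For readers who want the two lines above as a reusable helper, here is a minimal sketch using only NumPy. The function name top_n_classes and the example arrays are hypothetical, and it assumes the probabilities for one input have already been converted to a 1-D NumPy array (for example via probas.numpy().flatten(), as discussed elsewhere in this thread) and that classes lists the labels in the same column order.

```python
import numpy as np

def top_n_classes(probs, classes, n):
    """Return the n most probable (label, probability) pairs, highest first."""
    probs = np.asarray(probs, dtype=float).ravel()
    classes = np.asarray(classes)
    n = min(n, probs.size)
    if n <= 0:
        return []
    # argpartition places the n largest entries at the end without a full sort
    top_idx = np.argpartition(probs, -n)[-n:]
    # sort only those n entries by probability, descending
    top_idx = top_idx[np.argsort(-probs[top_idx])]
    return [(classes[i], float(probs[i])) for i in top_idx]

# Hypothetical data standing in for one row of predict_proba output
example_probs = np.array([0.05, 0.40, 0.10, 0.30, 0.15])
example_classes = np.array(["a", "b", "c", "d", "e"])
top3 = top_n_classes(example_probs, example_classes, 3)
```

With ~1000 classes the partition step avoids sorting every probability, although a plain full sort would likely also be fast enough at that size.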
Hmm, I am getting unreliable results. Am I doing something wrong?
Per the above example, np.argmax on the output of predict_proba should return the same as predict, but that is not happening.
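One possible cause of this kind of mismatch, offered as a guess rather than a diagnosis: np.argmax returns a column position, not a class label, so the two only agree when the labels happen to be 0..N-1 in column order. The sketch below uses made-up arrays; classes is a hypothetical stand-in for the fitted head's label array.

```python
import numpy as np

# Hypothetical label array and one hypothetical row of probabilities
classes = np.array([10, 20, 30])
probs = np.array([0.2, 0.7, 0.1])

col = int(np.argmax(probs))   # position of the largest probability
label = int(classes[col])     # the label stored at that position
```

Here col and label differ, so comparing col directly against a predicted label would look like a disagreement even though both point at the same class.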
If I understand your use case correctly, you might be looking for model.model_head.classes_.
Ugly example below:
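import pandas as pd
probas = model.predict_proba(docs)  # one row of class probabilities per document
# Pair each probability with its label from the fitted head
proba_dict = [dict(zip(model.model_head.classes_, p.numpy())) for p in probas]
predict_report = pd.DataFrame(proba_dict)  # one column per class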
Hmm, just ran it and it gave TypeError: iteration over a 0-d array. Thank you for helping!
Edit: I needed to convert it:
proba_dict = dict(zip(model.model_head.classes_, probas.numpy().flatten()))
It works! And the top value returned aligns with what is returned from predict.
Code snippet to get the top 25 predicted classes:
probas = model.predict_proba(text)
proba_dict = dict(zip(model.model_head.classes_, probas.numpy().flatten()))
top_25_keys = [x[0] for x in sorted(proba_dict.items(), key=lambda x: x[1], reverse=True)[:25]]
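The snippet above handles a single input. As a rough sketch of the same idea for a batch, the helper below takes a 2-D array with one row per input and returns the top N (label, probability) pairs for each row. The name top_n_per_row and the example data are hypothetical, and it assumes the probabilities have already been converted to NumPy (for example with probas.numpy()).

```python
import numpy as np

def top_n_per_row(probs, classes, n):
    """For each row of probs, return the n most probable (label, prob) pairs."""
    probs = np.atleast_2d(np.asarray(probs, dtype=float))
    classes = np.asarray(classes)
    # Sort each row's column indices by probability, descending, and keep n
    order = np.argsort(-probs, axis=1)[:, :n]
    return [
        [(classes[j], float(row[j])) for j in idx]
        for row, idx in zip(probs, order)
    ]

# Hypothetical batch of two inputs over three classes
batch = np.array([[0.1, 0.6, 0.3],
                  [0.5, 0.2, 0.3]])
labels = np.array(["x", "y", "z"])
top2 = top_n_per_row(batch, labels, 2)
```

This does a full sort per row for simplicity; np.argpartition could replace it if the class count grows large enough to matter.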
Hello!
I am classifying across ~1000 different classes.
I am interested in using predict_proba, then grabbing the top N predicted classes.
Is there a way to achieve this easily using what is available with SetFit?