kgourgou / setfit-integrated-gradients

Hacking SetFit so that it works with integrated gradients.
MIT License

multiclass classification #5

Closed: dawei0716 closed this 5 months ago

dawei0716 commented 1 year ago

Hi,

I know this only supports binary classification, but if there's a quick solution, would you be able to show how I can change it to support multiclass? I'm just looking to get attributions for the top class.

kgourgou commented 1 year ago

Hello,

That's an excellent question. You would first need a trained torch head and torch body for the multiclass problem; I believe the current version of SetFit has that.

I don't know how quick or painless it would be, but if you instantiate the SetFitGrad class

https://github.com/kgourgou/setfit-integrated-gradients/blob/099bedadf562fe60ba684a1aa3af7f0122b86c93/setfit_ig/setfit_extensions.py#L45

with a model_head and model_body from a SetFit model supporting multiclass, and then change this line

https://github.com/kgourgou/setfit-integrated-gradients/blob/099bedadf562fe60ba684a1aa3af7f0122b86c93/setfit_ig/setfit_extensions.py#L130

to return the probability of the predicted (top) class instead of the positive-class probability from the binary setup, I think the rest should work as is. I may try this out once I have some time.
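
Roughly, the change might look like this (a sketch, not the repo's actual code; it assumes the torch head returns a `(batch, num_classes)` probability tensor, and the function name is made up):

```python
import torch

def top_class_probability(probs: torch.Tensor) -> torch.Tensor:
    # Integrated gradients needs a scalar output per example; for multiclass,
    # take the probability of the predicted (top) class instead of the
    # positive-class probability used in the binary case.
    return probs.max(dim=-1).values
```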

dawei0716 commented 12 months ago

Thanks for the response! One more question: when I try changing the pre-trained language model, say from sentence-transformers/all-MiniLM-L6-v2 to all-roberta-large-v1, I get a key error in the integrated_gradients_on_text function. I'm having trouble debugging it. Any clue why?

```python
for word, token_ids in word_to_ids:
    key_scores = scores.loc[token_ids, "attribution_score"].sum()
    word_to_score.append((word, key_scores))
```

```
KeyError: "None of [Int64Index([2050, 27345], dtype='int64', name='token_ids')] are in the [index]"
```

kgourgou commented 12 months ago

My best bet is that there's a bug in how I return all of the token ids. 😅

I'll try to reproduce this later and see if I can fix it.

kgourgou commented 12 months ago

Update: I figured out the bug, but I need to think a bit about how to put the sentence back together. It's still a bit hacky, but you can get a feel for the current issue if you run the demo.ipynb notebook in the roberta_fix branch.

You will see that attributions come back, but because I remove a special symbol by hand, there is spacing between subtokens of the same word. Should be easy to fix.

The bug was indeed about differences in tokenization between the two sentence-transformers.
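
For context, here is a small illustration of the difference (assuming the transformers library is installed; the exact subtokens depend on the model):

```python
from transformers import AutoTokenizer

minilm = AutoTokenizer.from_pretrained("sentence-transformers/all-MiniLM-L6-v2")
roberta = AutoTokenizer.from_pretrained("sentence-transformers/all-roberta-large-v1")

# WordPiece (MiniLM): continuation pieces are prefixed with '##'.
print(minilm.tokenize("unbelievable results"))
# Byte-level BPE (RoBERTa): word-initial pieces carry a leading 'Ġ' (\u0120),
# except for the very first word of the string.
print(roberta.tokenize("unbelievable results"))
```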

kgourgou commented 11 months ago

@dawei0716 Do let me know if this is sufficient for you, or if you would like more help. 👍

dawei0716 commented 11 months ago

@kgourgou Hi. I really appreciate you helping me with this! It took me some time to set things up correctly to use SetFitHead and perform multiclass classification, but I'm now at the same point as you, where the tokens are broken.

I'm not really sure how to fix the issue, because I'm not sure at which step the subtokens are supposed to be concatenated but aren't being concatenated properly (or whatever the normal behavior should be). For instance, I'm not sure whether construct_word_to_id_mapping should be returning full words and their ids, as opposed to broken subtokens. I assume the tokens should be full words before attribution scores are calculated? It seems like the special symbols indicate the start of a word (except for the first word), so they could be used to concatenate. I would appreciate your help/insight! I'm not an expert in this, as you can tell 😅. Thank you!

kgourgou commented 11 months ago

Yeah, that's all fair; I'll see if I can finish the fix later. :)

dawei0716 commented 11 months ago

Thanks for your help on this!

dawei0716 commented 11 months ago

Hey @kgourgou. Hope all is well. If you are busy, would you be able to direct me on where and what fixes need to be made and I can try to work on it? Thanks!

kgourgou commented 11 months ago

Hello! Apologies, the NeurIPS rebuttal period is eating into my schedule. :D

Yeah, I'll mark it for you now.

kgourgou commented 11 months ago

So, this is the place I altered:

https://github.com/kgourgou/setfit-integrated-gradients/blob/5786d374dc2aa68e3035a3aee1b7b9f1bb3cf4be/setfit_ig/html_text_colorizer.py#L73

The df_w2s there is a dataframe that is supposed to contain all the words in the sentence and their scores from integrated gradients. At the end I basically do a ' '.join() to get back the original sentence.

However, different tokenizers can use different symbols to separate words into subtokens. If you drop a breakpoint above that line and print the frame, you will see what I mean.

What we want to do there is not necessarily to replace the "\u0120", but, as you said above, to concatenate the subtokens correctly as needed. Normally this is done with tokenizer.decode, but here I wanted to add colours to the words, hence the splitting and joining by hand.
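
As a quick illustration of why the hand-rolled join breaks word boundaries (again assuming the transformers library):

```python
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("sentence-transformers/all-roberta-large-v1")
pieces = tok.tokenize("unbelievable results")

# Stripping '\u0120' and joining with spaces splits one word's subtokens apart:
print(" ".join(p.replace("\u0120", "") for p in pieces))
# The tokenizer's own reconstruction restores the original boundaries:
print(tok.convert_tokens_to_string(pieces))
```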

As an approximation, we can assume that the attribution of a word is the max (or sum?) of the attributions of its subtokens.
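
A rough sketch of that aggregation, assuming a fast tokenizer (so that word_ids() is available); token_scores here is a hypothetical list of per-token attribution scores aligned with the encoding:

```python
from collections import defaultdict
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("sentence-transformers/all-roberta-large-v1")

def word_attributions(text, token_scores):
    # Group per-token attribution scores by the word each token belongs to.
    enc = tokenizer(text)
    grouped = defaultdict(list)
    for idx, word_id in enumerate(enc.word_ids()):
        if word_id is not None:  # skip special tokens like <s> and </s>
            grouped[word_id].append(token_scores[idx])
    result = []
    for word_id in sorted(grouped):
        span = enc.word_to_chars(word_id)  # character span of the word in `text`
        word = text[span.start:span.end]
        result.append((word, max(grouped[word_id])))  # or sum(...)
    return result
```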