hila-chefer / Transformer-Explainability

[CVPR 2021] Official PyTorch implementation for Transformer Interpretability Beyond Attention Visualization, a novel method to visualize classifications made by Transformer-based networks.

Threshold for Attention Vector during comparison #50

Closed · jaydebsarker closed this issue 1 year ago

jaydebsarker commented 1 year ago

I am quoting two lines from https://github.com/hila-chefer/Transformer-Explainability/blob/main/BERT_explainability.ipynb:

```python
# i)
expl = explanations.generate_LRP(input_ids=input_ids, attention_mask=attention_mask, start_layer=0)[0]
# ii) normalize scores
expl = (expl - expl.min()) / (expl.max() - expl.min())
```

You normalize the explanation scores in line ii). After the normalization, the most relevant token(s) get a score of 1.0, while other tokens may get scores such as 0.7 or 0.6. In that case, which tokens are considered to support the predicted class (e.g., negative sentiment) in the model output? Specifically, did you apply a threshold (e.g., >= 0.5) to each token to assign it to a specific class?

I was wondering if you would clarify this for me.

hila-chefer commented 1 year ago

Hi @jaydebsarker, thanks for your interest! I apologize for the delay in my response. The normalization sets the values between 0 and 1 for visualization purposes. I do not set a threshold, but if you wish to set one, I recommend using Otsu’s method for that (as done in our second paper).
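
For illustration only, here is a minimal sketch of what Otsu-based token selection could look like (this is not code from the repository; it assumes `expl` is the normalized 1-D relevance tensor from the notebook, `tokens` is a hypothetical list of the corresponding token strings, and `scikit-image` is installed):

```python
# Minimal sketch (not part of the repo): selecting relevant tokens with Otsu's threshold.
# Assumes `expl` is the normalized 1-D relevance tensor from the notebook and
# `tokens` is a matching list of token strings (hypothetical name).
from skimage.filters import threshold_otsu

scores = expl.detach().cpu().numpy()  # normalized relevance scores in [0, 1]
threshold = threshold_otsu(scores)    # data-driven cutoff instead of a fixed 0.5
relevant_tokens = [tok for tok, s in zip(tokens, scores) if s >= threshold]

print(f"Otsu threshold: {threshold:.3f}")
print("Tokens above threshold:", relevant_tokens)
```
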

Best, Hila.

jaydebsarker commented 1 year ago

Hi @hila-chefer,

Thank you so much for the reference to your paper.

Best, Jaydeb