sashank06 opened 5 years ago
Hi @sashank06, I am also interested in returning the attention weights, as in the models from PyTorch Transformers, so I've been exploring it a bit. With Fast-bert's current options I don't think it's possible. However, there is a simple workaround.
According to the Huggingface Transformers docs:

```python
# Models can return full list of hidden-states & attentions weights at each layer
model = model_class.from_pretrained(pretrained_weights,
                                    output_hidden_states=True,
                                    output_attentions=True)
```
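For context, when these flags are enabled the forward pass returns the extra outputs appended after the logits. A minimal sketch of how one might unpack them, using pure-Python stand-ins instead of the real library (the mock function and placeholder strings below are mine, not Transformers code; the shapes follow the docs, one attention map per layer):

```python
# Sketch: unpacking a model output that includes hidden states and attentions.
# mock_forward is a stand-in for calling the real model; the real outputs are
# torch tensors, mocked here with strings so the example runs anywhere.

NUM_LAYERS = 12  # assumption: a BERT-base-sized model

def mock_forward(input_ids, output_hidden_states=True, output_attentions=True):
    """Stand-in for model(input_ids): returns (logits, hidden_states, attentions)."""
    logits = [0.0, 0.0]  # placeholder classification logits
    outputs = (logits,)
    if output_hidden_states:
        # one hidden-state entry per layer, plus one for the embeddings
        outputs += (tuple(f"hidden_layer_{i}" for i in range(NUM_LAYERS + 1)),)
    if output_attentions:
        # one attention entry per layer; real shape is (batch, heads, seq, seq)
        outputs += (tuple(f"attn_layer_{i}" for i in range(NUM_LAYERS)),)
    return outputs

logits, hidden_states, attentions = mock_forward([101, 2023, 102])
print(len(attentions))  # 12 -> one attention map per transformer layer
```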
Moreover, according to the Huggingface Transformers models' docstrings:

```
**attentions**: (`optional`, returned when ``config.output_attentions=True``)
```
But in Fast-bert that logic lives in the `learner_cls.py` file, and unfortunately it's not parametrized:

```python
config = config_class.from_pretrained(pretrained_path, num_labels=len(dataBunch.labels))
[...]
if multi_label == True:
    model = model_class[1].from_pretrained(pretrained_path, config=config, state_dict=model_state_dict)
else:
    model = model_class[0].from_pretrained(pretrained_path, config=config, state_dict=model_state_dict)
```
The workaround is to manually add the `output_attentions` parameter to the config. So we have to replace:

```python
config = config_class.from_pretrained(pretrained_path, num_labels=len(dataBunch.labels))
```

with:

```python
config = config_class.from_pretrained(pretrained_path, num_labels=len(dataBunch.labels), output_attentions=True)
```

After that, the output of the `predict_batch` function (for example) contains the attention weights.
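If this were to be parametrized instead of hard-coded, the config construction could forward an optional flag. A sketch of the idea; `load_config` and `_FakeConfigClass` are hypothetical names I made up so the example runs standalone, not Fast-bert code:

```python
# Sketch: parametrizing output_attentions instead of hard-coding it.

def load_config(config_class, pretrained_path, labels, output_attentions=False):
    """Build the model config, forwarding the optional output_attentions flag."""
    return config_class.from_pretrained(
        pretrained_path,
        num_labels=len(labels),
        output_attentions=output_attentions,
    )

class _FakeConfigClass:
    """Minimal stand-in so the sketch runs without transformers installed."""
    @staticmethod
    def from_pretrained(path, **kwargs):
        return {"path": path, **kwargs}

cfg = load_config(_FakeConfigClass, "bert-base-uncased", ["pos", "neg"],
                  output_attentions=True)
print(cfg["output_attentions"])  # True
```

A default of `False` keeps the current behavior for existing callers, which is why a pull request along these lines would be backwards compatible.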
Note: I can open a pull request with this option parametrized if @kaushaltrivedi wants me to and is willing to merge it. Hope this helps.
Thanks. Please create the pull request. Happy to merge it.
I am assuming you only need attention weights during inference time.
Yes, that's it.
@alberduris I have been doing the same using `output_attentions=True`. It would be a great feature to integrate into fast-bert.
I updated the `load_model` method in `learner_cls.py` by adding `output_attentions=True` to the `from_pretrained` methods, but after loading my model with

```python
model = BertLearner.from_pretrained_model(
    databunch,
    pretrained_path='model_out',
    metrics=[{'name': 'accuracy', 'function': accuracy}],
    device=torch.device("cuda"),
    logger=logging.getLogger(),
    output_dir='output',
)
```

the `predict_batch` method still does not return any attention weights. What am I missing?
Sorry, but I haven't been hacking on this library for a while now, so I am a bit out of date. Check the Transformers repo, see how they handle the attention output, track the function calls and the parameters needed, and check whether everything works the same way in this library. Anyway, maybe @kaushaltrivedi can help you.
I would very much appreciate it if you post the solution here in case you manage to solve it :rocket:
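Without digging into Fast-bert's internals I can't say what the actual cause is, but a common failure mode matches this symptom: the model does return the attentions, and a wrapper then keeps only the logits. The mocks below are mine (not Fast-bert code) and just illustrate where to look when tracking the function calls:

```python
# Illustration of a likely failure mode: even when the model returns the
# attentions, a wrapper that only keeps outputs[0] silently drops them.
# Both functions below are hypothetical mocks, not Fast-bert's actual code.

def mock_model(batch):
    """Stand-in for the underlying model call with output_attentions enabled."""
    logits = [[0.1, 0.9] for _ in batch]
    attentions = ("layer0_attn", "layer1_attn")  # placeholder per-layer maps
    return (logits, attentions)

def predict_batch_keep_logits_only(batch):
    """Wrapper in the style the symptom suggests: attentions are lost."""
    outputs = mock_model(batch)
    return outputs[0]  # <- only the logits survive

def predict_batch_with_attentions(batch):
    """Variant that surfaces everything the model returned."""
    logits, attentions = mock_model(batch)
    return logits, attentions

preds = predict_batch_keep_logits_only(["sentence one"])
print(len(preds))  # 1 -> one prediction, no attentions anywhere in the result
```

So beyond setting the config flag, it's worth checking whether the prediction path in `learner_cls.py` discards everything after the first element of the model output.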
Does this return the same attention weights that one can obtain from the BERT model through PyTorch Transformers?