utterworks / fast-bert

Super easy library for BERT based NLP models

Attention Weights #38

Open sashank06 opened 5 years ago

sashank06 commented 5 years ago

Does this library return the attention weights that can be obtained from the BERT model through PyTorch transformers?

alberduris commented 5 years ago

Hi @sashank06, I am also interested in returning the attention weights as in the models from PyTorch transformers, so I've been exploring it a little bit. With the current options of Fast-bert, I don't think it's possible. However, there is a simple workaround.

According to the Huggingface Transformers docs

# Models can return full list of hidden-states & attentions weights at each layer
model = model_class.from_pretrained(pretrained_weights, 
                                    output_hidden_states=True, 
                                    output_attentions=True)
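
For illustration, a minimal self-contained sketch of pulling the attention weights directly out of a transformers model (tuple-style outputs, as in the pytorch-transformers-era API; the model name and sentence are arbitrary examples):

import torch
from transformers import BertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
model = BertModel.from_pretrained('bert-base-uncased',
                                  output_hidden_states=True,
                                  output_attentions=True)
model.eval()

input_ids = torch.tensor([tokenizer.encode("Attention is all you need")])
with torch.no_grad():
    outputs = model(input_ids)

# With both flags enabled, the output tuple is
# (last_hidden_state, pooler_output, hidden_states, attentions)
attentions = outputs[-1]
print(len(attentions))      # one tensor per layer, e.g. 12 for bert-base
print(attentions[0].shape)  # (batch_size, num_heads, seq_len, seq_len)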

Moreover, according to the Huggingface Transformers model docstrings:

**attentions**: (`optional`, returned when ``config.output_attentions=True``)

But in Fast-bert, we have that logic in the learner_cls.py file, and unfortunately, it's not parametrized.

config = config_class.from_pretrained(pretrained_path, num_labels=len(dataBunch.labels))

[...]

if multi_label == True:
    model = model_class[1].from_pretrained(pretrained_path, config=config, state_dict=model_state_dict)
else:
    model = model_class[0].from_pretrained(pretrained_path, config=config, state_dict=model_state_dict)

The workaround is to manually add the output_attentions parameter to the config object. So we have to replace:

config = config_class.from_pretrained(pretrained_path, num_labels=len(dataBunch.labels))

with:

config = config_class.from_pretrained(pretrained_path, num_labels=len(dataBunch.labels), output_attentions=True)

After that, the output of the predict_batch function (for example) contains the attention weights.
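
To make that concrete, here's a hedged sketch of inspecting the weights through the learner (learner.model, learner.device and the databunch's tokenizer attribute are assumptions about fast-bert's internals, not documented API, and the tuple layout assumes tuple-style transformers outputs):

# Assumption: learner.model is the wrapped transformers model, so with
# output_attentions=True in the config its output tuple ends with the
# per-layer attention tensors.
import torch

tokens = databunch.tokenizer.encode("fast-bert makes BERT models easy")
input_ids = torch.tensor([tokens]).to(learner.device)

learner.model.eval()
with torch.no_grad():
    outputs = learner.model(input_ids)

attentions = outputs[-1]  # one (batch, heads, seq_len, seq_len) tensor per layer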

Note: I can do a pull request with this option parametrized if @kaushaltrivedi wants me to and is willing to merge it. Hope this helps.

kaushaltrivedi commented 5 years ago

Thanks. Please create the pull request. Happy to merge it.

kaushaltrivedi commented 5 years ago

I am assuming you only need the attention weights at inference time.

alberduris commented 5 years ago

Yes, that's it.

sashank06 commented 5 years ago

@alberduris I have been doing the same using output_attentions=True. It would be a great feature to integrate into fast-bert.

fuuman commented 4 years ago

I updated the load_model method in learner_cls.py by adding output_attentions=True to the from_pretrained methods, but after loading my model with

model = BertLearner.from_pretrained_model(
    databunch,
    pretrained_path='model_out',
    metrics=[{'name': 'accuracy', 'function': accuracy}],
    device=torch.device("cuda"),
    logger=logging.getLogger(),
    output_dir='output')

the predict_batch method still does not return any attention weights.

What am I missing?

alberduris commented 4 years ago

Sorry, but I haven't been hacking on this library for a while now, so I'm a bit out of date. Check the Transformers repo, see how they handle the attention outputs, trace the function calls and the parameters involved, and check whether everything is the same in this library. Anyway, maybe @kaushaltrivedi can help you.
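
One quick sanity check along those lines, assuming learner.model is the underlying transformers model: verify the flag actually reached the loaded config.

# If this prints False, the flag never made it into the model config,
# so the attention tensors are never produced in the first place.
print(learner.model.config.output_attentions)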

It would be much appreciated if you post the solution here in case you manage to figure it out :rocket: