malteos / finetune-evaluation-harness


Some notes concerning Flair for fine-tuning LMs #2

Closed: akash418 closed this issue 1 year ago

akash418 commented 2 years ago

This issue collects some findings from the fine-tuning process of the German model for the classification and NER tasks.

malteos commented 2 years ago

1) This issue seems old. Hasn't this been fixed yet? See https://github.com/flairNLP/flair/issues/37#issuecomment-621763176. Did you try increasing the batch size?

2) "CLS pooling" refers to using last hidden state of the [CLS] token as embedding for the whole sequence (BERT uses this [CLS] token but GPT not). For GPT you can take for example the mean over all last hidden states (mean pooling, but exclude padded tokens) or the embedding of the EOS token.

Generally, you need to distinguish between document-level tasks (like classification), where you need one embedding for the whole document, and token-level tasks (sequence tagging like NER), where you need an embedding for each token. In Flair this roughly corresponds to the two embedding classes shown below.
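
A rough sketch of that split in Flair (the model `bert-base-german-cased` and the example sentence are only illustrative):

```python
from flair.data import Sentence
from flair.embeddings import TransformerDocumentEmbeddings, TransformerWordEmbeddings

sentence = Sentence("Berlin ist die Hauptstadt von Deutschland .")

# Document-level: one embedding per sentence/document -> text classification.
doc_embeddings = TransformerDocumentEmbeddings("bert-base-german-cased")
doc_embeddings.embed(sentence)
print(sentence.embedding.shape)  # single vector for the whole sentence

# Token-level: one embedding per token -> sequence tagging (e.g. NER).
word_embeddings = TransformerWordEmbeddings("bert-base-german-cased")
word_embeddings.embed(sentence)
for token in sentence:
    print(token.text, token.embedding.shape)
```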

3) See my notebook. You can set the tokenizer's pad token via `embeddings.tokenizer.pad_token = embeddings.tokenizer.eos_token`.
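
For example, something along these lines should work (again, `gpt2` is just a placeholder for whatever GPT-style model you use):

```python
from flair.embeddings import TransformerDocumentEmbeddings

embeddings = TransformerDocumentEmbeddings("gpt2", fine_tune=True)

# GPT-style tokenizers usually ship without a pad token; reuse EOS for padding.
if embeddings.tokenizer.pad_token is None:
    embeddings.tokenizer.pad_token = embeddings.tokenizer.eos_token
```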

PS: If you feel more confident implementing everything in HF transformers, you could do that too. But IMHO Flair should be easier.