ml-projects-kiel / OpenCampus-ApplicationofTransformers


Choose a fitting transformer model #7

Open JaNoNi opened 1 year ago

valewer commented 1 year ago

At its core, our goal in this project is text generation or completion based on prompts. We therefore need a transformer model trained through causal language modelling, i.e. a model focused on predicting the most fitting next token after a sequence of tokens, which lets it produce coherent sentences. The most obvious choice is the general-purpose generative model gpt2 (gpt2), as used in the article that inspired this project. Transformer models with different architectures, such as the encoder model BERT, are not really suitable for our task: they are trained through masked language modelling with bidirectional input (i.e. tokens both before and after the masked token). It is possible to use such a model for text generation by filling in mask tokens in random order rather than left to right, but the quality of such output is hard to control (ai.stackexchange).
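A minimal sketch of what we want from a causal LM, using the Hugging Face transformers API with gpt2; the prompt and sampling parameters are only placeholders, not the final setup:

```python
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")

# Hypothetical prompt; in our project this would be a tweet-like prompt.
prompt = "The weather in Kiel today"
inputs = tokenizer(prompt, return_tensors="pt")

# Causal LM: the model repeatedly predicts the next token, left to right.
output_ids = model.generate(
    **inputs,
    max_length=50,
    do_sample=True,
    top_k=50,
    top_p=0.95,
    pad_token_id=tokenizer.eos_token_id,
)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```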

Alternatives to gpt2 could be the derivatives created by the open-source AI collective EleutherAI (gpt-j, gpt-neo).

A problem that remains is that these models were pretrained on English text, so fine-tuning with German tweets might not work well. There are, however, alternatives pretrained on German data on Hugging Face.
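Loading such an alternative only requires swapping the checkpoint name; dbmdz/german-gpt2 is one German GPT-2 checkpoint on the Hub, mentioned here purely as an example rather than a final choice:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Example German GPT-2 checkpoint; the concrete model choice is still open.
checkpoint = "dbmdz/german-gpt2"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(checkpoint)
```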

Also investigate this seq2seq model: https://huggingface.co/t5-small
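For reference, t5-small is an encoder-decoder (seq2seq) model, so any task has to be phrased as text-to-text; a rough sketch with a hypothetical task prefix and input:

```python
from transformers import T5ForConditionalGeneration, T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

# Seq2seq: the encoder reads the input text, the decoder generates the output text.
# The "summarize:" prefix and the input sentence are just illustrative.
input_ids = tokenizer(
    "summarize: Long example tweet text goes here.", return_tensors="pt"
).input_ids
output_ids = model.generate(input_ids, max_length=40)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```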

The topic of adapting such a model to a specific prompt design is more difficult. The most basic approach is zero-shot or few-shot learning, i.e. providing a small batch of structured examples directly in the prompt. However, that is not really what we want. The more promising way seems to be to alter the training data used for fine-tuning our tweepy model: instead of using only the tweets of one specific person, we could use all the data and prepend extra control tokens (for example a token identifying the tweet's author) to the respective tweets, so that the model is conditioned on those tokens. It remains to be seen how well this works.
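A minimal sketch of this conditioning idea, assuming GPT-2 as the base model and hypothetical per-author tokens <user_a>/<user_b> (the real tokens would come from our tweet dataset):

```python
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")

# Hypothetical per-author control tokens; one token per account in the dataset.
control_tokens = ["<user_a>", "<user_b>"]
tokenizer.add_special_tokens({"additional_special_tokens": control_tokens})
model.resize_token_embeddings(len(tokenizer))

# During fine-tuning each tweet is prefixed with its author's token,
# so at inference time a prompt starting with "<user_a>" conditions the style.
example = "<user_a> Moin Kiel, heute wird ein guter Tag!"
print(tokenizer(example)["input_ids"])
```

The fine-tuning objective itself would stay plain causal language modelling; only the training data changes.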

Interesting work on using gpt2 for long-form text generation: https://towardsdatascience.com/how-to-fine-tune-gpt-2-so-you-can-generate-long-form-creative-writing-7a5ae1314a61

valewer commented 1 year ago

Potentially use the large gpt2 version: https://huggingface.co/gpt2-xl