huggingface / swift-coreml-transformers

Swift Core ML 3 implementations of GPT-2, DistilGPT-2, BERT, and DistilBERT for Question answering. Other Transformers coming soon!
Apache License 2.0

GPT-2 low quality responses #38

Open ljaniszewski00 opened 6 months ago

ljaniszewski00 commented 6 months ago

I'm trying to develop an iOS app that uses your distilgpt2-64-6.mlmodel, but I'm getting strange answers to my questions. I configured the model the same way as in the attached ViewController: strategy: .topK(40) and nTokens: 50. I'm attaching some screenshots of my conversation with the model (the question is at the top (You) and the model's answer (Device) is right below). What could be causing this behaviour?

[Screenshots: IMG_1538, IMG_1537]
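For reference, this is roughly how I'm driving the model (a minimal sketch; the GPT2 initializer and the generate(text:nTokens:) signature are just my reading of the sample code in this repo, not verified):

```swift
import UIKit

class ChatViewController: UIViewController {
    // Same decoding setup as the sample ViewController: top-k sampling with k = 40.
    let model = GPT2(strategy: .topK(40))

    func answer(to question: String, completion: @escaping (String) -> Void) {
        DispatchQueue.global(qos: .userInitiated).async {
            // Ask the model to continue the question with 50 generated tokens.
            let continuation = self.model.generate(text: question, nTokens: 50)
            DispatchQueue.main.async {
                completion(continuation)
            }
        }
    }
}
```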

pcuenca commented 6 months ago

Hi @ljaniszewski00! GPT-2 is just a language model; it hasn't been trained to sustain chat conversations. It's trained to continue a text sequence with plausible text that may come after the prompt, and that task doesn't usually lend itself well to question answering. For example, instead of "What is the result of 2+2" you could potentially get better results with "2+2 is " (I haven't tested it).
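To illustrate the difference in prompting style, something like this (a rough, untested sketch; the GPT2 initializer and generate(text:nTokens:) call just mirror the pattern in the sample ViewController):

```swift
// Chat-style prompt: GPT-2 simply continues the text, which often rambles.
let chatPrompt = "What is the result of 2+2?"

// Continuation-style prompt: the model only has to complete the sentence,
// which tends to work better for a plain language model.
let completionPrompt = "2+2 is "

let model = GPT2(strategy: .topK(40))
// Generate a short continuation of the completion-style prompt.
let output = model.generate(text: completionPrompt, nTokens: 10)
print(output)
```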

This project is currently in maintenance mode, so I'd recommend you take a look at swift-transformers instead. That project uses the latest Core ML features, which should give you better performance, and it provides more tokenizers and tools. In addition, we are internally working on some exciting optimization features for language models.

ljaniszewski00 commented 6 months ago

@pcuenca Thanks for the response, this explains a lot. However, as can be seen in the first screenshot, I ran the same query as in the demo in this repository's README, but the output is drastically different.

My second question is: do you have any .mlmodel that was created specifically for chatting about various topics?