Closed by zanussbaum 2 years ago
The warning can be ignored. It's just HF not being aware that our models support the text-generation pipeline. Text generation still goes ahead without issue: https://colab.research.google.com/drive/1IGbolGIpafvp0vA7qbP2BvnAEIkSCDDG?usp=sharing
That said, I agree that getting error messages when everything is in order is quite confusing; I'll ask the HF folks if there's a workaround.
Regarding prompt tuning, we followed the same process as is typical in NLP: take whatever dataset you want to finetune on (for us, a specific protein family), freeze all the model weights except the embeddings of the prompt-tuning tokens, and then train normally. For our experiments we used k=10 prompt-tuning tokens. Unfortunately we do this in our own codebase and I won't have the time to port it over to HF, though I'm sure there's a nice HF tutorial somewhere. One trick to get prompt tuning working is that you need a large learning rate, typically a couple of orders of magnitude higher than during normal training. But we didn't do anything special for proteins, just the standard process.
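The recipe above (freeze everything except k learnable prompt embeddings, then train those with a large learning rate) can be sketched in plain PyTorch. This is a hypothetical minimal illustration, not the authors' code: `PromptTunedLM` and the toy base model are made-up names, and a real setup would wrap an actual RITA checkpoint and its input embeddings.

```python
import torch
import torch.nn as nn

class PromptTunedLM(nn.Module):
    """Hypothetical sketch: a frozen base LM plus k trainable prompt embeddings."""

    def __init__(self, base_lm: nn.Module, embed_dim: int, k: int = 10):
        super().__init__()
        self.base_lm = base_lm
        # Freeze every weight of the pretrained model.
        for p in self.base_lm.parameters():
            p.requires_grad = False
        # Only these k prompt-token embeddings receive gradients.
        self.prompt = nn.Parameter(torch.randn(k, embed_dim) * 0.02)

    def forward(self, token_embeds: torch.Tensor) -> torch.Tensor:
        # Prepend the prompt embeddings to each sequence in the batch.
        batch = token_embeds.size(0)
        prompt = self.prompt.unsqueeze(0).expand(batch, -1, -1)
        return self.base_lm(torch.cat([prompt, token_embeds], dim=1))

# Toy stand-in for the real LM: a single linear layer over embeddings.
toy_lm = nn.Linear(16, 16)
model = PromptTunedLM(toy_lm, embed_dim=16, k=10)

trainable = [n for n, p in model.named_parameters() if p.requires_grad]
# Optimize only the prompt, with a learning rate a couple of orders of
# magnitude above a typical finetuning value (illustrative number).
opt = torch.optim.Adam([model.prompt], lr=1e-2)
```

After construction, `trainable` contains only `"prompt"`, so the optimizer updates just the k prompt embeddings while the base model stays fixed.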
Hi @DanielHesslow, I got a KeyError: 'rita' message when I ran the same code as @zanussbaum did. Could you please help with it? Thanks!
Unable to run a script similar to the example.
This is with Python 3.8 and HF Transformers: tokenizers 0.12.1, transformers 4.19.2.
Additionally, are there more details about your prompt tuning? I'm curious how you approached it and what prompt engineering looks like for proteins as opposed to natural language.