Open nashcaps2255 opened 1 year ago
I think you could try adding your special-token handling to the encode/decode step, and then finetune.
Where would I add that, in train.py? I added <|summarize|> to 'allowed_special' in sample.py.
At a high level, I'm confused why the GPT is treating it exactly the same as an end-of-text token. You would think that if there were something wrong with the encoding, it would instead treat it as a normal token, not exactly the same as the end-of-text token?
Should each example have padding? Maybe that is the issue?
Did you add your special tokens before finetuning? You could add them in train.py after loading 'meta.pkl'.
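A minimal sketch of what "add it after loading meta.pkl" could look like for nanoGPT's char-level setup, where meta.pkl holds stoi/itos mappings. The helper name `add_special_token` is hypothetical, and the exact meta layout should be checked against your own prepare.py:

```python
# Hypothetical helper: extend the stoi/itos mappings loaded from meta.pkl
# with a special token BEFORE tokenizing the finetuning data, so that
# <|summarize|> gets its own single id instead of being split into pieces.
def add_special_token(meta, token):
    stoi, itos = meta["stoi"], meta["itos"]
    if token not in stoi:
        new_id = len(stoi)  # append at the end of the existing vocab
        stoi[token] = new_id
        itos[new_id] = token
    return meta

# Toy stand-in for the dict unpickled from meta.pkl
meta = {"stoi": {"a": 0, "b": 1}, "itos": {0: "a", 1: "b"}}
meta = add_special_token(meta, "<|summarize|>")
```

If you grow the vocab this way, remember the model's embedding and output layers must also be resized to match the new vocab size before finetuning.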
No I didn't, I will try this.
I was under the, perhaps false, impression that with enough examples the GPT would 'learn' the token.
@nashcaps2255
You are not under a false impression; you probably can do the above task, but it depends a lot on your model, hyperparameters, and dataset size. I have a conversational model based off nanoGPT, with the same model as you, gpt2-medium, and I use the tokens <human>, <bot>, and <endOfText>. The model doesn't really treat one token like another, although my tokens probably show up way more often than yours. Definitely try increasing your dataset size, and maybe try GPT2-XL. It's most likely a dataset-size problem: when I first started training on a small dataset, I had the same issue, but as I increased the dataset size, the problem disappeared.
@nashcaps2255 I hope my message finds you well and safe. I am approaching a very similar application to what you were trying to do, but I am facing some problems. Have you been able to solve your issues and finetune the model for summarization? Thank you.
Hey, different account same person :)
But yes I was; the issue was with my dataset. The data I was using wasn't enough of a 1-to-1 summary. Once I added outside data (i.e. a large summarization dataset) to the finetuning, the model was able to learn the task quite effectively.
So essentially what @VatsaDev said was correct: it was a dataset size issue.
Hi all, I trained a ~355M-parameter GPT from scratch, and the model performs quite well on text entailment.
I finetuned the model to perform text summarization, structuring the data like so...
Text(s) to summarize <|summarize|> summary <|endoftext|>
It is my understanding that this finetuned model, at inference time, when faced with text followed by the summarization special token, should at least somewhat understand the task. However, the model seems to be treating the summarization token as an end-of-text token. Meaning that, keeping the seed and other parameters constant, the model generates the exact same 'summary' no matter what text comes before the <|summarize|>.