I've finally found this article, and it seems promising. I'm going to try it out and I'll report back on how it goes.
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
Please note that issues that do not follow the contributing guidelines are likely to be ignored.
I have a relatively small dataset that I've scraped from my Discord server (3,782,031 characters including the EOS token). I wanted to build a GPT-2 chatbot with it, but the data is relatively small. Training for a small number of epochs did nothing for any GPT-2-related checkpoint (I tried DistilBERT, GPT-2, DialoGPT-small, and others), and training for a large number of epochs destroyed the model completely: it could barely generate anything coherent, producing only special characters, jumbled text, or nothing at all. I've tested the same script with a much larger dataset and it worked just fine, so I can only assume the problem is the dataset size.

I was trying to find a way to freeze the GPT-2 base model and train just the LM head, but since the LM head is tied to the embedding layer, that doesn't seem to be possible. If there isn't a way to freeze everything except the head, what else should I do? I've been trying to finish this personal project for quite a while now, and I'm out of options at this point. I'm using a custom TF script from the examples folder on TPU, since the PyTorch version makes the memory usage blow up on Colab.
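For reference, here's roughly the kind of partial freezing I had in mind: a minimal sketch that freezes most of the transformer blocks while leaving the token embeddings (and therefore the tied LM head) trainable. The attribute names are the ones exposed by TFGPT2LMHeadModel in the transformers library and may differ slightly between versions, and the number of unfrozen blocks is just an arbitrary example, not something I've validated:

```python
from transformers import TFGPT2LMHeadModel

model = TFGPT2LMHeadModel.from_pretrained("gpt2")

NUM_TRAINABLE_BLOCKS = 2  # arbitrary: keep only the top 2 blocks trainable

# Freeze every transformer block except the top few. The token embeddings
# (model.transformer.wte) are left trainable on purpose, since the LM head
# reuses their weights through weight tying.
for block in model.transformer.h[:-NUM_TRAINABLE_BLOCKS]:
    block.trainable = False

print("Trainable variables:", len(model.trainable_weights))
```

Whether something like this actually helps with a dataset this small is exactly what I'm unsure about.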