shyamsn97 / mario-gpt

[NeurIPS 2023] Generating Mario Levels with GPT2. Code for the paper "MarioGPT: Open-Ended Text2Level Generation through Large Language Models" https://arxiv.org/abs/2302.05981
https://huggingface.co/shyamsn97/Mario-GPT2-700-context-length
MIT License

Poor performance observed with models trained using the Training notebook #25

Closed chenxd1996 closed 5 months ago

chenxd1996 commented 1 year ago

Hello,

I have been training a model with the Training notebook provided in this repository, and the performance of the resulting model is significantly subpar.

[attached image]

shyamsn97 commented 1 year ago

Hey! Can you share your training parameters? Like number of epochs, learning rate, etc. I think by default the notebook doesn’t train for long, so that could be an issue

chenxd1996 commented 1 year ago

> Hey! Can you share your training parameters? Like number of epochs, learning rate, etc. I think by default the notebook doesn't train for long, so that could be an issue

Oh, I see! I've been using the default training parameters. That might be the issue then. Could you please advise on the optimal settings for the training parameters for better performance?

shyamsn97 commented 1 year ago

In the original work I trained for 50k iterations with a batch size of 4 (so the model ends up seeing 200,000 samples), although I think you can get away with a lot less.
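To make the budget above concrete, here is a minimal sketch of the arithmetic, with the (assumed) corresponding trainer call from the repo's README shown in comments; the exact `MarioGPTTrainer`/`TrainingConfig` names may differ across versions of the repo:

```python
# Sample budget implied by the settings above: each training iteration
# draws one batch, so the model sees iterations * batch_size samples.
iterations = 50_000
batch_size = 4
samples_seen = iterations * batch_size
print(samples_seen)  # 200000 samples, vs. the notebook's much shorter default run

# With the notebook's trainer (API as shown in the repository README;
# names are an assumption and may differ across versions), the
# equivalent call would look roughly like:
#   config = TrainingConfig()
#   trainer = MarioGPTTrainer(mario_lm, dataset, config=config)
#   trainer.train(iterations, batch_size=batch_size)
```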

shyamsn97 commented 5 months ago

Closing this; feel free to reopen it if you have more questions!