swcrazyfan opened this issue 3 years ago
That's a weird notification. There may be a bug, although it shouldn't affect the final training.
How is your dataset structured?
To be honest, I'm pretty new to ML, so I'm not sure whether I structured the data correctly, or even how to describe its structure.
It's the text of a book. Right now, it's basically just plain text without empty lines; roughly, each paragraph or chapter title is on its own line.
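One way to describe a dataset like this is to count its lines and measure how long they are. Below is a minimal sketch; `describe_text` is a hypothetical helper, not part of any training library, and character counts are only a rough proxy because the model measures length in tokens, which depend on the tokenizer.

```python
import statistics


def describe_text(text: str) -> dict:
    """Summarize how a plain-text training file is laid out, line by line."""
    lines = text.splitlines()
    non_empty = [line for line in lines if line.strip()]
    lengths = [len(line) for line in non_empty]
    return {
        "total_lines": len(lines),
        "empty_lines": len(lines) - len(non_empty),
        "longest_line_chars": max(lengths, default=0),
        "mean_line_chars": round(statistics.mean(lengths), 1) if lengths else 0.0,
    }


sample = "Chapter One\nIt was a dark night.\n\nThe end came quickly, and nobody noticed."
print(describe_text(sample))
# {'total_lines': 4, 'empty_lines': 1, 'longest_line_chars': 41, 'mean_line_chars': 24.0}
```

For a real file, pass in the file contents, e.g. `describe_text(open("book.txt", encoding="utf-8").read())`, where `book.txt` is a placeholder name.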
Do you know of a good place to learn the basics of data preprocessing? Most things I've found seem to assume more knowledge than I currently have, but I'm trying to learn fast haha.
I'm getting this notification as well. My dataset has extremely long stretches of text between newlines, so that may be the cause. In any case, training doesn't seem to be affected as far as I can tell. (Edit: that is, it didn't seem to be affected as of last week; training GPT-Neo in Colab currently seems to be broken as of 06-May-2021.)
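If very long lines do turn out to be the cause (nobody in the thread confirms this), one option is to break them into shorter lines at sentence boundaries. The sketch below assumes sentences end with `.`, `!` or `?` followed by whitespace; `split_long_line`, `split_long_lines` and the `max_chars=1000` default are hypothetical choices, not values derived from GPT-Neo, whose context length is measured in tokens rather than characters.

```python
import re


def split_long_line(line: str, max_chars: int = 1000) -> list[str]:
    """Split one overlong line into chunks of at most max_chars, breaking after sentence ends.

    A single sentence longer than max_chars is kept whole rather than cut mid-sentence.
    """
    if len(line) <= max_chars:
        return [line]
    # Split after ., ! or ? followed by whitespace, keeping the punctuation.
    sentences = re.split(r"(?<=[.!?])\s+", line.strip())
    chunks, current = [], ""
    for sentence in sentences:
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= max_chars or not current:
            current = candidate  # still fits (or is the first sentence of a chunk)
        else:
            chunks.append(current)  # chunk is full; start a new one
            current = sentence
    if current:
        chunks.append(current)
    return chunks


def split_long_lines(text: str, max_chars: int = 1000) -> str:
    """Apply split_long_line to every line, keeping empty lines as they are."""
    out = []
    for line in text.splitlines():
        out.extend(split_long_line(line, max_chars) if line.strip() else [line])
    return "\n".join(out)
```

For example, `split_long_line("Aa. Bb. Cc.", max_chars=8)` packs as many whole sentences as fit into each chunk and returns `["Aa. Bb.", "Cc."]`.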
As you said, it doesn't seem to affect the results. Thank you!
I'm trying to train a model based on GPT-Neo 125M, and I keep getting this error. Training continues and the model even generates text, but I'm pretty sure this will affect my final model. Is there a particular way I should prepare the data, or a setting I should change?
Currently, I'm using text that was exported from a PDF. I did some basic preprocessing, but I'm not sure if it was enough.
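Text exported from a PDF often has hard line breaks inside paragraphs, words hyphenated across lines, and page-break characters. Below is a minimal cleanup sketch, assuming paragraphs in the export are separated by blank lines and that a hyphen at the end of a line always marks a split word (which will wrongly merge genuine compounds such as "well-known" when they happen to wrap). `clean_pdf_text` is a hypothetical helper, and it outputs one paragraph per line.

```python
import re


def clean_pdf_text(raw: str) -> str:
    """Basic cleanup for text exported from a PDF.

    Assumes paragraphs are separated by blank lines and that lines inside a
    paragraph were hard-wrapped by the PDF layout.
    """
    # Normalize Windows line endings and turn page breaks (form feeds) into line breaks.
    text = raw.replace("\r\n", "\n").replace("\f", "\n")
    # Rejoin words hyphenated across a line break: "exam-\nple" -> "example".
    text = re.sub(r"(\w)-\n(\w)", r"\1\2", text)
    paragraphs = []
    for block in re.split(r"\n\s*\n", text):
        # Join the wrapped lines of one paragraph and collapse runs of whitespace.
        joined = " ".join(block.split())
        if joined:
            paragraphs.append(joined)
    # One paragraph per line.
    return "\n".join(paragraphs)
```

For instance, `"The exam-\nple text was\nwrapped here.\n\n\fSecond   para-\ngraph.\n"` becomes `"The example text was wrapped here.\nSecond paragraph."`.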