tonifergue opened this issue 3 years ago
Max, thanks so much for your work on the library -- it rocks!
I'm having the same issue as @tonifergue. I have a custom dataset that I'm training from scratch: ~5,000,000 lines, with a vocab size of 2000, training in Colab Pro.
ai.train(file_name, line_by_line=True, from_cache=False, num_steps=5000, generate_every=1000, save_every=1000, save_gdrive=True, learning_rate=1e-3, batch_size=64, num_workers=1)
but it crashes with an OOM error while encoding the data. Any advice?
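One thing I'm considering (not sure it's the right approach) is pre-encoding the file into a cached TokenDataset as a separate step, so the encoding OOM can be isolated from training and the cache reused across runtimes. A rough sketch, assuming TokenDataset accepts these arguments and that my custom tokenizer also needs to be passed in (the tokenizer file name below is just whatever mine was saved as):

```python
from aitextgen.TokenDataset import TokenDataset

# Encode the ~5M-line file once and cache the result to disk,
# separately from training.
data = TokenDataset(
    file_name,
    line_by_line=True,
    save_cache=True,                            # writes dataset_cache.tar.gz for reuse
    tokenizer_file="aitextgen.tokenizer.json",  # placeholder path to my custom tokenizer
)

# Then train from the pre-encoded dataset instead of the raw text file:
ai.train(
    data,
    num_steps=5000,
    generate_every=1000,
    save_every=1000,
    save_gdrive=True,
    learning_rate=1e-3,
    batch_size=64,
    num_workers=1,
)
```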
Thanks again!
Hi Max, first of all, congratulations on your incredible library.
I use aitextgen for small texts and it works fine, but when I tried to train on a 1 GB file in Google Colab, the runtime crashed without any error message; it just restarted.
In Google Colab, GPU and high RAM are enabled.
The file is a UTF-8 .txt file with one sentence per line.
Here is my configuration: ai.train(file_name, line_by_line=True, from_cache=False, num_steps=5000, generate_every=250, save_every=1000, save_gdrive=False, learning_rate=1e-3, batch_size=256, num_workers=32)
Could you recommend another configuration for training this type of file? Is it possible to do this in Google Colab? Thanks in advance.
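For what it's worth, one variant I may try is the same call with smaller values for the memory-related settings I'm already using; whether that avoids the crash is still an open question:

```python
# Same ai.train() call as above, just with a smaller batch size and
# fewer DataLoader workers, since both add memory pressure.
ai.train(
    file_name,
    line_by_line=True,
    from_cache=False,
    num_steps=5000,
    generate_every=250,
    save_every=1000,
    save_gdrive=False,
    learning_rate=1e-3,
    batch_size=16,    # down from 256
    num_workers=2,    # down from 32
)
```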