I have tried many different configurations and many different versions, all with the same result: the model continuously generates blank text. I don't know whether this is a problem with aitextgen or something on my end. I'm using the Google Colab notebook to train.
Pip-installed versions (simply running `!pip install aitextgen` fails; this is the setup I got working with the most recent versions available):
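Something along these lines; the only version I can actually confirm is transformers==4.26.1, since it appears in the training log below, and the rest is from memory:

```python
# Colab cell. Plain `!pip install aitextgen` failed for me, so I
# installed it first and then pinned transformers afterwards.
# transformers==4.26.1 is the version printed in the training log;
# treat anything else here as approximate.
!pip install aitextgen
!pip install transformers==4.26.1
```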
I have removed all non-ASCII characters from the training file, thinking those might be the issue, but that didn't help.

The AI train settings (pretty much the defaults provided):
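Roughly this, reconstructed from the Colab notebook defaults, so treat the exact values as approximate; the file name is a stand-in for my actual training file, and `num_steps`, `generate_every`, and `save_every` match what the log below shows:

```python
from aitextgen import aitextgen

# Load the base 124M GPT-2 model and move it to the Colab GPU.
ai = aitextgen(tf_gpt2="124M", to_gpu=True)

# Train on the cleaned text file. generate_every/save_every line up
# with the "1,000 steps reached" messages in the log below, and
# num_steps with training stopping at 5,000.
ai.train(
    "input.txt",          # stand-in name for my training file
    line_by_line=False,
    num_steps=5000,
    generate_every=1000,
    save_every=1000,
    learning_rate=1e-3,
    batch_size=1,
)
```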
The results I'm seeing during training (two of the five samples actually produce text; the rest come back blank):

A decoder-only architecture is being used, but right-padding was detected! For correct generation results, please set `padding_side='left'` when initializing the tokenizer.
1,000 steps reached: generating sample texts.
==========
==========
Configuration saved in trained_model/generation_config.json
2,000 steps reached: saving model to /trained_model
Generate config GenerationConfig {
"bos_token_id": 50256,
"eos_token_id": 50256,
"transformers_version": "4.26.1"
}
A decoder-only architecture is being used, but right-padding was detected! For correct generation results, please set `padding_side='left'` when initializing the tokenizer.
2,000 steps reached: generating sample texts.
==========
yoday
==========
Configuration saved in trained_model/generation_config.json
3,000 steps reached: saving model to /trained_model
Generate config GenerationConfig {
"bos_token_id": 50256,
"eos_token_id": 50256,
"transformers_version": "4.26.1"
}
A decoder-only architecture is being used, but right-padding was detected! For correct generation results, please set `padding_side='left'` when initializing the tokenizer.
3,000 steps reached: generating sample texts.
==========
:gottem:
==========
Configuration saved in trained_model/generation_config.json
4,000 steps reached: saving model to /trained_model
Generate config GenerationConfig {
"bos_token_id": 50256,
"eos_token_id": 50256,
"transformers_version": "4.26.1"
}
A decoder-only architecture is being used, but right-padding was detected! For correct generation results, please set `padding_side='left'` when initializing the tokenizer.
4,000 steps reached: generating sample texts.
==========
==========
Configuration saved in trained_model/generation_config.json
5,000 steps reached: saving model to /trained_model
Generate config GenerationConfig {
"bos_token_id": 50256,
"eos_token_id": 50256,
"transformers_version": "4.26.1"
}
A decoder-only architecture is being used, but right-padding was detected! For correct generation results, please set `padding_side='left'` when initializing the tokenizer.
5,000 steps reached: generating sample texts.
==========
==========
This continues to happen when generating text from the trained model.
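For reference, generation afterwards is roughly this (again from memory; the folder name matches where training saved the model in the log above):

```python
from aitextgen import aitextgen

# Reload the model that training saved into trained_model/.
ai = aitextgen(model_folder="trained_model")

# Generate a few samples; these come back blank as well.
ai.generate(n=5)
```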
I'm pretty new to this sort of stuff, so is it something I'm doing wrong?
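One more thing, in case it's relevant: the warning that repeats through the log seems to be asking for something like this when the tokenizer is created (just my reading of the message; I don't know where aitextgen constructs its tokenizer internally):

```python
from transformers import GPT2TokenizerFast

# What the repeated warning appears to ask for: left-side padding,
# set when the tokenizer is initialized.
tokenizer = GPT2TokenizerFast.from_pretrained("gpt2", padding_side="left")
```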