minimaxir / aitextgen

A robust Python tool for text-based AI training and generation using GPT-2.
https://docs.aitextgen.io
MIT License

Warmup steps #195

Open TheGullahanMaster opened 2 years ago

TheGullahanMaster commented 2 years ago

How long should I wait for the warmup steps to finish? If it's set higher than one, the loss just never budges, even when the iteration number is higher than warmup_steps. Does it count epochs or iterations?
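For context (a sketch of linear warmup in general, not necessarily aitextgen's exact scheduler): `warmup_steps` in Lightning-style trainers normally counts optimizer steps (batches), not epochs, and the learning rate ramps linearly from near zero up to its full value:

```python
def warmup_lr_multiplier(step: int, warmup_steps: int) -> float:
    """Linear warmup: the LR multiplier climbs from ~0 to 1.0 over
    warmup_steps optimizer steps (batches), then holds at 1.0."""
    return min(1.0, (step + 1) / max(1, warmup_steps))

# While the multiplier is tiny, the effective learning rate is tiny too,
# so the loss barely moves during the warmup phase.
print(warmup_lr_multiplier(0, 100))    # 0.01
print(warmup_lr_multiplier(99, 100))   # 1.0
print(warmup_lr_multiplier(500, 100))  # 1.0
```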

TheGullahanMaster commented 2 years ago

Also, BLOOM seems to work fine for unlimited generation with only a few adjustments to remove the generation character limit, though I had to modify the "config" to be compatible with the BloomConfig from transformers.

ashokgit commented 2 years ago

@TheGullahanMaster can you please share what "config" modifications you made to get BLOOM to work?

TheGullahanMaster commented 2 years ago

In utils.py, I edited `build_gpt2_config` like this:

```python
from transformers import BloomConfig

def build_gpt2_config(
    vocab_size: int = 10000,
    bos_token_id: int = 0,
    eos_token_id: int = 0,
    max_length: int = 2048,
    dropout: float = 0.0,
    **kwargs,
):
    """
    Builds a custom GPT-2 config based on a given Transformers config,
    with a few more user-friendly aliases.
    """
    return BloomConfig(
        vocab_size=vocab_size,
        n_ctx=max_length,
        resid_pdrop=dropout,
        embd_pdrop=dropout,
        attn_pdrop=dropout,
        summary_first_dropout=dropout,
        bos_token_id=bos_token_id,
        eos_token_id=eos_token_id,
        **kwargs,
    )
```
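One caveat (my reading of transformers, worth double-checking): BloomConfig's own dropout fields are named differently from GPT-2's (e.g. `hidden_dropout` / `attention_dropout`), and Transformers configs silently store unrecognized kwargs as attributes rather than rejecting them, so GPT-2-style aliases like `resid_pdrop` may be accepted but never actually control BLOOM's dropout. A dependency-free sketch of that behaviour (using a hypothetical `FakeConfig` stand-in, not the real class):

```python
# Hypothetical stand-in for a Transformers-style config: unknown kwargs
# are stored as attributes instead of raising an error.
class FakeConfig:
    def __init__(self, vocab_size=250880, hidden_dropout=0.0, **kwargs):
        self.vocab_size = vocab_size
        self.hidden_dropout = hidden_dropout   # the field the model actually reads
        for key, value in kwargs.items():
            setattr(self, key, value)          # silently accepted, never read

cfg = FakeConfig(vocab_size=10000, resid_pdrop=0.1)
print(cfg.hidden_dropout)  # 0.0 -- the GPT-2-style alias did not reach the real field
print(cfg.resid_pdrop)     # 0.1 -- stored, but unused by the model
```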
TheGullahanMaster commented 2 years ago

and commented out this in aitextgen.py for "unlimited" generation during inference:

```python
if prompt:
    prompt_num_tokens = list(prompt_tensors["input_ids"].shape)[1]
    # assert prompt_num_tokens < model_max_length(
    #     self.model.config
    # ), f"The prompt is too large for the model. ({prompt_num_tokens} tokens)"
```