renmada / t5-pegasus-pytorch


Question about beam size #6

Closed. wutong4012 closed this issue 3 years ago.

wutong4012 commented 3 years ago

Does `model.generate` in this model call into an external library? If so, how do I change the beam size? I don't see any beam-search code in this repo.

renmada commented 3 years ago

It is a class method that comes with the transformers model. Its signature is:

```python
@torch.no_grad()
def generate(
    self,
    input_ids: Optional[torch.LongTensor] = None,
    max_length: Optional[int] = None,
    min_length: Optional[int] = None,
    do_sample: Optional[bool] = None,
    early_stopping: Optional[bool] = None,
    num_beams: Optional[int] = None,
    temperature: Optional[float] = None,
    top_k: Optional[int] = None,
    top_p: Optional[float] = None,
    repetition_penalty: Optional[float] = None,
    bad_words_ids: Optional[Iterable[int]] = None,
    bos_token_id: Optional[int] = None,
    pad_token_id: Optional[int] = None,
    eos_token_id: Optional[int] = None,
    length_penalty: Optional[float] = None,
    no_repeat_ngram_size: Optional[int] = None,
    encoder_no_repeat_ngram_size: Optional[int] = None,
    num_return_sequences: Optional[int] = None,
    max_time: Optional[float] = None,
    decoder_start_token_id: Optional[int] = None,
    use_cache: Optional[bool] = None,
    num_beam_groups: Optional[int] = None,
    diversity_penalty: Optional[float] = None,
    prefix_allowed_tokens_fn: Optional[Callable[[int, torch.Tensor], List[int]]] = None,
    output_attentions: Optional[bool] = None,
    output_hidden_states: Optional[bool] = None,
    output_scores: Optional[bool] = None,
    return_dict_in_generate: Optional[bool] = None,
    forced_bos_token_id: Optional[int] = None,
    forced_eos_token_id: Optional[int] = None,
    remove_invalid_values: Optional[bool] = None,
    synced_gpus: Optional[bool] = None,
    **model_kwargs,
) -> Union[GreedySearchOutput, SampleOutput, BeamSearchOutput, BeamSampleOutput, torch.LongTensor]:
```

From its docstring: it generates sequences for models with a language modeling head, and currently supports greedy decoding, multinomial sampling, beam-search decoding, and beam-search multinomial sampling. Apart from `input_ids` and `attention_mask`, all the arguments below default to the value of the attribute of the same name in the model's `PretrainedConfig`; the defaults indicated are those of that config. Most of these parameters are explained in more detail in this blog post: https://huggingface.co/blog/how-to-generate

Parameters:

- `input_ids` (`torch.LongTensor` of shape `(batch_size, sequence_length)`, optional): The sequence used as a prompt for the generation. If `None`, the method initializes it as an empty `torch.LongTensor` of shape `(1,)`.
- `max_length` (`int`, optional, defaults to 20): The maximum length of the sequence to be generated.
- `min_length` (`int`, optional, defaults to 10): The minimum length of the sequence to be generated.
- `do_sample` (`bool`, optional, defaults to `False`): Whether or not to use sampling; use greedy decoding otherwise.
- `early_stopping` (`bool`, optional, defaults to `False`): Whether or not to stop the beam search when at least `num_beams` sentences are finished per batch.
- `num_beams` (`int`, optional, defaults to 1): Number of beams for beam search. 1 means no beam search.
- `temperature` (`float`, optional, defaults to 1.0): The value used to modulate the next token probabilities.
- `top_k` (`int`, optional, defaults to 50): The number of highest probability vocabulary tokens to keep for top-k filtering.
- `top_p` (`float`, optional, defaults to 1.0): If set to a float < 1, only the most probable tokens with probabilities that add up to `top_p` or higher are kept for generation.
- `repetition_penalty` (`float`, optional, defaults to 1.0): The parameter for repetition penalty. 1.0 means no penalty. See this paper for more details: https://arxiv.org/pdf/1909.05858.pdf
- `pad_token_id` (`int`, optional): The id of the padding token.
- `bos_token_id` (`int`, optional): The id of the beginning-of-sequence token.
- `eos_token_id` (`int`, optional): The id of the end-of-sequence token.
- `length_penalty` (`float`, optional, defaults to 1.0): Exponential penalty to the length. 1.0 means no penalty. Set to a value < 1.0 to encourage the model to generate shorter sequences, or to a value > 1.0 to encourage it to produce longer sequences.
- `no_repeat_ngram_size` (`int`, optional, defaults to 0): If set to an int > 0, all ngrams of that size can only occur once.
- `encoder_no_repeat_ngram_size` (`int`, optional, defaults to 0): If set to an int > 0, all ngrams of that size that occur in the `encoder_input_ids` cannot occur in the `decoder_input_ids`.
- `bad_words_ids` (`List[List[int]]`, optional): List of token ids that are not allowed to be generated. To get the token ids of words that should not appear in the generated text, use `tokenizer(bad_word, add_prefix_space=True).input_ids`.
- `num_return_sequences` (`int`, optional, defaults to 1): The number of independently computed returned sequences for each element in the batch.
- `max_time` (`float`, optional, defaults to None): The maximum amount of time, in seconds, that the computation is allowed to run for. Generation will still finish the current pass after the allotted time has passed.
- `attention_mask` (`torch.LongTensor` of shape `(batch_size, sequence_length)`, optional): Mask to avoid performing attention on padding token indices. Mask values are in `[0, 1]`: 1 for tokens that are not masked, 0 for masked tokens. If not provided, defaults to a tensor of the same shape as `input_ids` that masks the pad token. (What are attention masks? See ../glossary.html#attention-mask)
- `decoder_start_token_id` (`int`, optional): If an encoder-decoder model starts decoding with a different token than bos, the id of that token.
- `use_cache` (`bool`, optional, defaults to `True`): Whether or not the model should use the past key/values attentions (if applicable to the model) to speed up decoding.
- `num_beam_groups` (`int`, optional, defaults to 1): Number of groups to divide `num_beams` into in order to ensure diversity among different groups of beams. See this paper for more details: https://arxiv.org/pdf/1610.02424.pdf
- `diversity_penalty` (`float`, optional, defaults to 0.0): This value is subtracted from a beam's score if it generates a token that is the same as a token from any beam of another group at a given time step. Note that `diversity_penalty` is only effective if group beam search is enabled.
- `prefix_allowed_tokens_fn` (`Callable[[int, torch.Tensor], List[int]]`, optional): If provided, this function constrains the beam search to allowed tokens only at each step. If not provided, no constraint is applied. The function takes 2 arguments, the batch ID `batch_id` and `input_ids`, and has to return a list with the allowed tokens for the next generation step conditioned on the batch ID and the previously generated tokens. This argument is useful for constrained generation conditioned on the prefix, as described in Autoregressive Entity Retrieval (https://arxiv.org/abs/2010.00904).
- `output_attentions` (`bool`, optional, defaults to `False`): Whether or not to return the attention tensors of all attention layers. See `attentions` under returned tensors for more details.
- `output_hidden_states` (`bool`, optional, defaults to `False`): Whether or not to return the hidden states of all layers. See `hidden_states` under returned tensors for more details.
- `output_scores` (`bool`, optional, defaults to `False`): Whether or not to return the prediction scores. See `scores` under returned tensors for more details.
- `return_dict_in_generate` (`bool`, optional, defaults to `False`): Whether or not to return a `ModelOutput` instead of a plain tuple.
- `forced_bos_token_id` (`int`, optional): The id of the token to force as the first generated token after the `decoder_start_token_id`. Useful for multilingual models like mBART, where the first generated token needs to be the target language token.
- `forced_eos_token_id` (`int`, optional): The id of the token to force as the last generated token when `max_length` is reached.
- `remove_invalid_values` (`bool`, optional): Whether to remove possible nan and inf outputs of the model to prevent the generation method from crashing. Note that using `remove_invalid_values` can slow down generation.
- `synced_gpus` (`bool`, optional, defaults to `False`): Whether to continue running the while loop until `max_length` (needed for ZeRO stage 3).
- `model_kwargs`: Additional model-specific kwargs that will be forwarded to the `forward` function of the model. If the model is an encoder-decoder model, encoder-specific kwargs should not be prefixed and decoder-specific kwargs should be prefixed with `decoder_`.

Return: `ModelOutput` or `torch.LongTensor`: a `ModelOutput` (if `return_dict_in_generate=True` or when `config.return_dict_in_generate=True`) or a `torch.FloatTensor`. If the model is not an encoder-decoder model (`model.config.is_encoder_decoder=False`), the possible `ModelOutput` types are:

For details, see https://github.com/huggingface/transformers/blob/741d48f5c7bf0acdf9b40d0deb8560b997761f3a/src/transformers/generation_utils.py
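
So the beam size is just the `num_beams` argument of `generate`. Below is a minimal sketch of how it could be passed, assuming `model` and `tokenizer` have already been loaded as in this repo's examples; the variable names, the sample text, and the other generation settings are illustrative, not taken from the repo:

```python
import torch

# Assumes `model` (a T5-PEGASUS / MT5-style seq2seq model) and its `tokenizer`
# are already loaded; the input text below is a placeholder.
text = "待摘要的输入文本"
inputs = tokenizer(text, return_tensors="pt")

with torch.no_grad():
    output_ids = model.generate(
        inputs["input_ids"],
        attention_mask=inputs["attention_mask"],
        max_length=64,           # upper bound on generated length
        num_beams=4,             # beam size: values > 1 enable beam search
        early_stopping=True,     # stop once num_beams finished hypotheses exist
        no_repeat_ngram_size=3,  # optional: forbid repeated trigrams
    )

print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```

With `num_beams=1` (the default) `generate` falls back to greedy decoding, which is why no explicit beam-search code appears in this repo: the search itself lives inside transformers' `generation_utils.py`.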