Hi,
I am trying to generate negations from non-negated sentences. I trained an XLM-R encoder-decoder model on a simple dataset formatted as “I have tea” => “I don’t have tea”, following the example provided in the Colab notebook.
# assuming the shared XLM-R setup from the Colab notebook
# (the exact checkpoint name is my assumption; adjust to the one you used)
from transformers import EncoderDecoderModel, XLMRobertaTokenizer

tokenizer = XLMRobertaTokenizer.from_pretrained("xlm-roberta-base")
roberta_shared = EncoderDecoderModel.from_encoder_decoder_pretrained(
    "xlm-roberta-base", "xlm-roberta-base", tie_encoder_decoder=True
)

# set special tokens
roberta_shared.config.decoder_start_token_id = tokenizer.bos_token_id
roberta_shared.config.eos_token_id = tokenizer.eos_token_id

# set decoding parameters (sensible defaults for beam search)
roberta_shared.config.max_length = 64
roberta_shared.config.early_stopping = True
roberta_shared.config.no_repeat_ngram_size = 3
roberta_shared.config.length_penalty = 2.0
roberta_shared.config.num_beams = 4
roberta_shared.config.vocab_size = roberta_shared.config.encoder.vocab_size
On the test set, however, the model produces tokens that differ from the source. How can I preserve the source tokens when generating the output? For example:

["I have it.", "I love tea", "I can have coffee."] =>
["I have no it.", "I'll not love.", "I can't have food."]

Here the model modifies the content words of the sentences instead of only inserting the negation.
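For reference, this is a minimal sketch of how I run generation on the test sentences (assuming the tokenizer and roberta_shared defined above; the decoding parameters are picked up from the model config):

import torch

sentences = ["I have it.", "I love tea", "I can have coffee."]
inputs = tokenizer(sentences, return_tensors="pt", padding=True, truncation=True)

# beam search with the config values set earlier (num_beams=4, max_length=64, ...)
with torch.no_grad():
    output_ids = roberta_shared.generate(
        inputs.input_ids, attention_mask=inputs.attention_mask
    )

print(tokenizer.batch_decode(output_ids, skip_special_tokens=True))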