salesforce / BLIP

PyTorch code for BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation
BSD 3-Clause "New" or "Revised" License

Got 10 identical words in the response #143

Open zss977-web opened 1 year ago

zss977-web commented 1 year ago

Hi, I read your work and it helped me a lot. I am fine-tuning the BLIP decoder on my own task, visual dialog (VisDial). When I use the model to generate, I get max_length copies of the same word in every response.


          ans_in = batch["ans_in"]
          question_states = enc_out.unsqueeze(1).repeat(1,ans_in.size(-1),1)  # (batch_size, sequence_length, hidden_size)`
          question_atts = torch.ones(question_states.size()[:-1], dtype=torch.long).to(question_states.device)
          model_kwargs = {"encoder_hidden_states": question_states, "encoder_attention_mask": question_atts}

          bos_ids = torch.full((enc_out.size(0), 1), fill_value=1, device=enc_out.device)

          outputs = self.text_decoder.generate(input_ids=bos_ids,
                                               max_length=10,
                                               min_length=1,
                                               num_beams=num_beams,
                                               # eos_token_id=self.tokenizer.sep_token_id,
                                               # pad_token_id=self.tokenizer.pad_token_id,
                                               eos_token_id=2,
                                               pad_token_id=0,
                                               **model_kwargs)

Some results:

          ['', '7', '7', '7', '7', '7', '7', '7', '7', '7']
          ['', 'red', 'red', 'red', 'red', 'red', 'red', 'red', 'red', 'red']
          ['', 'about', 'about', 'about', 'about', 'about', 'about', 'about', 'about', 'about']

Any guidance or suggestions would be very helpful. Thanks.
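For reference, here is a standalone sanity check of the special-token ids, modelled on the init_tokenizer helper in BLIP's blip.py. It assumes the decoder was built from BLIP's default bert-base-uncased tokenizer, and is only meant to show which ids generate() should receive in place of the hard-coded 1 / 2 / 0 above:

          from transformers import BertTokenizer

          # Rebuild the tokenizer the way BLIP's init_tokenizer() in blip.py does:
          # bert-base-uncased plus a [DEC] token that the decoder uses as BOS.
          tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
          tokenizer.add_special_tokens({"bos_token": "[DEC]"})
          tokenizer.add_special_tokens({"additional_special_tokens": ["[ENC]"]})

          # These are the ids to pass to generate() instead of hard-coded constants.
          print("bos [DEC]:", tokenizer.bos_token_id)   # appended at the end of the vocab
          print("eos [SEP]:", tokenizer.sep_token_id)   # 102 for bert-base-uncased
          print("pad [PAD]:", tokenizer.pad_token_id)   # 0 for bert-base-uncased

If the eos_token_id passed to generate() never matches the decoder's actual [SEP] id, decoding cannot stop early, which would at least explain why every answer runs to exactly max_length tokens.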

other-ones commented 1 year ago

Hi, have you solved this issue? I'm facing the same problem.