To be frank, I haven't taken a deep look into this issue.
I think you can get some insight into what happened by checking the Transformers release notes. But at the moment, I don't have the capacity to debug LangBridge on newer versions of Transformers :( If you manage to find the issue, please let me know though.
Thank you so much for your immediate response. I will go through the recent changes in Transformers to fix the issue.
Just to confirm, was transformers==4.37.2 the latest version when you designed LangBridge?
I would like to know why the authors used this version in the paper.
Yeah, I believe that was the latest at the time.
To keep it short: Gemma2Model did not return past key-value caches, unlike other models. This has been resolved since transformers==4.5.0, but because of Gemma2's atypical attention structure (sliding + global), the hybrid caching system has a flaw and does not account for the encoder features we have beforehand. So far I have successfully modified the hybrid caching and the LangBridge modeling code, and inference/generation now runs without errors.
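For context, here is roughly the setup I'm working with: the encoder features are prepended in embedding space, so the cache has to cover those extra positions on top of the single BOS token. This is only a minimal sketch with placeholder names (`enc_features`, `generate_with_soft_prompt`), not the actual LangBridge code or my cache fix.

```python
import torch

def generate_with_soft_prompt(lm, lm_tokenizer, enc_features, max_new_tokens=128):
    # enc_features: (batch, enc_len, lm_hidden) -- encoder outputs already
    # projected into the LM embedding space.
    batch_size = enc_features.shape[0]
    device = enc_features.device

    # Single BOS token per sequence, as in the LangBridge inference code.
    bos = torch.full((batch_size, 1), lm_tokenizer.bos_token_id,
                     dtype=torch.long, device=device)
    bos_embeds = lm.get_input_embeddings()(bos)

    # Prepend the encoder features to the BOS embedding.
    inputs_embeds = torch.cat([enc_features, bos_embeds], dim=1)
    attention_mask = torch.ones(inputs_embeds.shape[:2],
                                dtype=torch.long, device=device)

    # The cache built inside generate() has to cover enc_len + 1 positions,
    # which is where Gemma2's sliding/global hybrid cache tripped up for me.
    return lm.generate(inputs_embeds=inputs_embeds,
                       attention_mask=attention_mask,
                       max_new_tokens=max_new_tokens)
```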
But I face the issue that the model outputs almost the same answer every time, which might be attributed to this part of the inference code:
```python
# Assume that there will always only be one token in the decoder inputs,
# assumption holds for existing HF models <- Really? I need to check
input_ids = torch.LongTensor([self.lm_tokenizer.bos_token_id])
input_ids = input_ids.repeat(enc_ids.shape[0], 1).to(self.device)
attention_mask = torch.ones_like(input_ids)
```
Could you tell me why you made this inference choice? To my knowledge, it's unusual to feed only a single token to the model.
Edit: to my understanding, LangBridge adds soft prompts to the inputs, but it seems we only pass the soft prompts from the encoder. I might be misunderstanding the methodology.
Edit 2: never mind, I inspected the input IDs and the loop correctly picks up the next input IDs. Hmm, there seems to be another reason why I see mostly identical responses. Did you observe almost identical responses in your research using LangBridge?
> Did you observe almost identical responses in your research using LangBridge?
No, I haven't seen that behavior.
The one token I pass to the LM is supposed to be the BOS token. I understand that it looks confusing, but I suspected that the LM would behave weirdly without an explicit BOS token, since it was always trained with one at the beginning.
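For reference, a quick way to sanity-check that assumption (just a sketch; the checkpoint name is a placeholder):

```python
from transformers import AutoTokenizer

# Confirm the LM tokenizer defines a BOS token and prepends it during normal
# tokenization -- the behavior the single-BOS-token trick is meant to preserve.
tok = AutoTokenizer.from_pretrained("google/gemma-2-2b")  # placeholder checkpoint
print(tok.bos_token, tok.bos_token_id)
# The encoded ids should start with bos_token_id when add_special_tokens=True.
print(tok("hello", add_special_tokens=True).input_ids)
```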
I see. So, to my current understanding, LangBridge enforces an encoder-decoder style without tokenizing the input text with the LM tokenizer.
Even though the input IDs produced at each decoding iteration are tokenized with the LM tokenizer, I suspect that losing the LM-tokenized inputs can impact performance.
Did you attempt a prompt-tuning approach to create soft prompts from the encoder? (See the sketch below for roughly what I mean.)
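To make it concrete, this is roughly what I have in mind; it's purely an illustrative sketch (not LangBridge's method and not necessarily what any follow-up work does), with `enc_features` assumed to already be projected into the LM embedding space:

```python
import torch

def build_combined_inputs(lm, lm_tokenizer, enc_features, text):
    # Assumes batch size 1 for simplicity.
    device = enc_features.device

    # Tokenize the prompt with the LM's own tokenizer as well,
    # instead of relying only on the encoder's soft prompts.
    lm_ids = lm_tokenizer(text, return_tensors="pt").input_ids.to(device)
    lm_embeds = lm.get_input_embeddings()(lm_ids)

    # [encoder soft prompts] + [LM-tokenized prompt embeddings]
    inputs_embeds = torch.cat([enc_features, lm_embeds], dim=1)
    attention_mask = torch.ones(inputs_embeds.shape[:2], dtype=torch.long, device=device)
    return inputs_embeds, attention_mask
```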
Yeah, that's a good point. If I understood you correctly, here's a follow-up work to LangBridge that improves on what you said.
Oh, I see. Thank you so much for sharing the paper. Yeah, that's exactly what I want. I will try this model soon.
Hi, our team wants to test LangBridge on Gemma2, but since Gemma2 is not supported by transformers==4.37.2, I built the LangBridge model on Gemma2 with transformers==4.42.0. However, the results are terrible; the model cannot even produce reasonable text.
Also, we defined a custom LangBridge model, as seen below.
Could you tell us why transformers version 4.37.2 is required for LangBridge?
Thank you