Closed svjack closed 1 year ago
In principle, there is nothing special about OPT. Using BLOOM should also work, as long as you update the model API calls (if they are different, which they might not be).
The BERT models were something used early in development. We didn't train any BERT-like models in the final version, so I don't have any config files for them, sorry.
Why does the BLOOM tokenizer place the padding tokens at the head of the sequence when padding to max_length?
from transformers import AutoTokenizer

native_tokenizer = AutoTokenizer.from_pretrained(
    "bigscience/bloom-560m",
    use_fast=False)
caption = "a bear in the woods."
# Pad (or truncate) the caption to exactly 56 tokens.
tokenized_data = native_tokenizer(
    caption,
    return_tensors="pt",
    padding='max_length',
    truncation=True,
    max_length=56)
tokens = tokenized_data.input_ids[0]
tokens
This produces:
tensor([ 3, 3, 3, 3, 3, 3, 3, 3, 3,
3, 3, 3, 3, 3, 3, 3, 3, 3,
3, 3, 3, 3, 3, 3, 3, 3, 3,
3, 3, 3, 3, 3, 3, 3, 3, 3,
3, 3, 3, 3, 3, 3, 3, 3, 3,
3, 3, 3, 3, 3, 68, 50507, 361, 368,
165526, 17])
It places the pad_token_id "3" at the head of the sequence, not the tail. This is different from other models. Why does this happen?
I've never used the BLOOM models before, so I don't know what this issue is, sorry. I think this is something you will have to check with the authors of that model.
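A possible explanation, offered as an assumption rather than a confirmed answer: Hugging Face tokenizers carry a `padding_side` attribute, and the BLOOM tokenizer configuration appears to set it to "left". Printing `native_tokenizer.padding_side` should confirm or refute this. The sketch below uses a plain-Python helper, `pad_ids` (a hypothetical name, not part of any library), to show the difference between the two sides, and a second hypothetical helper, `load_right_padded_tokenizer`, that switches a loaded tokenizer to right padding.

```python
def pad_ids(ids, max_length, pad_id, side="right"):
    """Truncate ids to max_length, then pad on the requested side."""
    ids = list(ids)[:max_length]
    padding = [pad_id] * (max_length - len(ids))
    return padding + ids if side == "left" else ids + padding


def load_right_padded_tokenizer(name="bigscience/bloom-560m"):
    """Hypothetical helper: load a tokenizer and force right-side padding."""
    # Imported lazily so the rest of the sketch runs without `transformers`.
    from transformers import AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(name)
    # Assumption: the checkpoint's default is "left"; override it here.
    tokenizer.padding_side = "right"
    return tokenizer


# Token ids of the caption, taken from the tensor printed above.
caption_ids = [68, 50507, 361, 368, 165526, 17]

left = pad_ids(caption_ids, 10, pad_id=3, side="left")    # pads at the head
right = pad_ids(caption_ids, 10, pad_id=3, side="right")  # pads at the tail
```

If right padding is used with a decoder-only model, the attention mask returned by the tokenizer still needs to be passed along so the pad positions are ignored.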
After training, the code below initializes the [RET] embedding:
with torch.no_grad():
    model.model.input_embeddings.weight[model.model.retrieval_token_idx, :].copy_(checkpoint['state_dict']['ret_input_embeddings.weight'].cpu().detach())
Where in the source code does ret_input_embeddings come from, and what naming rule produces that key in the checkpoint?
You can produce ret_input_embeddings by extracting the trained [RET] token embeddings as such:
state_dict['ret_input_embeddings.weight'] = state_dict['model.input_embeddings.weight'][args.retrieval_token_idx].clone()
The benefit of doing this is that we save space: we don't need to retain the frozen OPT embeddings, only the [RET] one.
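As a rough sketch of the round trip described above, the snippet below prunes a state dict down to the single [RET] row and later copies that row back into a full embedding matrix. The function names `prune_checkpoint` and `restore_ret_embedding` are hypothetical, and dropping the `model.input_embeddings.weight` key is an assumption about how the space is saved; the actual pruning script may differ.

```python
import torch


def prune_checkpoint(state_dict, retrieval_token_idx):
    """Return a copy that keeps only the [RET] row of the input embeddings."""
    full_key = 'model.input_embeddings.weight'
    # Assumption: the full (frozen) embedding matrix is dropped entirely.
    pruned = {k: v for k, v in state_dict.items() if k != full_key}
    pruned['ret_input_embeddings.weight'] = (
        state_dict[full_key][retrieval_token_idx].clone()
    )
    return pruned


def restore_ret_embedding(embedding_weight, pruned, retrieval_token_idx):
    """Copy the saved [RET] row back into a full embedding matrix, in place."""
    with torch.no_grad():
        embedding_weight[retrieval_token_idx, :].copy_(
            pruned['ret_input_embeddings.weight'].cpu().detach()
        )


# Toy example: vocabulary of 10 tokens, 4-dimensional embeddings.
ret_idx = 9
trained = {'model.input_embeddings.weight': torch.randn(10, 4)}
pruned = prune_checkpoint(trained, ret_idx)

fresh = torch.zeros(10, 4)  # stands in for the freshly loaded frozen embeddings
restore_ret_embedding(fresh, pruned, ret_idx)
```

Only the row at `ret_idx` changes in `fresh`; every other row keeps whatever the base model loaded.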
If I want to replace the language model in this project, would you recommend bigscience/bloom as a multilingual replacement, or do you have other recommendations? I want the replacement model to work on question-answering downstream tasks. I'm also interested in why the model works on question-answering downstream tasks even though the loss you use is not related to QA. Does this rely only on the few-shot ability of Facebook/opt?
I also see that "bert" is an option in an if-else block in models.py. Does this mean "bert" can be used as a replacement? Can you share a FrozenArgs configuration for a "bert" model?