Adaickalavan opened this issue 6 months ago
Hi @Adaickalavan, thank you for your interest.
`TransformerModelArguments` is just a wrapper for the Hugging Face names/paths for model, tokenizer and config. Some models work out of the box, others need adaptations. I cannot cover this 100%, since the transformers library does not impose many restrictions on the different models, and the newest ones can always deviate from this.

I briefly tried a smaller Llama model (1B):
```
<...>
File /path/to/site-packages/transformers/models/llama/modeling_llama.py:1371, in LlamaForSequenceClassification.forward(self, input_ids, attention_mask, position_ids, past_key_values, inputs_embeds, labels, use_cache, output_attentions, output_hidden_states, return_dict)
   1369 batch_size = inputs_embeds.shape[0]
   1371 if self.config.pad_token_id is None and batch_size != 1:
-> 1372     raise ValueError("Cannot handle batch sizes > 1 if no padding token is defined.")
   1373 if self.config.pad_token_id is None:
   1374     sequence_lengths = -1

ValueError: Cannot handle batch sizes > 1 if no padding token is defined.
```
The error seems to be known, but the workaround is difficult to achieve with the current API. I will keep this in mind for v2.0.0, but for now I would recommend just copying or subclassing `TransformerBasedClassification` and adapting it until it fits your needs. As for distributing the training, you could provide your own `Classifier` implementation to do that. Somewhere down my list of ideas I have a PyTorch Lightning integration, which can help with distributed training; however, I think for Llama 2 you will still need other repos as well.
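For reference, the usual transformers-level workaround for this error is to define a pad token on both the tokenizer and the model config before training. The sketch below shows only that step in plain transformers (model name taken from the question, `num_labels=2` is an assumption); wiring it into small-text would still require the copying/subclassing mentioned above, since the model is otherwise constructed internally.

```python
from transformers import AutoConfig, AutoModelForSequenceClassification, AutoTokenizer

model_name = 'meta-llama/Llama-2-7b-chat-hf'  # gated model, requires Hugging Face access

tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token  # Llama tokenizers ship without a pad token

config = AutoConfig.from_pretrained(model_name, num_labels=2)  # num_labels is an assumption
config.pad_token_id = tokenizer.pad_token_id  # avoids "Cannot handle batch sizes > 1 ..."

model = AutoModelForSequenceClassification.from_pretrained(model_name, config=config)
```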
Referring to the active learning for text classification example given here.
In the given example, we have:
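(The original snippet is not reproduced here; in the referenced example the classifier factory is constructed roughly as in the following sketch, where the model name and `num_classes` are illustrative.)

```python
from small_text import TransformerBasedClassificationFactory, TransformerModelArguments

num_classes = 2  # illustrative
transformer_model = TransformerModelArguments('bert-base-uncased')
clf_factory = TransformerBasedClassificationFactory(
    transformer_model,
    num_classes,
    kwargs=dict(device='cuda'),
)
```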
In my case, I would like to use the language model `meta-llama/Llama-2-7b-chat-hf` as a sequence classifier by calling it as:
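(The exact call from the issue is not reproduced here; the sketch below shows one way to do it, where `num_labels` is an assumption.)

```python
from transformers import AutoModelForSequenceClassification

base_model = AutoModelForSequenceClassification.from_pretrained(
    'meta-llama/Llama-2-7b-chat-hf',
    num_labels=2,  # assumption: set to the number of target classes
)
```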
Then, I would like to perform supervised training with active learning of the Llama sequence-classifier transformer model on the dataset `Birchlabs/openai-prm800k-stepwise-critic`.
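For orientation, here is a minimal, untested sketch of how labelled text/label pairs could be wrapped into a small-text dataset and handed to the active learning loop. The placeholder texts and labels stand in for the actual columns of `Birchlabs/openai-prm800k-stepwise-critic` (its schema is not shown in this issue), and the `TransformersDataset.from_arrays` helper is assumed to be available (recent small-text versions).

```python
import numpy as np
from transformers import AutoTokenizer
from small_text import (
    PoolBasedActiveLearner,
    PredictionEntropy,
    TransformerBasedClassificationFactory,
    TransformerModelArguments,
    TransformersDataset,
    random_initialization_balanced,
)

tokenizer = AutoTokenizer.from_pretrained('meta-llama/Llama-2-7b-chat-hf')
tokenizer.pad_token = tokenizer.eos_token  # see the pad-token note above

# placeholder data; in practice extract the relevant text/label columns
# from Birchlabs/openai-prm800k-stepwise-critic
texts = ['first example text', 'second example text', 'third example text', 'fourth example text']
labels = np.array([0, 1, 0, 1])

train_dataset = TransformersDataset.from_arrays(texts, labels, tokenizer, max_length=512)

# factory as in the example above (illustrative model and class count)
clf_factory = TransformerBasedClassificationFactory(
    TransformerModelArguments('bert-base-uncased'), 2, kwargs=dict(device='cuda')
)

active_learner = PoolBasedActiveLearner(clf_factory, PredictionEntropy(), train_dataset)
indices_initial = random_initialization_balanced(labels, n_samples=2)
active_learner.initialize_data(indices_initial, labels[indices_initial])
```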
Questions:
1) How do I modify the example in the repository to get a `clf_factory` which uses the above `base_model` instead of providing `TransformerModelArguments`?
2) How do I use `small-text` to handle the large model size of Llama and potentially distribute its training over multiple GPUs?