Did some additional debugging.
Something odd is going on during the call to `ppo_trainer.generate`. Tracing this call ultimately leads to this line of the underlying `LlamaModel`, which tries to embed the `input_ids`.
I checked that the `input_ids` are all within the valid embedding range (0 to 31999, i.e., the vocab size of 32000). Setting a breakpoint here and passing in each `input_id` individually, the first embedding call works, but subsequent calls result in the CUDA error from the previous post (CUDA device-side assert). The order in which the inputs are embedded doesn't seem to matter: whichever `input_id` is embedded first succeeds, and every embedding call after it triggers the error.
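For reference, a rough sketch of the check (it assumes we are stopped at the breakpoint, with `model` being the underlying `LlamaModel` and `input_ids` the tensor being embedded at that line):

```python
import torch

# Verify every token id is a valid row index of the embedding matrix.
vocab_size = model.config.vocab_size  # 32000 for this checkpoint
assert input_ids.dtype == torch.long
assert input_ids.min().item() >= 0 and input_ids.max().item() < vocab_size, (
    f"out-of-range token id: min={input_ids.min().item()}, max={input_ids.max().item()}"
)

# Embed the ids one at a time: the first call succeeds, later ones hit the assert.
for token_id in input_ids.flatten():
    _ = model.embed_tokens(token_id.unsqueeze(0))
```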
Hi @jeromeku, thanks for the issue. This seems to be a duplicate of #313 - I think it is related to the tokenizer, as you might be using the wrong one. Can you double-check the tokenizer you are using?
Thanks @younesbelkada. Tried using `LlamaTokenizer` instead of `AutoTokenizer` per #313 but still getting the same issue, probably since `AutoTokenizer` is just calling `LlamaTokenizer` under the hood. I uploaded the tokenizer that I'm using here. It's the resulting tokenizer from running the Llama-to-hf conversion script provided here.
Hi @younesbelkada:
Tried a few more things:

- Switched the `tokenizer` to `hf-internal-testing/llama-tokenizer`; training still fails when using `AutoTokenizer`.
- Using `LlamaTokenizer.from_pretrained` directly seems to fix the problem.

Very confused, as the `AutoTokenizer.from_pretrained` method seems to just look up the available pretrained tokenizer classes and call the `from_pretrained` method of the identified class. On the `main` branch of `transformers`, this just locates `LlamaTokenizer` in the tokenizer config mapping.
Any ideas what could be leading to this discrepancy?
Another question: why is the `eos_token_id` set to 100_000 in the `generation_kwargs`?
Thanks for all the input! Can you point me to the llama checkpoint that you are using on the Hub?
From what I have got, using `LlamaTokenizer.from_pretrained("hf-internal-testing/llama-tokenizer")` seems to be the fix?
Regarding your second question, this is done so that `generate` will keep generating text even after the model has predicted the true `eos_token`. Note that, in general, manually setting `eos_token_id` in the `generation_kwargs` forces `generate` to stop whenever it sees that token. Hence, passing a value the model can never produce forces the function to continue generating, in practice up to the configured length limit.
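For illustration, the pattern looks something like this (the values are illustrative rather than the exact `rl_training.py` config, and `tokenizer`, `ppo_trainer`, and `query_tensor` are assumed to be set up already):

```python
# An eos_token_id outside the 32000-token vocabulary can never be produced by the
# model, so sampling is never cut short by the real </s> token and only stops at
# the configured length limits.
generation_kwargs = {
    "min_length": -1,
    "top_k": 0,
    "top_p": 1.0,
    "do_sample": True,
    "pad_token_id": tokenizer.pad_token_id,
    "eos_token_id": 100_000,  # placeholder id, intentionally outside the vocab
}
response_tensor = ppo_trainer.generate(query_tensor, **generation_kwargs)
```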
Hi @younesbelkada,
Here are the models I've been using to test the `rl_training.py` script; they were created by 1) running the `llama_to_hf` conversion script on the Llama 7B weights and 2) running the `merge_peft_adapter` script on the base model to merge it with the trl-lib SFT-tuned adapter weights.

When running the `rl_training` script, I used all default arguments:

- Using `AutoTokenizer` to load the tokenizer still triggers the error.
- Changing `AutoTokenizer` to `LlamaTokenizer` at this line "fixes" the issue, as in the script is able to run.

The unresolved issue, I guess, is why changing from `AutoTokenizer` to `LlamaTokenizer` seems to avert the error. My understanding of `AutoTokenizer` (as described in the earlier post) is that it simply looks up the tokenizer class to initialize based on the passed-in model path, which in turn resolves the tokenizer type from the config, and that should resolve to `LlamaTokenizer`.
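Concretely, the change boils down to the tokenizer-loading line; a sketch (the variable names here are illustrative, not the exact ones in the script):

```python
from transformers import AutoTokenizer, LlamaTokenizer

tokenizer_name = "hf-internal-testing/llama-tokenizer"  # or the path to the converted tokenizer

# Original behaviour, which later hits the device-side assert: AutoTokenizer
# resolves to the fast tokenizer class.
# tokenizer = AutoTokenizer.from_pretrained(tokenizer_name)

# Change that lets the script run: load the slow tokenizer class directly.
tokenizer = LlamaTokenizer.from_pretrained(tokenizer_name)

# Presumably equivalent (untested assumption): ask AutoTokenizer for the slow class.
# tokenizer = AutoTokenizer.from_pretrained(tokenizer_name, use_fast=False)
```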
Good point, I believe something is wrong between `LlamaTokenizer` and `LlamaTokenizerFast`, as `AutoTokenizer` retrieves a `LlamaTokenizerFast` object:

```python
>>> from transformers import AutoTokenizer, LlamaTokenizer
>>> auto_tok = AutoTokenizer.from_pretrained("hf-internal-testing/llama-tokenizer")
Downloading tokenizer.model: 100%|████████████████████████████████████████████████████████████████████████| 500k/500k [00:00<00:00, 10.7MB/s]
Downloading (…)/main/tokenizer.json: 100%|██████████████████████████████████████████████████████████████| 1.84M/1.84M [00:00<00:00, 27.1MB/s]
Downloading (…)cial_tokens_map.json: 100%|███████████████████████████████████████████████████████████████████| 411/411 [00:00<00:00, 296kB/s]
>>> non_auto_tok = LlamaTokenizer.from_pretrained("hf-internal-testing/llama-tokenizer")
>>> auto_tok
LlamaTokenizerFast(name_or_path='hf-internal-testing/llama-tokenizer', vocab_size=32000, model_max_length=2048, is_fast=True, padding_side='left', truncation_side='right', special_tokens={'bos_token': AddedToken("<s>", rstrip=False, lstrip=False, single_word=False, normalized=True), 'eos_token': AddedToken("</s>", rstrip=False, lstrip=False, single_word=False, normalized=True), 'unk_token': AddedToken("<unk>", rstrip=False, lstrip=False, single_word=False, normalized=True)}, clean_up_tokenization_spaces=False)
>>> non_auto_tok
LlamaTokenizer(name_or_path='hf-internal-testing/llama-tokenizer', vocab_size=32000, model_max_length=2048, is_fast=False, padding_side='right', truncation_side='right', special_tokens={'bos_token': AddedToken("<s>", rstrip=False, lstrip=False, single_word=False, normalized=True), 'eos_token': AddedToken("</s>", rstrip=False, lstrip=False, single_word=False, normalized=True), 'unk_token': AddedToken("<unk>", rstrip=False, lstrip=False, single_word=False, normalized=True)}, clean_up_tokenization_spaces=False)
```
I believe the main issue is resolved, right? (i.e. the training script works). For the other issue, would you mind opening a ticket in `transformers` describing that the behavior of `LlamaTokenizer` and `LlamaTokenizerFast` might be inconsistent (ideally with a reproducible script)?
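A minimal repro for such a ticket could look something like this (the prompt text is just an example):

```python
from transformers import AutoTokenizer, LlamaTokenizer

name = "hf-internal-testing/llama-tokenizer"
fast_tok = AutoTokenizer.from_pretrained(name)   # resolves to LlamaTokenizerFast
slow_tok = LlamaTokenizer.from_pretrained(name)  # slow, SentencePiece-based class

text = "Question: How do I sort a list in Python?\n\nAnswer:"
print(fast_tok(text)["input_ids"])
print(slow_tok(text)["input_ids"])

# The reprs above also show differing defaults, e.g. padding_side is 'left' for
# the fast tokenizer but 'right' for the slow one.
print(fast_tok.padding_side, slow_tok.padding_side)
```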
Closing the issue here, feel free to re-open it if you think that the main issue is not addressed
Trying to run the stack-llama `rl_training` script with the following reward model and this SFT model.
The SFT model was created from a base Llama 7B model (weights converted using the llama convert script) merged with SFT adapter weights using the merge script.
The tokenizer is the one from the converted Llama 7B model.
Running the `rl_training.py` script with default parameters, and with the reward model, SFT model, and tokenizer per above, using `accelerate launch` results in a CUDA device-side assertion error.

Notes:

- Ran the `supervised_fine_tuning.py` and `reward_modeling.py` scripts and was able to run both successfully.
- Have re-run the `rl_training.py` script but always get the above error.

Here is my environment:
Device is: NVIDIA A100-SXM4-80GB
Any ideas what might be causing this?
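For debugging, one generic way to get a more precise traceback out of a device-side assert (not specific to this script) is to force synchronous CUDA launches before any GPU work happens, e.g. at the very top of the script:

```python
import os

# With asynchronous kernel launches the Python traceback often points at an
# unrelated op; synchronous launches make the failing kernel show up directly.
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"
```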