Closed weiqisun closed 4 days ago
@microsoft-github-policy-service agree
I have tested this with Llama-3-8B and it does not seem to fix the early stopping issues:

```python
import mii

client = mii.serve("meta-llama/Meta-Llama-3-8B-Instruct", tensor_parallel=2)
response = client.generate(
    ["what is the capital of france, only use one word", "Seattle is what in under 10 words"],
    max_new_tokens=512,
)
print(response)
```
Hi @regybean, the instruction tuned version of the Llama-3-8B model follows a specific chat template. Try this prompt:
>>> prompt = '<|begin_of_text|><|start_header_id|>system<|end_header_id|>\n\nYou are a polite chatbot who always give helpful responses<|eot_id|><|start_header_id|>user<|end_header_id|>\n\nwhat is the capital of france, only use one word<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n'
>>> client.generate(prompt, max_new_tokens=512)
[Paris]
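For reference, a prompt in that format can be assembled with a small helper (a sketch; the helper name is made up, and in practice `tokenizer.apply_chat_template` from `transformers` does this for you):

```python
def llama3_chat_prompt(system, user):
    # Build a Llama-3-Instruct style prompt from the chat-template special
    # tokens. Real code should prefer tokenizer.apply_chat_template.
    return (
        "<|begin_of_text|>"
        f"<|start_header_id|>system<|end_header_id|>\n\n{system}<|eot_id|>"
        f"<|start_header_id|>user<|end_header_id|>\n\n{user}<|eot_id|>"
        "<|start_header_id|>assistant<|end_header_id|>\n\n"
    )

prompt = llama3_chat_prompt(
    "You are a polite chatbot who always give helpful responses",
    "what is the capital of france, only use one word",
)
print(prompt)
```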
Without this PR it will generate:
>>> client.generate(prompt, max_new_tokens=64)
[Paris. (Note: I'll try to keep it concise, while still providing accurate information!) Would you like to know more about Paris or France? I'd be happy to help! 🙏) Neighbor).) \ **Note: There was no punctuation in the previous response because I was trying to stay within]
Hi @weiqisun I tried using your exact example and it did not work for me, I explicitly installed deepspeed-mii through your branch. I even tested the non-instruct version, which did not stop. The only thing that has worked so far is the following prompt. Perhaps there is an issue with the tokeniser, or using tensor_parallel=2 is causing issues?
```python
client = mii.serve("NousResearch/Meta-Llama-3-8B", tensor_parallel=2)
prompt = '"how can i bake a cake, only use 20 word. End your message with the stop token which is <|eot_id|>"'
response = client.generate(prompt, max_new_tokens=512)
print(response)
```
[edit] Llama changed their huggingface repo as of today which fixed the eos token issues, a redownload fixed the issues :)
Cool, I'm glad to hear the issue was resolved :). Right, the instruction-tuned version of the Llama 3 model uses a different eos token (`<|eot_id|>`) than the original eos token specified in the pre-trained base model (`<|end_of_text|>`). Either downloading the latest model or setting the `eos_token_id` in the tokenizer/model config can fix the eos token issue.
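For example, the override could be applied to a local copy of the model's `generation_config.json` with a quick patch (a sketch; the file path is hypothetical, and 128009 as the id of `<|eot_id|>` is an assumption about the Llama-3-Instruct tokenizer):

```python
import json

def set_eos_token_id(config_path, eos_token_id):
    # Patch eos_token_id in a local HF config file, e.g. generation_config.json.
    with open(config_path) as f:
        cfg = json.load(f)
    cfg["eos_token_id"] = eos_token_id
    with open(config_path, "w") as f:
        json.dump(cfg, f, indent=2)

# Hypothetical local path; 128009 is the assumed id of <|eot_id|>:
# set_eos_token_id("/path/to/Meta-Llama-3-8B-Instruct/generation_config.json", 128009)
```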
@mrwyattii @awan-10 Can you please look at this quick PR? Thanks
@awan-10 @lekurile Thanks for approving this PR! One unit test failed due to a model download error (see the detailed error log below). This PR has nothing to do with the failed test or the model download; it comes from a filesystem lock issue. Do you have any idea why this happens?
==================================== ERRORS ====================================
_ ERROR at setup of test_local_model_dir[True-nofail-text-generation-None-1-auto-facebook/opt-125m-True] _
model_name = 'facebook/opt-125m', local_model = True
tmpdir = local('/tmp/pytest-of-root/pytest-0/test_local_model_dir_True_nofa0')
@pytest.fixture(scope="function")
def model_path(model_name, local_model, tmpdir):
if not local_model:
return None
base_dir = os.getenv("HF_HOME", tmpdir)
download_dir = os.path.join(base_dir, "mii-ci-models", model_name)
> snapshot_download(model_name, local_dir=download_dir)
conftest.py:87:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
/usr/local/lib/python3.8/dist-packages/huggingface_hub/utils/_validators.py:114: in _inner_fn
return fn(*args, **kwargs)
/usr/local/lib/python3.8/dist-packages/huggingface_hub/_snapshot_download.py:294: in snapshot_download
thread_map(
/usr/local/lib/python3.8/dist-packages/tqdm/contrib/concurrent.py:69: in thread_map
return _executor_map(ThreadPoolExecutor, fn, *iterables, **tqdm_kwargs)
/usr/local/lib/python3.8/dist-packages/tqdm/contrib/concurrent.py:51: in _executor_map
return list(tqdm_class(ex.map(fn, *iterables, chunksize=chunksize), **kwargs))
/usr/local/lib/python3.8/dist-packages/tqdm/std.py:1178: in __iter__
for obj in iterable:
/usr/lib/python3.8/concurrent/futures/_base.py:619: in result_iterator
yield fs.pop().result()
/usr/lib/python3.8/concurrent/futures/_base.py:444: in result
return self.__get_result()
/usr/lib/python3.8/concurrent/futures/_base.py:389: in __get_result
raise self._exception
/usr/lib/python3.8/concurrent/futures/thread.py:57: in run
result = self.fn(*self.args, **self.kwargs)
/usr/local/lib/python3.8/dist-packages/huggingface_hub/_snapshot_download.py:268: in _inner_hf_hub_download
return hf_hub_download(
/usr/local/lib/python3.8/dist-packages/huggingface_hub/utils/_validators.py:114: in _inner_fn
return fn(*args, **kwargs)
/usr/local/lib/python3.8/dist-packages/huggingface_hub/file_download.py:1202: in hf_hub_download
return _hf_hub_download_to_local_dir(
/usr/local/lib/python3.8/dist-packages/huggingface_hub/file_download.py:1406: in _hf_hub_download_to_local_dir
paths = get_local_download_paths(local_dir=local_dir, filename=filename)
/usr/local/lib/python3.8/dist-packages/huggingface_hub/_local_folder.py:138: in get_local_download_paths
metadata_path = _huggingface_dir(local_dir) / "download" / f"{sanitized_filename}.metadata"
/usr/local/lib/python3.8/dist-packages/huggingface_hub/_local_folder.py:223: in _huggingface_dir
with WeakFileLock(gitignore_lock):
/usr/lib/python3.8/contextlib.py:113: in __enter__
return next(self.gen)
/usr/local/lib/python3.8/dist-packages/huggingface_hub/utils/_fixes.py:84: in WeakFileLock
lock.acquire()
/usr/local/lib/python3.8/dist-packages/filelock/_api.py:295: in acquire
self._acquire()
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
self = <filelock._unix.UnixFileLock object at 0x70ff07856fd0>
def _acquire(self) -> None:
ensure_directory_exists(self.lock_file)
open_flags = os.O_RDWR | os.O_TRUNC
if not Path(self.lock_file).exists():
open_flags |= os.O_CREAT
> fd = os.open(self.lock_file, open_flags, self._context.mode)
E FileNotFoundError: [Errno 2] No such file or directory: '/tmp/pytest-of-root/pytest-0/test_local_model_dir_True_nofa0/mii-ci-models/facebook/opt-125m/.huggingface/.gitignore.lock'
/usr/local/lib/python3.8/dist-packages/filelock/_unix.py:42: FileNotFoundError
@weiqisun - this appears to be resolved now. Do we need to take care of any other reviews/changes on this PR before merging?
Thanks @loadams! I just updated the model names used in the new unit tests to get rid of the following model authentication error. The model names are now consistent with those used in the other unit tests.
Access to model mistralai/Mistral-7B-v0.1 is restricted. You must be authenticated to access it.
It is good to go now.
Currently MII uses `vocab_size` to truncate the logits generated by the model, and this value is set to `tokenizer.vocab_size` in `modeling.tokenizers.HFTokenizer`. However, in `transformers` the `vocab_size` attribute of a tokenizer only counts the base vocabulary; it does not include tokens that were added to the tokenizer afterwards. As a result, MII can never generate any of the added tokens. In Llama 3, all special tokens, including the `bos` token, the `eos` token, and the other special tokens, are added tokens. Because their logits are truncated away, the `eos` token (`<|eot_id|>`) is never produced during decoding, so generation never stops until it reaches the `max_new_tokens` or `max_length` limit. To reproduce this issue, load any Llama 3 model and try any prompt: generation will never stop before reaching the length limit.

Instead of `tokenizer.vocab_size`, `len(tokenizer)` should be used, since it includes the added tokens.
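As a toy illustration of the difference (a minimal sketch, not the real `transformers` classes; the vocabulary sizes and token ids are made up):

```python
# Toy tokenizer mimicking how HF tokenizers count their vocabulary:
# vocab_size covers only the base vocabulary, while special tokens added
# later live in a separate table and are counted only by len(tokenizer).
class ToyTokenizer:
    def __init__(self, base_vocab_size):
        self.vocab_size = base_vocab_size  # base vocabulary only
        self.added_tokens = {}             # token -> id, appended after the base vocab

    def add_special_tokens(self, tokens):
        for tok in tokens:
            self.added_tokens[tok] = self.vocab_size + len(self.added_tokens)

    def __len__(self):
        # Full vocabulary: base vocab plus added tokens.
        return self.vocab_size + len(self.added_tokens)

tok = ToyTokenizer(base_vocab_size=100)
tok.add_special_tokens(["<|begin_of_text|>", "<|end_of_text|>", "<|eot_id|>"])

print(tok.vocab_size)                  # 100 -- misses the added tokens
print(len(tok))                        # 103 -- includes them
print(tok.added_tokens["<|eot_id|>"])  # 102 -- outside [0, vocab_size)
```

Truncating the logit tensor to `vocab_size` columns would drop the `<|eot_id|>` logit entirely, which is exactly why decoding never emits it.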