Open ShadowTeamCN opened 4 months ago
I have encountered the same problem.
You can use vLLM directly for inference; I find it compatible with Mistral-Large-2.
Can you tell me which version you have installed?
Thanks. Can you give detailed instructions on how you use vllm?
Do you use the API?
@liuanping @shangh1 @endNone All my package versions are listed above. As for vLLM, it's vllm==0.5.2. The inference code is quite simple; I'm using 4×H100 for Mistral-Large-2:
from vllm import LLM, SamplingParams

# path is the local directory holding the Mistral-Large-2 checkpoint
llm = LLM(path, tensor_parallel_size=4, max_seq_len_to_capture=8192 * 2, gpu_memory_utilization=0.95)
tokenizer = llm.get_tokenizer()

prompt = '''Your prompt here'''
messages = [{"role": "user", "content": prompt}]
# Render the chat template to a plain prompt string (tokenize=False)
prompt_text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

sampling_params = SamplingParams(temperature=0.4, max_tokens=8192, stop=[tokenizer.eos_token])
output = llm.generate(prompt_text, sampling_params=sampling_params, use_tqdm=False)
print(output[0].outputs[0].text)
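If you prefer to call it through an API (per the question above), here is a minimal sketch, assuming vLLM's OpenAI-compatible server is started separately and the openai Python client is installed; the port and model path are placeholders:

# Assumes the server was started separately, e.g.:
#   python -m vllm.entrypoints.openai.api_server --model /path/to/Mistral-Large-Instruct-2407 --tensor-parallel-size 4
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")  # vLLM ignores the key
resp = client.chat.completions.create(
    model="/path/to/Mistral-Large-Instruct-2407",  # placeholder: must match the server's --model argument
    messages=[{"role": "user", "content": "Your prompt here"}],
    temperature=0.4,
    max_tokens=1024,
)
print(resp.choices[0].message.content)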
Thank you! Nice work!
Actually, I'm keen on trying out the official mistral_inference for testing purposes. Could you please tell me when the official team plans to fix this bug?
Python -VV
Pip Freeze
Reproduction Steps
from mistral_inference.transformer import Transformer
from mistral_inference.generate import generate

from mistral_common.tokens.tokenizers.mistral import MistralTokenizer
from mistral_common.protocol.instruct.messages import UserMessage
from mistral_common.protocol.instruct.request import ChatCompletionRequest

mistral_models_path = '/path/to/Mistral-Large-Instruct-2407/'
tokenizer = MistralTokenizer.from_file(f"{mistral_models_path}/tokenizer.model.v3")
model = Transformer.from_folder(mistral_models_path)
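For reference, once loading works the rest of the snippet would follow the usual mistral_inference flow; a sketch based on the package's README, continuing from the imports above (the prompt and sampling values are illustrative):

# Sketch: build an instruct request, tokenize it, and generate
completion_request = ChatCompletionRequest(messages=[UserMessage(content="Your prompt here")])
tokens = tokenizer.encode_chat_completion(completion_request).tokens

out_tokens, _ = generate(
    [tokens], model, max_tokens=256, temperature=0.0,
    eos_id=tokenizer.instruct_tokenizer.tokenizer.eos_id,
)
print(tokenizer.instruct_tokenizer.tokenizer.decode(out_tokens[0]))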
Expected Behavior
The model loads successfully.
Additional Context
AssertionError                            Traceback (most recent call last)
Cell In[3], line 10
      8 mistral_models_path='/home/tione/notebook/PretrainModelStore/Mistral-Large-Instruct-2407/'
      9 tokenizer = MistralTokenizer.from_file(f"{mistral_models_path}/tokenizer.model.v3")
---> 10 model = Transformer.from_folder(mistral_models_path)

File /usr/local/lib/python3.10/dist-packages/mistral_inference/transformer.py:353, in Transformer.from_folder(folder, max_batch_size, num_pipeline_ranks, device, dtype)
    350 pt_model_file = Path(folder) / "consolidated.00.pth"
    351 safetensors_model_file = Path(folder) / "consolidated.safetensors"
--> 353 assert (
    354     pt_model_file.exists() or safetensors_model_file.exists()
    355 ), f"Make sure either {pt_model_file} or {safetensors_model_file} exists"
    356 assert not (
    357     pt_model_file.exists() and safetensors_model_file.exists()
    358 ), f"Both {pt_model_file} and {safetensors_model_file} cannot exist"
    360 if pt_model_file.exists():

AssertionError: Make sure either /home/tione/notebook/PretrainModelStore/Mistral-Large-Instruct-2407/consolidated.00.pth or /home/tione/notebook/PretrainModelStore/Mistral-Large-Instruct-2407/consolidated.safetensors exists
Suggested Solutions
I think the model file check in Transformer.from_folder looks for the wrong file names:
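A minimal sketch of what I mean, assuming the downloaded folder stores the weights under a different consolidated.* name than the two the assertion hard-codes; the glob patterns are an assumption for illustration, not the actual fix:

# Sketch: list candidate weight files and relax the hard-coded filename check
# (assumption: the folder contains consolidated*.safetensors / consolidated*.pth files
# rather than exactly consolidated.safetensors or consolidated.00.pth)
from pathlib import Path

folder = Path("/path/to/Mistral-Large-Instruct-2407/")
pt_files = sorted(folder.glob("consolidated*.pth"))
safetensors_files = sorted(folder.glob("consolidated*.safetensors"))
print("pt:", pt_files)
print("safetensors:", safetensors_files)

assert pt_files or safetensors_files, f"No consolidated weight files found in {folder}"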