Closed wj1017090777 closed 6 days ago
This is probably because the transformers version was updated. Try pulling the latest project code.
I'm deploying locally; after upgrading to the new version, I hit the same problem.
It doesn't seem specific to qwen2; qwen1.5 behaves the same. Everything was fine before git pull, and it broke after updating.
The latest transformers release added a tools parameter, so the old positional arguments no longer line up.
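For context: recent transformers releases (4.42+) inserted `tools` (and `documents`) into `apply_chat_template`'s signature right after the conversation argument, so older calls that passed later arguments positionally now bind them to the wrong parameters. A minimal sketch of the keyword-only style that avoids this; the model ID is just an illustration:

```python
from transformers import AutoTokenizer

# Illustrative model ID; any chat model with a chat template behaves the same.
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2-7B-Instruct")
messages = [{"role": "user", "content": "hello"}]

# Fragile: in transformers >= 4.42 the second positional slot is `tools`,
# so a template passed positionally would land in the wrong parameter:
#   tokenizer.apply_chat_template(messages, my_template, tokenize=False)

# Robust: bind everything by keyword.
prompt = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=False,
)
print(prompt)
```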
> It doesn't seem specific to qwen2; qwen1.5 behaves the same. Everything was fine before git pull, and it broke after updating.

It works fine in my testing.
Solved after pulling the latest code.
2024-07-04 08:28:47,272 WARNING services.py:2009 -- WARNING: The object store is using /tmp instead of /dev/shm because /dev/shm has only 67084288 bytes available. This will harm performance! You may be able to free up space by deleting files in /dev/shm. If you are inside a Docker container, you can increase /dev/shm size by passing '--shm-size=10.24gb' to 'docker run' (or add it to the run_options list in a Ray cluster config). Make sure to set this to more than 30% of available RAM.
2024-07-04 08:28:48,452 INFO worker.py:1771 -- Started a local Ray instance.
INFO 07-04 08:28:50 config.py:623] Defaulting to use mp for distributed inference
INFO 07-04 08:28:50 llm_engine.py:161] Initializing an LLM engine (v0.5.0.post1) with config: model='checkpoints/glm-4-9b-chat', speculative_config=None, tokenizer='checkpoints/glm-4-9b-chat', skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, rope_scaling=None, rope_theta=None, tokenizer_revision=None, trust_remote_code=True, dtype=torch.bfloat16, max_seq_len=131072, download_dir=None, load_format=LoadFormat.AUTO, tensor_parallel_size=2, disable_custom_all_reduce=False, quantization=None, enforce_eager=False, kv_cache_dtype=auto, quantization_param_path=None, device_config=cuda, decoding_config=DecodingConfig(guided_decoding_backend='outlines'), seed=0, served_model_name=checkpoints/glm-4-9b-chat)
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
WARNING 07-04 08:28:51 tokenizer.py:126] Using a slow tokenizer. This might cause a significant slowdown. Consider using a fast tokenizer instead.
Traceback (most recent call last):
File "
This probably means that you are not using fork to start your
child processes and you have forgotten to use the proper idiom
in the main module:
    if __name__ == '__main__':
        freeze_support()
        ...
The "freeze_support()" line can be omitted if the program
is not going to be frozen to produce an executable.
ERROR 07-04 08:28:55 multiproc_worker_utils.py:120] Worker VllmWorkerProcess pid 5653 died, exit code: 1
INFO 07-04 08:28:55 multiproc_worker_utils.py:123] Killing local vLLM worker processes
I hit this problem while running glm-4. Could you help take a look?
Try adding these two environment variables.
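(Which two variables were meant is not preserved in this thread. A hedged guess based on the fork/spawn traceback above: vLLM reads VLLM_WORKER_MULTIPROC_METHOD to choose the worker start method, and the traceback's own advice is the `__main__` guard. The sketch below is an assumption, not the maintainer's exact suggestion:)

```python
import os

# Assumption: force vLLM's multiprocessing workers to use fork so the
# spawn-mode "freeze_support" idiom is not required. The variable exists in
# vLLM, but whether it was one of the "two variables" here is a guess.
os.environ["VLLM_WORKER_MULTIPROC_METHOD"] = "fork"

if __name__ == "__main__":
    # The guard itself is the fix the traceback recommends for spawn mode.
    from vllm import LLM

    llm = LLM(
        model="checkpoints/glm-4-9b-chat",  # path taken from the log above
        tensor_parallel_size=2,
        trust_remote_code=True,
    )
```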
Thanks for the reply. It's running now, but there's still a problem; the error is as follows:
2024-07-05 01:37:41.905 | DEBUG | api.vllm_routes.chat:create_chat_completion:74 - ==== request ====
{'model': 'gpt-3.5-turbo', 'frequency_penalty': 0.0, 'function_call': None, 'functions': None, 'logit_bias': None, 'logprobs': False, 'max_tokens': 512, 'n': 1, 'presence_penalty': 0.0, 'response_format': None, 'seed': None, 'stop': ['<|observation|>', '<|endoftext|>', '
Just noticed that glm4's tokenizer files were changed; updating the project code should fix it.
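A quick sanity check after updating: the stop strings visible in the request log should map to real ids in the refreshed tokenizer (checkpoint path copied from the engine log above; adjust to your setup):

```python
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained(
    "checkpoints/glm-4-9b-chat", trust_remote_code=True
)

# If the tokenizer files are current, these should resolve to real vocabulary
# ids rather than the unknown-token id.
for token in ["<|observation|>", "<|endoftext|>"]:
    print(token, tok.convert_tokens_to_ids(token))
```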
The following items must be checked before submission
Type of problem
Model inference and deployment
Operating system
Linux
Detailed description of the problem
Dependencies
Runtime logs or screenshots
After deploying qwen2 directly with docker-compose and testing the endpoint with the OpenAI example code from the project homepage, it raises the error above. tools is None; I don't know why it errors.
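The "homepage OpenAI test code" itself isn't quoted in this issue; a minimal equivalent call, with the base URL and key as placeholders and the model name mirroring the request log above, would look roughly like:

```python
from openai import OpenAI

# Placeholder endpoint and key for a docker-compose deployment; adjust to yours.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

resp = client.chat.completions.create(
    model="gpt-3.5-turbo",  # served model alias seen in the request log
    messages=[{"role": "user", "content": "hello"}],
    max_tokens=512,
)
print(resp.choices[0].message.content)
```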