xorbitsai / inference

Replace OpenAI GPT with another LLM in your app by changing a single line of code. Xinference gives you the freedom to use any LLM you need. With Xinference, you're empowered to run inference with any open-source language models, speech recognition models, and multimodal models, whether in the cloud, on-premises, or even on your laptop.
https://inference.readthedocs.io
Apache License 2.0
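As a minimal illustration of the single-line swap described above, here is a sketch that points the official `openai` Python client at a local Xinference server. The endpoint `http://localhost:9997/v1` and the model uid `internlm2-chat` are assumptions, not values from this issue; Xinference serves an OpenAI-compatible API under `/v1`:

```python
# Sketch: swap OpenAI for a locally served model by changing the base_url.
# Assumptions: a Xinference server runs on localhost:9997 with a model
# launched under the uid "internlm2-chat"; no API auth is configured.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:9997/v1", api_key="not-needed")

response = client.chat.completions.create(
    model="internlm2-chat",  # the Xinference model uid, not an OpenAI model
    messages=[{"role": "user", "content": "Introduce yourself"}],
)
print(response.choices[0].message.content)
```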

QUESTION: InternLM2-7B chat output is garbled. Is this an issue with the model itself or with Xinference? #974

Closed · gqchen-dz closed this 9 months ago

gqchen-dz commented 9 months ago

Setup: xinference 0.8.3.1, GPU: T4 16GB; internlm2-7b running with 4-bit quantization.

Prompt: 介绍一下自己 ("introduce yourself")

Model reply (verbatim): 我是一个名叫Intern LM的人工智能助手,由AI医学工程师设计和创建。我是一台高性能计算机器学习专家模型系列的产品之一(PM3)的版本4的表现形式;我可以为您提供一般知识问题的回答并提供有趣的故事和小资料的帮助服务来帮助您了解世界和自己生活的环境中的细节和事件的知识内容以及个人经验和学习经验等等的有用的解决方案并为您解答各类问题提供参考依据!让我与您共同发掘美好生活吧~😊#OpenWPNLP-TA1F-PDPTC-0216041320_00001�ēڣلجعぃ׈٥Ćeⅲתcāhṵi鎋oǸaieq�s`pít伖½zŕnPmãşbùıòlҫʢyMěVł£TýYǔ˜焀d̑ͭGgór´vǸú•QȂfѦHkšɳwB́jìZ€՞KևӉΧϤOЌX~JxL{I۰°؍SଓÙƠ∩®ßθσ÷ԒπΩφ¬±—åç­£¯æ§≠жъбщУюівемыйкАЧдцεδιΣαέγχгИзЛІЮаиБяЦЯтлµуДпГёΠ¥Тºñ»øح┐U™@RВьнПɳÖ^á‰طأرخصقکΦđèω ”‘%ūčûСсфНїëئ็ψμρð∞κôéêâüزäàیشพ①​

Here is my official "Card" written using LaTeX:

Model Type: High-performance computing model.

Inspired from PM series's personality models based upon Gartner & Harvard Business Review literatures [MDIS08], industrial design theory[HSUG18]&personal branding principle[ihow19].

Applied industry:Personal Information Management Assistant / Personal Secretary/Premier Model Series/Public Version/Creativity-oriented Generation/Limited Expressiviness/Narrow Interests/Balanced Rationality/Web-based/Multi-typed Personality Characteristic

Log:
2024-02-03 00:34:48,510 xinference.core.worker 99635 DEBUG Enter describe_model, args: (<xinference.core.worker.WorkerActor object at 0x7f3a5021e430>,), kwargs: {'model_uid': 'llmchat-1-0'}
2024-02-03 00:34:48,510 xinference.core.worker 99635 DEBUG Leave describe_model, elapsed time: 0 s
2024-02-03 00:34:48,510 xinference.core.supervisor 99635 DEBUG Leave describe_model, elapsed time: 0 s
2024-02-03 00:34:48,516 xinference.core.supervisor 99635 DEBUG Enter get_model, args: (<xinference.core.supervisor.SupervisorActor object at 0x7f3b51563420>, 'llmchat'), kwargs: {}
2024-02-03 00:34:48,516 xinference.core.worker 99635 DEBUG Enter get_model, args: (<xinference.core.worker.WorkerActor object at 0x7f3a5021e430>,), kwargs: {'model_uid': 'llmchat-1-0'}
2024-02-03 00:34:48,516 xinference.core.worker 99635 DEBUG Leave get_model, elapsed time: 0 s
2024-02-03 00:34:48,516 xinference.core.supervisor 99635 DEBUG Leave get_model, elapsed time: 0 s
2024-02-03 00:34:48,517 xinference.core.supervisor 99635 DEBUG Enter describe_model, args: (<xinference.core.supervisor.SupervisorActor object at 0x7f3b51563420>, 'llmchat'), kwargs: {}
2024-02-03 00:34:48,518 xinference.core.worker 99635 DEBUG Enter describe_model, args: (<xinference.core.worker.WorkerActor object at 0x7f3a5021e430>,), kwargs: {'model_uid': 'llmchat-1-0'}
2024-02-03 00:34:48,518 xinference.core.worker 99635 DEBUG Leave describe_model, elapsed time: 0 s
2024-02-03 00:34:48,518 xinference.core.supervisor 99635 DEBUG Leave describe_model, elapsed time: 0 s
2024-02-03 00:34:48,521 xinference.core.model 100402 DEBUG Enter wrapped_func, args: (<xinference.core.model.ModelActor object at 0x7f550a9ee7f0>, '介绍一下自己', None, [], {'max_tokens': 512, 'temperature': 1.0, 'stream': True}), kwargs: {}
2024-02-03 00:34:48,522 xinference.core.model 100402 DEBUG Request chat, current serve request count: 0, request limit: None for the model llmchat
2024-02-03 00:34:48,524 xinference.model.llm.pytorch.core 100402 DEBUG Enter generate, prompt: <|im_start|>system
You are InternLM (书生·浦语), a helpful, honest, and harmless AI assistant developed by Shanghai AI Laboratory (上海人工智能实验室).<|im_end|>
<|im_start|>user
介绍一下自己<|im_end|>
<|im_start|>assistant
, generate config: {'echo': False, 'max_tokens': 512, 'repetition_penalty': 1.1, 'stop': ['<|im_end|>'], 'stop_token_ids': [92542], 'stream': True, 'stream_interval': 2, 'temperature': 1.0, 'top_p': 0.95, 'top_k': 40, 'model': 'llmchat'}
2024-02-03 00:34:48,527 xinference.core.model 100402 DEBUG After request chat, current serve request count: 0 for the model llmchat
2024-02-03 00:34:48,527 xinference.core.model 100402 DEBUG Leave wrapped_func, elapsed time: 0 s
2024-02-03 00:35:25,306 xinference.model.llm.pytorch.utils 100402 INFO Average generation speed: 13.90 tokens/s.
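For reference, here is a minimal sketch of how such a session could be reproduced through the Xinference Python client. The endpoint, model name, and launch parameters below are assumptions reconstructed from the log above, not the reporter's exact commands:

```python
# Sketch: reproduce the reported setup (an internlm2 chat model in PyTorch
# format with 4-bit quantization) via the Xinference client.
# Assumptions: server at localhost:9997; builtin model name
# "internlm2-chat"; parameter names as in xinference ~0.8.x.
from xinference.client import Client

client = Client("http://localhost:9997")

# Launch the model with the same format and quantization as in the report.
model_uid = client.launch_model(
    model_name="internlm2-chat",
    model_format="pytorch",
    model_size_in_billions=7,
    quantization="4-bit",
)

model = client.get_model(model_uid)

# Same prompt and the sampling settings visible in the log above.
completion = model.chat(
    "介绍一下自己",
    generate_config={"max_tokens": 512, "temperature": 1.0},
)
print(completion["choices"][0]["message"]["content"])
```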

gqchen-dz commented 9 months ago

A network issue caused this to be posted twice.

qinxuye commented 9 months ago

Closing this for now; it is a duplicate of #975.