triton-inference-server / tensorrtllm_backend

The Triton TensorRT-LLM Backend
Apache License 2.0

ChatGLM2-6B tokenizer can't load? #197

Open THU-mjx opened 9 months ago

THU-mjx commented 9 months ago

When I built a ChatGLM2-6B engine and launched the Triton server, an error occurred: Tokenizer class ChatGLMTokenizer does not exist.


byshiue commented 9 months ago

Hope this issue is helpful https://github.com/THUDM/ChatGLM2-6B/issues/243.
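For context, the linked issue boils down to passing trust_remote_code=True wherever the tokenizer is loaded. A minimal sketch (the model path below is a placeholder for a local ChatGLM2-6B checkout):

```python
# Minimal sketch; the path is a placeholder for your local checkout.
# Without trust_remote_code=True, transformers refuses to load the
# custom ChatGLMTokenizer class bundled inside the model repository,
# which is exactly the "Tokenizer class ChatGLMTokenizer does not
# exist" error above.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained(
    "/models/chatglm2-6b",   # hypothetical tokenizer_dir
    trust_remote_code=True,  # allow the repo's custom tokenizer code to run
)
```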

THU-mjx commented 9 months ago

When I added trust_remote_code=True in preprocessing/1/model.py and postprocessing/1/model.py, another error occurred: Failed to initialize Python stub: AttributeError: can't set attribute 'pad_token'. I tried to fix it with the following method:

[screenshot of the attempted fix]

The server could be built, but it can't be used!


byshiue commented 9 months ago

Which branch do you use? If you are not on the latest main branch, please try it. Please remember to update tensorrt_llm too. Also, you need to rebuild tensorrt_llm and the engine after updating the code.

THU-mjx commented 9 months ago

Using the latest tensorrt_llm and tensorrtllm_backend, I still failed to launch the server with the error above.

THU-mjx commented 9 months ago

I fixed this issue: the latest version of tensorrtllm_backend requires setting max_batch_size manually. I have another question: how should I set these values?

byshiue commented 9 months ago

You can set it to 1 first.
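For reference, max_batch_size is a standard field in each Triton model's config.pbtxt (it appears in the preprocessing, tensorrt_llm, and postprocessing configs). A minimal sketch of the relevant line, using the value suggested above:

```
# Hypothetical excerpt from e.g. preprocessing/config.pbtxt.
# 1 is a conservative starting value; raise it once the server is up.
max_batch_size: 1
```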

THU-mjx commented 9 months ago

Another error occurred in preprocessing/postprocessing model.py (I tested several models; the errors are the same). In model.py, I only added trust_remote_code=True.

byshiue commented 9 months ago

Please refer to https://github.com/THUDM/ChatGLM3/issues/152.

A possible workaround is to avoid assigning the pad_token, and to replace all uses of pad_token with eos_token.
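A minimal sketch of that workaround, assuming the tokenizer directory is a placeholder and the surrounding Triton Python-stub code is unchanged:

```python
# Sketch of the workaround; ChatGLM's custom tokenizer exposes
# pad_token as a read-only property, so assigning to it raises
# "AttributeError: can't set attribute 'pad_token'".
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained(
    "/models/chatglm2-6b",   # hypothetical tokenizer_dir
    trust_remote_code=True,
)

# Instead of the failing assignment:
#   tokenizer.pad_token = tokenizer.eos_token
# use the EOS token id wherever model.py needs a pad id.
pad_id = tokenizer.eos_token_id
end_id = tokenizer.eos_token_id
```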