activezhao opened 10 months ago
Update:
I added add_special_tokens=False to the tokenizer calls.
I changed ids = self.tokenizer.encode(word)
to ids = self.tokenizer.encode(word, add_special_tokens=False)
and changed output = self.tokenizer.decode(tokens[:seq_len])
to output = self.tokenizer.decode(tokens[:seq_len], add_special_tokens=False)
Now it works!
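A simplified, hypothetical model of stop-word matching shows why the special token matters. It checks whether the generated ids end with a stop word's ids; it is a sketch, not TensorRT-LLM's actual implementation. Assuming LLaMA's BOS id is 1, encoding "greater" with the default add_special_tokens=True gives [1, 7621]. That sequence never occurs in the generated stream, so generation cannot stop on it. The other ids are taken from the token logs in this thread.

```python
from typing import List, Sequence

# Assumption: 1 is LLaMA's BOS (<s>) id; 13, 4706 and 7621 are ids taken from
# the token logs in this thread (7621 is "greater").
BOS_ID = 1


def ends_with_stop_word(generated: Sequence[int], stop_words: List[List[int]]) -> bool:
    """Simplified check: do the most recent generated ids equal any stop word?"""
    for word in stop_words:
        if word and list(generated[-len(word):]) == word:
            return True
    return False


generated = [13, 4706, 7621]

# Stop word encoded with the default add_special_tokens=True: BOS is prepended,
# and BOS never shows up mid-generation, so this can never match.
print(ends_with_stop_word(generated, [[BOS_ID, 7621]]))  # False

# Encoded with add_special_tokens=False: matches as soon as "greater" appears.
print(ends_with_stop_word(generated, [[7621]]))  # True
```

A side note on the decode change: in Hugging Face transformers, decode() has no add_special_tokens parameter, and as far as I know extra keyword arguments are ignored there. The option that drops <s> from the text is skip_special_tokens=True. This is probably why the outputs below still start with "<s> ".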
I1114 09:26:38.329280 27908 model.py:275] ================== preprocessing _to_word_list_format flat_ids: [array([0.])]
I1114 09:26:38.329402 27908 model.py:276] ================== preprocessing _to_word_list_format offsets: [array([-1.])]
I1114 09:26:38.329457 27908 model.py:256] ================== preprocessing _to_word_list_format word: greater
I1114 09:26:38.329624 27908 model.py:258] ================== preprocessing _to_word_list_format ids: [7621]
I1114 09:26:38.329814 27908 model.py:275] ================== preprocessing _to_word_list_format flat_ids: [array([7621])]
I1114 09:26:38.329909 27908 model.py:276] ================== preprocessing _to_word_list_format offsets: [array([1])]
I1114 09:26:38.330031 27908 model.py:169] ================== preprocessing execute stop_words: [[[7621]
[ 1]]]
curl --noproxy '*' -X POST localhost:8250/v2/models/ensemble/generate -d '{"text_input": "def quickSort", "max_tokens": 150, "bad_words": "", "stop_words": "greater"}'
{
"model_name":"ensemble",
"model_version":"1",
"sequence_end":false,
"sequence_id":0,
"sequence_start":false,
"text_output":"<s> def quickSort(arr):\n if len(arr) <= 1:\n return arr\n else:\n pivot = arr[0]\n lesser = [x for x in arr[1:] if x <= pivot]\n greater"
}
curl --noproxy '*' -X POST localhost:8250/v2/models/ensemble/generate -d '{"text_input": "def quickSort", "max_tokens": 150, "bad_words": "", "stop_words": "lesser"}'
{
"model_name":"ensemble",
"model_version":"1",
"sequence_end":false,
"sequence_id":0,
"sequence_start":false,
"text_output":"<s> def quickSort(arr):\n if len(arr) <= 1:\n return arr\n else:\n pivot = arr[0]\n lesser"
}
However, there is a problem: when stop_words is "\n", it does not work, and inference does not stop early.
curl --noproxy '*' -X POST localhost:8250/v2/models/ensemble/generate -d '{"text_input": "def quickSort", "max_tokens": 150, "bad_words": "", "stop_words": "\n"}'
The log and the code show that self.logger.log_info(f"================== preprocessing _to_word_list_format word: {word}") is never executed. My guess is that after words = list(csv.reader(word_dict_item))[0], words is empty.
The final stop_words is [[[ 0] [-1]]]. The first array, [0], should contain the token of "\n" (e.g. [13] in LLaMA), and [-1] should be the offset [1]. Because of this, stop_words does not work.
I1115 01:31:19.956018 41237 model.py:275] ================== preprocessing _to_word_list_format flat_ids: [array([0.])]
I1115 01:31:19.956226 41237 model.py:276] ================== preprocessing _to_word_list_format offsets: [array([-1.])]
I1115 01:31:19.956481 41237 model.py:275] ================== preprocessing _to_word_list_format flat_ids: [array([0.])]
I1115 01:31:19.956650 41237 model.py:276] ================== preprocessing _to_word_list_format offsets: [array([-1.])]
I1115 01:31:19.956804 41237 model.py:169] ================== preprocessing execute stop_words: [[[ 0]
[-1]]]
for word_dict_item in word_dict:
    item_flat_ids = []
    item_offsets = []
    if isinstance(word_dict_item[0], bytes):
        word_dict_item = [word_dict_item[0].decode()]
    words = list(csv.reader(word_dict_item))[0]
    for word in words:
        self.logger.log_info(f"================== preprocessing _to_word_list_format word: {word}")
        ids = self.tokenizer.encode(word)
        self.logger.log_info(f"================== preprocessing _to_word_list_format ids: {ids}")
        if len(ids) == 0:
            continue
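The hypothetical helper below makes the logged tensors easier to read by splitting a word-list tensor back into per-word ids. It assumes the layout built by _to_word_list_format: row 0 holds the flat token ids padded with 0, and row 1 holds the cumulative end offsets padded with -1. It shows that [[[ 0] [-1]]] means "no stop words at all", not a stop word with id 0. This is consistent with words being empty.

```python
from typing import List


def split_word_list(word_list: List[List[List[int]]]) -> List[List[List[int]]]:
    """Turn a [batch, 2, N] word-list tensor (row 0: flat token ids,
    row 1: cumulative end offsets padded with -1) into per-word id lists."""
    batch_words = []
    for flat_ids, offsets in word_list:
        words, start = [], 0
        for end in offsets:
            if end < 0:  # -1 is padding: no more words in this request
                break
            words.append(list(flat_ids[start:end]))
            start = end
        batch_words.append(words)
    return batch_words


print(split_word_list([[[7621], [1]]]))            # [[[7621]]]: one stop word ("greater")
print(split_word_list([[[0], [-1]]]))              # [[]]: no stop words at all
print(split_word_list([[[7621, 24438], [1, 2]]]))  # [[[7621], [24438]]]
```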
How can this be resolved?
In your query, it looks like \n isn't escaped with quotes for the CSV reader to parse it correctly.
It's a bit cumbersome to get all the special characters parsed correctly through bash -> json -> csv, but wouldn't something like '[...] "stop_words": "\"\\n\""}' work?
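A small sketch replays the bash -> JSON -> CSV chain in plain Python. It assumes the server JSON-decodes the request body and then applies csv.reader exactly as in the preprocessing snippet above; parse_stop_words is a hypothetical name. An unquoted newline is read by csv.reader as a blank line, which yields an empty row.

The JSON layer also matters. Inside a JSON string, \\n decodes to a backslash followed by n. The suggested "\"\\n\"" therefore most likely reaches the tokenizer as those two characters, while "\"\n\"" yields a real newline. The comma example shows one thing csv.reader does provide: splitting several stop words out of a single string.

```python
import csv
import json
from typing import List


def parse_stop_words(request_body: str) -> List[str]:
    """Replay the JSON -> CSV steps: decode the body, split the stop_words
    string with csv.reader and keep the first row, as the snippet above does."""
    stop_words = json.loads(request_body)["stop_words"]
    return list(csv.reader([stop_words]))[0]


# Raw strings hold exactly what bash passes through inside single quotes.
print(parse_stop_words(r'{"stop_words": "greater,pivot"}'))  # ['greater', 'pivot']
print(parse_stop_words(r'{"stop_words": "\n"}'))             # [] (blank CSV line)
print(parse_stop_words(r'{"stop_words": "\"\n\""}'))         # ['\n'] (a real newline)
print(parse_stop_words(r'{"stop_words": "\"\\n\""}'))        # ['\\n'] (backslash + n)
```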
Hi @mickaelseznec, I have a question: why do we use words = list(csv.reader(word_dict_item))[0]?
Could we use numpy directly instead?
Here is the code I changed:
def _to_word_list_format(self, word_dict: List[List[str]]):
    # word_dict is a numpy array of (bytes) strings, one row per request
    assert self.tokenizer is not None, "need to set tokenizer"
    if word_dict.size == 0:
        # Return an empty array of shape (1,2,0)
        return np.empty([1, 2, 0], dtype="int32")
    flat_ids = []
    offsets = []
    for word_dict_item in word_dict:
        item_flat_ids = []
        item_offsets = []
        if isinstance(word_dict_item[0], bytes):
            word_dict_item = [item.decode() for item in word_dict_item]
        # Each array element is one stop word; no CSV parsing
        for word in word_dict_item:
            ids = self.tokenizer.encode(word, add_special_tokens=False)
            # LLaMA's tokenizer prepends the bare-space piece 29871 to inputs such as "\n";
            # drop it so the stop word matches the token actually generated (e.g. 13)
            if "llama" in str(type(self.tokenizer)) and len(ids) > 0 and ids[0] == 29871:
                ids = ids[1:]
            if len(ids) == 0:
                continue
            item_flat_ids += ids
            item_offsets.append(len(ids))
        flat_ids.append(np.array(item_flat_ids))
        offsets.append(np.cumsum(np.array(item_offsets)))
    # Pad every request to the same length: ids with 0, offsets with -1
    pad_to = max(1, max(len(ids) for ids in flat_ids))
    for i, (ids, offs) in enumerate(zip(flat_ids, offsets)):
        flat_ids[i] = np.pad(ids, (0, pad_to - len(ids)), constant_values=0)
        offsets[i] = np.pad(offs, (0, pad_to - len(offs)), constant_values=-1)
    # Shape [batch, 2, pad_to]
    return np.array([flat_ids, offsets], dtype="int32").transpose((1, 0, 2))
I removed the line words = list(csv.reader(word_dict_item))[0] and added special handling for LLaMA (dropping a leading 29871 token).
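Here is a standalone sketch of the revised function that runs without Triton. The tokenizer is replaced by a hypothetical toy encoder whose ids are copied from the logs in this thread. The tokenizer-type check is dropped because the toy encoder stands in for LLaMA.

Encoding "\n" as [29871, 13] is an assumption about LLaMA's SentencePiece tokenizer, consistent with the 29871 handling above. The sketch shows the [batch, 2, N] layout for several words, for "\n", and for an empty list.

```python
from typing import Callable, Dict, List

import numpy as np

# Assumption: LLaMA's SentencePiece tokenizer prepends the bare-space piece
# 29871 to inputs such as "\n" (encoding it as [29871, 13]).
LLAMA_SPACE_ID = 29871


def to_word_list_format(batch: List[List[str]],
                        encode: Callable[[str], List[int]]) -> np.ndarray:
    """Standalone variant of _to_word_list_format: returns an int32 array of
    shape [batch, 2, pad_to] with flat ids (padded with 0) and cumulative
    end offsets (padded with -1)."""
    flat_ids, offsets = [], []
    for words in batch:
        item_ids: List[int] = []
        item_lens: List[int] = []
        for word in words:
            ids = encode(word)
            if ids and ids[0] == LLAMA_SPACE_ID:
                ids = ids[1:]  # drop the leading space piece, keep e.g. 13
            if not ids:
                continue  # words that encode to nothing are skipped
            item_ids += ids
            item_lens.append(len(ids))
        flat_ids.append(np.array(item_ids, dtype=np.int32))
        offsets.append(np.cumsum(np.array(item_lens, dtype=np.int32)))
    pad_to = max(1, max(len(ids) for ids in flat_ids))
    flat_ids = [np.pad(a, (0, pad_to - len(a)), constant_values=0) for a in flat_ids]
    offsets = [np.pad(a, (0, pad_to - len(a)), constant_values=-1) for a in offsets]
    return np.array([flat_ids, offsets], dtype=np.int32).transpose((1, 0, 2))


# Hypothetical toy vocabulary; ids are the ones logged in this thread.
TOY_VOCAB: Dict[str, List[int]] = {
    "greater": [7621], "pivot": [24438], "\n": [LLAMA_SPACE_ID, 13],
}


def toy_encode(word: str) -> List[int]:
    return TOY_VOCAB.get(word, [])


print(to_word_list_format([["greater", "pivot"]], toy_encode).tolist())  # [[[7621, 24438], [1, 2]]]
print(to_word_list_format([["\n"]], toy_encode).tolist())               # [[[13], [1]]]
print(to_word_list_format([[]], toy_encode).tolist())                   # [[[0], [-1]]]
```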
Then I tested some cases.
stop_words is empty:
curl --location 'http://localhost:8000/v2/models/ensemble/generate' --header 'Content-Type: application/json' --data '{
"text_input": "def quickSort",
"max_tokens": 100,
"bad_words": "",
"stop_words": ""
}'
{
"model_name":"ensemble",
"model_version":"1",
"sequence_end":false,
"sequence_id":0,
"sequence_start":false,
"text_output":"{
"id":"cmpl-72f80645-a4fd-4746-bc92-d27f9bdbe821",
"object":"text_completion",
"created":1700134036,
"model":"ensemble",
"choices":[
{
"index":0,
"text":"(arr):\n if len(arr) <= 1:\n return arr\n else:\n pivot = arr[0]\n lesser = [x for x in arr[1:] if x <= pivot]\n greater = [x for x in arr[1:] if x > pivot]\n return quickSort(lesser) + [pivot] + quickSort(greater)\n\n\ndef quickSort2(arr):\n ",
"logprobs":{
"text_offset":[
],
"token_logprobs":[
],
"tokens":[
"29898",
"2749",
"1125",
"13",
"1678",
"565",
"7431",
"29898",
"2749",
"29897",
"5277",
"29871",
"29896",
"29901",
"13",
"4706",
"736",
"3948",
"13",
"1678",
"1683",
"29901",
"13",
"4706",
"24438",
"353",
"3948",
"29961",
"29900",
"29962",
"13",
"4706",
"3109",
"261",
"353",
"518",
"29916",
"363",
"921",
"297",
"3948",
"29961",
"29896",
"17531",
"565",
"921",
"5277",
"24438",
"29962",
"13",
"4706",
"7621",
"353",
"518",
"29916",
"363",
"921",
"297",
"3948",
"29961",
"29896",
"17531",
"565",
"921",
"1405",
"24438",
"29962",
"13",
"4706",
"736",
"4996",
"13685",
"29898",
"2222",
"261",
"29897",
"718",
"518",
"29886",
"11002",
"29962",
"718",
"4996",
"13685",
"29898",
"7979",
"1008",
"29897",
"13",
"13",
"13",
"1753",
"4996",
"13685",
"29906",
"29898",
"2749",
"1125",
"13",
"1678"
],
"top_logprobs":[
]
},
"finish_reason":"length"
}
],
"usage":{
"prompt_tokens":4,
"total_tokens":104,
"completion_tokens":100
}
}"
}
stop_words is "\n":
curl --location 'http://localhost:8000/v2/models/ensemble/generate' --header 'Content-Type: application/json' --data '{
"text_input": "def quickSort",
"max_tokens": 100,
"bad_words": "",
"stop_words": "\n"
}'
{
"model_name":"ensemble",
"model_version":"1",
"sequence_end":false,
"sequence_id":0,
"sequence_start":false,
"text_output":"{
"id":"cmpl-081ebbea-fd5e-4604-8a09-d19a561776d1",
"object":"text_completion",
"created":1700134187,
"model":"ensemble",
"choices":[
{
"index":0,
"text":"(arr):\n",
"logprobs":{
"text_offset":[
],
"token_logprobs":[
],
"tokens":[
"29898",
"2749",
"1125",
"13"
],
"top_logprobs":[
]
},
"finish_reason":"length"
}
],
"usage":{
"prompt_tokens":4,
"total_tokens":8,
"completion_tokens":4
}
}"
}
More than one word:
If stop_words contains more than one word, I pass an array:
curl --location 'http://localhost:8000/v2/models/ensemble/generate' --header 'Content-Type: application/json' --data '{
"text_input": "def quickSort",
"max_tokens": 100,
"bad_words": "",
"stop_words": ["greater", "pivot"]
}'
{
"model_name":"ensemble",
"model_version":"1",
"sequence_end":false,
"sequence_id":0,
"sequence_start":false,
"text_output":"{
"id":"cmpl-2ca50844-b6f4-413f-b986-f8cb400e2428",
"object":"text_completion",
"created":1700134388,
"model":"ensemble",
"choices":[
{
"index":0,
"text":"(arr):\n if len(arr) <= 1:\n return arr\n else:\n pivot",
"logprobs":{
"text_offset":[
],
"token_logprobs":[
],
"tokens":[
"29898",
"2749",
"1125",
"13",
"1678",
"565",
"7431",
"29898",
"2749",
"29897",
"5277",
"29871",
"29896",
"29901",
"13",
"4706",
"736",
"3948",
"13",
"1678",
"1683",
"29901",
"13",
"4706",
"24438"
],
"top_logprobs":[
]
},
"finish_reason":"length"
}
],
"usage":{
"prompt_tokens":4,
"total_tokens":29,
"completion_tokens":25
}
}"
}
What do you think?
Thanks.
Sure, that makes sense. We’ll add similar behavior in an upcoming update.
Also, keep in mind that the ensemble model is basically an example for people to build upon. You can customize it freely to suit your needs 🙂
@mickaelseznec OK, I hope it gets better 😎
Hi, is there a solution for stop_words="\n"? I tried the latest version and it still doesn't work. Thank you.
Hi @shatealaboxiaowang, you can just try the reply above:
https://github.com/triton-inference-server/tensorrtllm_backend/issues/128#issuecomment-1814276748
Thank you, great!
@activezhao @shatealaboxiaowang, are you getting the same issue?
I am wondering why your response contains the following fields: "finish_reason":"length" and "usage":{ "prompt_tokens":4, "total_tokens":8, "completion_tokens":4 }. How can I customize the content and format of the returned fields on the server side?
@shatealaboxiaowang We just changed the model.py file and added code that produces the OpenAI format.
Thank you for your reply. In postprocessing/1/model.py I changed the source code like this: inference_response = pb_utils.InferenceResponse(output_tensors=[ output_tensor, out_cum_log_probs, out_output_log_probs, out_sequence_lengths ]). This adds the return field (out_sequence_lengths) to inference_response, but it doesn't take effect. Could you tell me how you changed it to the OpenAI format? In which model.py file did you make the change, and how did you change the source code?
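Here is a hypothetical sketch of what "adding OpenAI format code" to model.py might look like; it is not the code used above. It builds an OpenAI-style text_completion object, serializes it to JSON, and returns that string as the text output. That would explain why the whole object appears inside text_output in the responses above. Field names mirror those responses; deriving finish_reason from max_tokens is an assumption.

```python
import json
import time
import uuid
from typing import List


def to_openai_completion(text: str, token_ids: List[int], prompt_tokens: int,
                         max_tokens: int, model: str = "ensemble") -> str:
    """Hypothetical helper: wrap one generation in an OpenAI-style
    text_completion object and serialize it to a JSON string."""
    completion_tokens = len(token_ids)
    response = {
        "id": f"cmpl-{uuid.uuid4()}",
        "object": "text_completion",
        "created": int(time.time()),
        "model": model,
        "choices": [{
            "index": 0,
            "text": text,
            "logprobs": {
                "text_offset": [],
                "token_logprobs": [],
                "tokens": [str(t) for t in token_ids],
                "top_logprobs": [],
            },
            # Assumption: report "length" only if the token budget ran out.
            "finish_reason": "length" if completion_tokens >= max_tokens else "stop",
        }],
        "usage": {
            "prompt_tokens": prompt_tokens,
            "total_tokens": prompt_tokens + completion_tokens,
            "completion_tokens": completion_tokens,
        },
    }
    return json.dumps(response)


body = json.loads(to_openai_completion("(arr):\n", [29898, 2749, 1125, 13], 4, 100))
print(body["usage"])                        # {'prompt_tokens': 4, 'total_tokens': 8, 'completion_tokens': 4}
print(body["choices"][0]["finish_reason"])  # stop
```

In a Triton Python-backend model, the returned string would go into the existing text output tensor. Adding a new tensor such as out_sequence_lengths to InferenceResponse is probably not enough by itself. As far as I know, an output must also be declared in that model's config.pbtxt and, for the ensemble, mapped in the ensemble's config.pbtxt before clients can see it.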
First: I downloaded the latest tensorrtllm_backend from the main branch.
Second: I executed the following command to build a docker image using the latest tensorrtllm_backend from the main branch.
Third: I got a docker image like this:
Fourth: I launched the triton_trt_llm docker image.
Fifth: In the container, I executed the command for build-tensorrt-llm:
https://github.com/NVIDIA/TensorRT-LLM/blob/main/docs/source/installation.md#build-tensorrt-llm
Sixth: I built engines with code-llama-7b.
Finally: I called the checkpoint like this. As we can see, stop_words does not work.
I added a log print in preprocessing's model.py.
And here is the preprocessing _to_word_list_format ids:
I added a log print in postprocessing's model.py.
And here is the preprocessing _to_word_list_format ids:
As we can see, the stop_words tokens [4996, 13685] appear in postprocessing's output tokens, but inference does not stop early.
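A quick offline check on the logged tokens can confirm whether the stop word really appears as a contiguous run, and where generation should have been cut. The helper below is hypothetical, and the ids are a stretch copied from the token log above. If the run is found, tokenization is likely fine, and the issue more likely lies in how the stop_words tensor reaches the engine.

```python
from typing import List, Optional


def first_stop_position(tokens: List[int], stop_word: List[int]) -> Optional[int]:
    """Index just past the first contiguous occurrence of stop_word in tokens,
    i.e. where generation should have been cut; None if it never occurs."""
    n = len(stop_word)
    if n == 0:
        return None
    for end in range(n, len(tokens) + 1):
        if tokens[end - n:end] == stop_word:
            return end
    return None


tokens = [13, 4706, 736, 4996, 13685, 29898]  # ids copied from the token log above
print(first_stop_position(tokens, [4996, 13685]))  # 5
print(first_stop_position(tokens, [13685, 4996]))  # None (order matters)
```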