Closed: Hideki105 closed this issue 5 months ago.
It seems that a Unicode decode error has occurred in Tokenizer.cs. Would you mind fixing the code for the Japanese language?
public string Decode(ReadOnlySpan<int> sequence)
{
    IntPtr outStr = IntPtr.Zero;
    unsafe
    {
        fixed (int* sequencePtr = sequence)
        {
            Result.VerifySuccess(NativeMethods.OgaTokenizerDecode(_tokenizerHandle, sequencePtr, (UIntPtr)sequence.Length, out outStr));
        }
    }
    try
    {
        return StringUtils.FromUtf8(outStr);
    }
    finally
    {
        NativeMethods.OgaDestroyString(outStr);
    }
}
Thank you for reporting this @Hideki105. We will look into it.
Hi @Hideki105, we have a fix for the Unicode decode error in progress. In the meantime, can you try this script, which does not use streaming decoding:
python model-generate.py -m <path to your model> -pr <your prompt>
The script is in examples/python.
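For reference, here is roughly what that script does differently from phi3-qa.py: it collects the whole output sequence and decodes it once at the end, instead of decoding token by token. A sketch under the 0.2.x Python API used by the examples (the model path is a placeholder):

import onnxruntime_genai as og

model = og.Model("path/to/your/model")  # placeholder path
tokenizer = og.Tokenizer(model)

params = og.GeneratorParams(model)
params.input_ids = tokenizer.encode("データ分析するにはなにをすればいい?")

generator = og.Generator(model, params)
output_tokens = []
while not generator.is_done():
    generator.compute_logits()
    generator.generate_next_token()
    output_tokens.append(generator.get_next_tokens()[0])

# Decoding the full sequence in one call never splits a multi-byte
# UTF-8 character across decode boundaries, unlike per-token streaming.
print(tokenizer.decode(output_tokens))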
Note: Phi-3-mini was predominantly trained and optimized for English. Its capabilities in other languages are limited: it may understand them, but it will not be as fluent as in English. Customers are encouraged to use the Microsoft Translator service in tandem to translate prompts and responses for best results.
@natke, could you tell us why disabling streaming decoding avoids the Unicode decode error?
I also faced the very same issue that @Hideki105 reported when I ran examples/python/phi3-qa.py. I replaced tokenizer_stream.decode(new_token) with tokenizer.decode(new_token) in the relevant line to disable streaming decoding, but the problem persists. I even deleted flush=True in the print() call, but in vain...
I came up with a workaround. In the case of examples/python/phi3-qa.py, you can replace L.65 with the following snippet; the characters that the tokenizer fails to decode are then shown as the replacement character U+FFFD (�):
try:
    print(
        tokenizer_stream.decode(new_token),
        end="",
        flush=True,
    )
except:
    print(
        "�",
        end="",
        flush=True,
    )
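To make the failure mode concrete: the most likely reason streaming decoding trips over Japanese is that a single character's UTF-8 encoding (3 bytes for most kana and kanji) can be split across two tokens, and neither fragment is valid UTF-8 on its own. A dependency-free sketch of the mechanism (an illustration only, not the actual code path inside onnxruntime-genai):

text = "データ"                      # 3 bytes per character in UTF-8
data = text.encode("utf-8")          # 9 bytes total
fragment = data[:4]                  # a token boundary falling mid-character
try:
    fragment.decode("utf-8")
except UnicodeDecodeError as err:
    print("partial character cannot be decoded:", err)

# errors="replace" maps the dangling byte to U+FFFD, which is exactly
# what the workaround above prints for tokens that fail to decode.
print(fragment.decode("utf-8", errors="replace"))  # -> "デ�"

This would also explain why swapping tokenizer_stream.decode(new_token) for tokenizer.decode(new_token) did not help: a single token can still carry only part of a character, no matter which decoder processes it.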
@natke, the issue may reside within the tokenizer of the 128k ONNX model, as the tokenizer for the standard model microsoft/Phi-3-mini-128k-instruct decodes non-ASCII sentences without any character corruption (i.e. mojibake or tofu), as shown below:
import torch
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    pipeline,
)
import json
import time
import gc

torch.random.manual_seed(0)

model_name = "microsoft/Phi-3-mini-128k-instruct"
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    device_map={"": 0},
    # device_map="cuda",
    torch_dtype=torch.float16,
    # torch_dtype="auto",
    trust_remote_code=True,
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

user_prompt = "データ分析するにはなにをすればいい?"
messages = [
    {
        "role": "user",
        "content": user_prompt,
    },
]
pipe = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
)
generation_args = {
    "max_new_tokens": 2048,
    "return_full_text": False,
    "temperature": 0.0,
    "do_sample": False,
}

start_time = time.time()
output = pipe(messages, **generation_args)
gc.collect()
torch.cuda.empty_cache()
torch.cuda.reset_peak_memory_stats()
end_time = time.time()
exec_time = end_time - start_time

print(
    output[0]["generated_text"],
    f"###\n\n{exec_time} sec elapsed.\n\n###",
    sep="\n\n",
)
データ分析を行うためには、以下のステップを踏むことが一般的です。
1. データの収集と前処理: データを収集し、不要な部分を削除または修正し、データの品質を向上させます。
2. データの整理: データを整理し、必要な情報を抽出します。
3. データの探索: データの基本的な特徴を理解するために、統計的手法を使用します。
4. データの整合性のチェック: データが一貫しているかどうかを確認します。
5. データのモデリング: データを視覚的に表現するためにグラフやチャートを作成します。
6. データの解釈: 分析した結果をデータの意味を見出し、意図する目的に応じた洞察を得ます。
7. データの報告: 分析結果を明確にし、他の人に理解しやすい形で報告します。
これらのステップは、データ分析の基本的な流れを示しています。具体的なツールや技術は、分析の目的やデータの性質によって異なります。
###
31.05531597137451 sec elapsed.
###
Is there a way to use the standard model's tokenizer when employing the ONNX model? Furthermore, beyond the standard model's tokenizer, is it possible to combine the ONNX model with other tokenizers (ones proficient in Japanese tokenization) to generate text?
Hi @CLRafaelR, if you are working in Python, you can swap in any other tokenizer.
ONNX Runtime supports environments other than Python, so we have a core C++ implementation, with bindings to the other languages.
If you upgrade to 0.2.0-rc6, you should see improved handling of failed decoding. Please let us know how this goes.
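(If pip does not pick up release candidates by default, the standard pre-release flag should help; the exact package index may differ from the default, so treat this as a hint rather than the official instruction: pip install --upgrade --pre onnxruntime-genai)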
Regarding the integration of the functionality of the other tokenizers, can you provide some more detail?
I will reiterate that Phi-3 in particular does not target languages other than English, so you will likely not see good model performance in other languages even if the tokenizer performs better.
Hi @natke, thank you for your swift response.
I successfully upgraded the package to 0.2.0-rc6. However, after removing CUDA 11.x and unifying my system on CUDA 12.x, I encountered an error that prevented me from performing inference with the model.
Traceback (most recent call last):
File "/home/MYDIR/self-instruct/.venv/lib/python3.11/site-packages/onnxruntime_genai/__init__.py", line 11, in <module>
from onnxruntime_genai.onnxruntime_genai import *
ImportError: libcublasLt.so.11: cannot open shared object file: No such file or directory
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/MYDIR/self-instruct/test/phi-3/phi3-qa.py", line 1, in <module>
import onnxruntime_genai as og
File "/home/MYDIR/self-instruct/.venv/lib/python3.11/site-packages/onnxruntime_genai/__init__.py", line 14, in <module>
from onnxruntime_genai.onnxruntime_genai import *
ImportError: libcublasLt.so.11: cannot open shared object file: No such file or directory
I'm currently experiencing difficulties reinstalling CUDA 11.x, but I will attempt to run the model and report back once CUDA 11.x is reinstalled. By the way, although this question deviates from the original inquiry, are there any plans to support CUDA 12.x in onnxruntime-genai? I believe support for CUDA 12.x would be helpful.
I am fully aware that Phi-3 is primarily trained on English data and may not perform as well in other languages. However, judging from the performance of the 128k model (standard model), I feel its Japanese generation capabilities outperform those of Llama-2 and Llama-2-based models fine-tuned on Japanese data. When considering the use of a different tokenizer with this ONNX model, is it possible, for instance, to use the llm-jp/hf-fast-tokenizer-v22b2 tokenizer from Hugging Face, which can be loaded as follows, as the tokenizer for the ONNX model?
from transformers import AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained("llm-jp/hf-fast-tokenizer-v21b3")
Hi @CLRafaelR, thank you for the extra context.
Yes, you can try running with that alternate tokenizer. Instead of calling the onnxruntime-genai tokenizer, you can call the new one and set the input_ids of the onnxruntime-genai params to the output of the new tokenizer.
params.input_ids = <input_tokens from alternate tokenizer>
generator = og.Generator(model, params)
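Fleshed out, that might look like the sketch below. Two caveats: it assumes the 0.2.x Python API used by the examples, with a placeholder model path; and since a model's embedding table is tied to the tokenizer it was trained with, ids from a different vocabulary may not be meaningful to the Phi-3 ONNX model, so treat this strictly as the mechanical wiring:

import onnxruntime_genai as og
from transformers import AutoTokenizer

model = og.Model("path/to/phi-3-onnx-model")  # placeholder path
hf_tokenizer = AutoTokenizer.from_pretrained("llm-jp/hf-fast-tokenizer-v21b3")

params = og.GeneratorParams(model)
params.input_ids = hf_tokenizer("データ分析するにはなにをすればいい?").input_ids

generator = og.Generator(model, params)
output_ids = []
while not generator.is_done():
    generator.compute_logits()
    generator.generate_next_token()
    output_ids.append(generator.get_next_tokens()[0])

# Decode with the same alternate tokenizer that produced the input ids.
print(hf_tokenizer.decode(output_ids))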
And yes, we plan to publish packages with CUDA 12.
Hi @natke,
"And yes, we plan to publish packages with CUDA 12."
Is there a release date, or a place where I can get the information once the CUDA 12 version is published?
I followed the steps here https://onnxruntime.ai/docs/install/#install-onnx-runtime-gpu-cuda-12x but at the end I get:
ModuleNotFoundError: No module named 'onnxruntime_genai'
# pip list
nvidia-cublas-cu12 12.1.3.1
nvidia-cuda-cupti-cu12 12.1.105
nvidia-cuda-nvcc-cu12 12.3.107
nvidia-cuda-nvrtc-cu12 12.1.105
nvidia-cuda-runtime-cu12 12.1.105
nvidia-cudnn-cu12 8.9.2.26
nvidia-cufft-cu12 11.0.2.54
nvidia-curand-cu12 10.3.2.106
nvidia-cusolver-cu12 11.4.5.107
nvidia-cusparse-cu12 12.1.0.106
nvidia-nccl-cu12 2.20.5
nvidia-nvjitlink-cu12 12.3.101
nvidia-nvtx-cu12 12.1.105
onnx 1.16.0
onnxruntime-gpu 1.17.1
onnxruntime-training 1.17.3
So I assume we are close to getting it, right? How can we help?
With CUDA installed and running, I needed only a few libraries to make it run. It worked quite well for me on a Windows computer in a Conda environment. Maybe this helps you:
I also got the same error message when I had more than one ONNX library installed. Hope that helps.
@Hideki105 were you able to get your Japanese model running successfully?
Hi @natke,
I'm still getting an error when I use \Phi-3-mini-128k-instruct-onnx\cpu_and_mobile\cpu-int4-rtn-block-32.
Hideki
[image: image.png]
Hi @Hideki105, I don't see the error in your message. Can you please add more information so that we can see what the error is?
@Hideki105, are you still running into issues? Please let us know
Closing this issue, as we did not hear back from you. Please re-open if you are still experiencing this issue.
I still get the ImportError: libcublasLt.so.11: cannot open shared object file: No such file or directory error. I never intended to use CUDA 11, but I also can't find a version of onnxruntime-genai-cuda that works with CUDA 12. Is there a table for this repo stating whether it supports CUDA 12 or not?
Our currently released onnxruntime-genai package only supports CUDA 11.8. We have not released the CUDA 12 package yet; we plan to include one in the next release.
@hosseinalipour @3x0dv5 @Johannes-Stephan @CLRafaelR we have published packages supporting CUDA 12. Please install using the following instructions: https://onnxruntime.ai/docs/genai/howto/install.html#cuda-12
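As a quick sanity check after installing: the libcublasLt.so.11 failure reported above surfaced at import time, so a clean import is already a good sign that the CUDA 11 dependency is gone.

import onnxruntime_genai as og
print(og.__file__)  # confirms which installed package was actually loaded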
Question
Phi-3 can't deal with Japanese. I enjoy Phi-3, which Microsoft made, but I am hitting this onnxruntime_genai error. How can I solve this issue?