microsoft / onnxruntime-genai

Generative AI extensions for onnxruntime
MIT License

Phi-3 can't deal with Japanese. How can I solve this issue? #314

Closed Hideki105 closed 5 months ago

Hideki105 commented 6 months ago

Question

Phi-3 can't deal with Japanese. I have been enjoying Phi-3, which Microsoft made, but I am running into an error from onnxruntime_genai. How can I solve this issue?

Code

import onnxruntime_genai as og
import time

# Raw string so the Windows backslashes in the path are not treated as escape sequences.
model = og.Model(r".\Phi-3-mini-128k-instruct-onnx\cpu_and_mobile\cpu-int4-rtn-block-32")
tokenizer = og.Tokenizer(model)
tokenizer_stream = tokenizer.create_stream()

def input_llm(text):
    print("Question:",text)
    input_tokens = tokenizer.encode(text)
    params = og.GeneratorParams(model)
    params.try_use_cuda_graph_with_max_batch_size(1)
    params.input_ids = input_tokens
    generator = og.Generator(model, params)
    return generator

def output_llm(generator):
    print("Answer:")
    stt = time.time()
    list_error = []
    list_sentence = []
    while not generator.is_done():
        generator.compute_logits()
        generator.generate_next_token()
        new_token = generator.get_next_tokens()[0]
        if new_token not in list_error:
            try:
                # Stream-decode the new token; this fails when the token is only
                # part of a multi-byte UTF-8 character (common for Japanese text).
                list_sentence.append(tokenizer_stream.decode(new_token))
            except Exception:
                # Record the raw token ID of the piece that could not be decoded.
                list_error.append(new_token)
                list_sentence.append(new_token)
    print(list_sentence)
    fin = time.time()
    print(fin-stt)
    return list_error

Input

text = "<|user|>こんにちは。データ分析するにはなにをすればいい?<|end|><|assistant|>"
generator = input_llm(text)
list_error = output_llm(generator)
print(list_error)

Output

Question: <|user|>こんにちは。データ分析するにはなにをすればいい?<|end|><|assistant|>
Answer:
['', 'デ', 'ー', 'タ', '分', 233, 161, 147, 'に', 'は', 'い', 'く', 'つ', 'か', 'の', 'ス', 'テ', 'ッ', 'プ', 'が', 'あ', 'り', 'ま', 'す', '。', 'ま', 'ず', '、', 'デ', 'ー', 'タ', 'セ', 'ッ', 'ト', 'を', 232, 146, 145, '集', 'し', 'ま', 'す', '。', 'こ', 'れ', 'は', '、', '必', '要', 'な', '情', '報', 'を', 174, 'む', 'デ', 'ー', 'タ', 'を', '集', 'め', 'る', 'こ', 'と', 'で', 'す', '。', '次', 'に', '、', 'デ', 'ー', 'タ', 'を', 152, 183, '理', 'し', '、', '不', '要', 'な', '部', '分', 'を', '除', '去', 'し', 'ま', 'す', '。', 'こ', 'れ', 'に', 'よ', 'り', '、', '分', 'の', 141, 188, 234, 138, 'が', '向', '上', 'し', 'ま', 'す', '。', 'そ', 'の', '後', '、', 'デ', 'ー', 'タ', 'を', '分', 'し', '、', '意', 148, 182, 'の', 'あ', 'る', '洞', 178, 162, 'や', 'パ', 'タ', 'ー', 'ン', 'を', '見', 'つ', 'け', '出', 'し', 'ま', 'す', '。', '最', '後', 'に', '、', '分', '結', '果', 'を', '解', 236, 139, 'し', '、', '意', 'を', 179, 'き', '出', 'し', '、', 'そ', 'の', '結', '果', 'を', '報', '告', 'し', 'ま', 'す', '。', 'こ', 'れ', 'ら', 'の', 'ス', 'テ', 'ッ', 'プ', 'を', 235, 187, 'む', 'こ', 'と', 'で', '、', 'デ', 'ー', 'タ', '分', 'を', '成', '長', 'さ', 'せ', 'る', 'こ', 'と', 'が', 'で', 'き', 'ま', 'す', '。', '']
63.10930633544922
[233, 161, 147, 232, 146, 145, 174, 152, 183, 141, 188, 234, 138, 148, 182, 178, 162, 236, 139, 179, 235, 187]
Hideki105 commented 6 months ago

Unicode Decode Error

It seems that a Unicode decode error has occurred in Tokenizer.cs. Would you mind fixing the code so that it works for Japanese?

public string Decode(ReadOnlySpan<int> sequence)
        {
            IntPtr outStr = IntPtr.Zero;
            unsafe
            {
                fixed (int* sequencePtr = sequence)
                {
                    Result.VerifySuccess(NativeMethods.OgaTokenizerDecode(_tokenizerHandle, sequencePtr, (UIntPtr)sequence.Length, out outStr));
                }
            }
            try
            {
                return StringUtils.FromUtf8(outStr);
            }
            finally
            {
                NativeMethods.OgaDestroyString(outStr);
            }
        }
natke commented 6 months ago

Thank you for reporting this @Hideki105. We will look into it

natke commented 6 months ago

Hi @Hideki105 We have a fix for the Unicode decode error in progress. In the meantime, can you try this script, which does not use streaming decoding:

python model-generate.py -m <path to your model> -pr <your prompt>

The script is in examples/python.

Note: Phi-3-mini was predominantly trained and optimized for English. Its capabilities in other languages are limited, meaning it may understand other languages but will not be as fluent as it is in English. Customers are encouraged to use the Microsoft Translator service in tandem to translate prompts and responses for best results.
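For clarity, "not using streaming decoding" here means collecting all generated token IDs and decoding them in a single call at the end, rather than decoding token by token with a TokenizerStream. A minimal sketch in terms of the generator and tokenizer objects from the script above (an illustration only, not the actual contents of model-generate.py, and it assumes tokenizer.decode accepts a plain list of token IDs):

tokens = []
while not generator.is_done():
    generator.compute_logits()
    generator.generate_next_token()
    tokens.append(generator.get_next_tokens()[0])

# Decoding the full sequence at once keeps the bytes of each multi-byte
# character together, so Japanese text decodes cleanly.
print(tokenizer.decode(tokens))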

CLRafaelR commented 6 months ago

@natke, could you tell us why disabling streaming decoding avoids the Unicode decode error?

I also faced the very same issue that @Hideki105 reported when I ran examples/python/phi3-qa.py. I replaced tokenizer_stream.decode(new_token) with tokenizer.decode(new_token) in the following line to disable streaming decoding, but the problem persists. I even deleted flush=True in the print() call, but in vain...

https://github.com/microsoft/onnxruntime-genai/blob/1f3776d425afbd2e8f83f126f1c02f0d13633ea0/examples/python/phi3-qa.py#L65
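For context, the failure mode is consistent with a multi-byte UTF-8 character being split across tokens: with byte-fallback tokenization, a single kanji can be emitted as several byte-level tokens, and no individual byte token is valid UTF-8 on its own. (The IDs 233, 161, 147 in the output above would correspond to the bytes 0xE6 0x9E 0x90, i.e. the missing 析 in 分析, assuming the usual byte-token offset of 3.) A plain-Python illustration, independent of onnxruntime-genai:

char = "析"
data = char.encode("utf-8")      # b'\xe6\x9e\x90' -- three bytes for one character

try:
    data[:1].decode("utf-8")     # decoding only the first byte fails
except UnicodeDecodeError as err:
    print("partial character fails:", err)

print(data.decode("utf-8"))      # all three bytes together decode cleanly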

CLRafaelR commented 6 months ago

I came up with a workaround. In the case of examples/python/phi3-qa.py, you can replace L.65 with the following snippet; the characters that the tokenizer fails to decode are then shown as the replacement character U+FFFD:

                try:
                    print(
                        tokenizer_stream.decode(new_token),
                        end="",
                        flush=True,
                    )
                except Exception:
                    print(
                        "�",
                        end="",
                        flush=True,
                    )
[Screenshot 2024-05-03 162935]
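Another possible workaround (a sketch only, not something from the library docs) is to buffer the token IDs that fail to decode and retry them as a group once more tokens arrive; this recovers the character instead of printing U+FFFD. It assumes tokenizer.decode accepts a plain list of token IDs, and spacing may differ slightly from a full-sequence decode:

pending = []
while not generator.is_done():
    generator.compute_logits()
    generator.generate_next_token()
    pending.append(generator.get_next_tokens()[0])
    try:
        text = tokenizer.decode(pending)
    except Exception:
        continue  # decode raised on an incomplete sequence; wait for more tokens
    if "\ufffd" in text:
        continue  # incomplete multi-byte character; keep buffering
    print(text, end="", flush=True)
    pending = []

if pending:
    # Flush whatever is left at the end of generation.
    print(tokenizer.decode(pending), end="", flush=True)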
CLRafaelR commented 6 months ago

@natke, the issue may reside within the tokenizer of the 128k ONNX model, as the tokenizer for the standard model microsoft/Phi-3-mini-128k-instruct decodes non-ASCII sentences without any character corruption (i.e. mojibake or tofu), as shown below:

Input

import torch
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    pipeline,
)
import time
import gc

torch.random.manual_seed(0)

model_name = "microsoft/Phi-3-mini-128k-instruct"

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    device_map={"": 0},
    # device_map="cuda",
    torch_dtype=torch.float16,
    # torch_dtype="auto",
    trust_remote_code=True,
)

tokenizer = AutoTokenizer.from_pretrained(model_name)

user_prompt="データ分析するにはなにをすればいい?"

messages = [
    {
        "role": "user",
        "content": user_prompt,
    },
]

pipe = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
)

generation_args = {
    "max_new_tokens": 2048,
    "return_full_text": False,
    "temperature": 0.0,
    "do_sample": False,
}

start_time = time.time()

output = pipe(messages, **generation_args)

gc.collect()
torch.cuda.empty_cache()
torch.cuda.reset_peak_memory_stats()

end_time = time.time()

exec_time = end_time - start_time

print(
    output[0]["generated_text"],
    f"###\n\n{exec_time} sec elapsed.\n\n###",
    sep="\n\n",
)

Output

 データ分析を行うためには、以下のステップを踏むことが一般的です。

1. データの収集と前処理: データを収集し、不要な部分を削除または修正し、データの品質を向上させます。

2. データの整理: データを整理し、必要な情報を抽出します。

3. データの探索: データの基本的な特徴を理解するために、統計的手法を使用します。

4. データの整合性のチェック: データが一貫しているかどうかを確認します。

5. データのモデリング: データを視覚的に表現するためにグラフやチャートを作成します。

6. データの解釈: 分析した結果をデータの意味を見出し、意図する目的に応じた洞察を得ます。

7. データの報告: 分析結果を明確にし、他の人に理解しやすい形で報告します。

これらのステップは、データ分析の基本的な流れを示しています。具体的なツールや技術は、分析の目的やデータの性質によって異なります。

###

31.05531597137451 sec elapsed.

###


Is there a way to use the standard model's tokenizer when running the ONNX model? Furthermore, beyond the standard model's tokenizer, is it possible to combine the ONNX model with other tokenizers (ones proficient in Japanese tokenization) to generate text?

natke commented 6 months ago

Hi @CLRafaelR, if you are working in Python, you can swap in any other tokenizer.

ONNX Runtime supports environments other than Python, so we have a core C++ implementation, with bindings to the other languages.

If you upgrade to 0.2.0-rc6, you should see improved handling of failed decoding. Please let us know how this goes.

Regarding integrating the functionality of other tokenizers, can you provide some more detail?

I will reiterate that Phi-3 in particular does not target languages other than English, so you will likely not see good model performance in other languages even if the tokenizer performs better.

CLRafaelR commented 6 months ago

Hi @natke , thank you for your swift response.

I successfully upgraded the package to 0.2.0-rc6. However, after removing CUDA 11.x and moving my system entirely to CUDA 12.x, I encountered an error that prevented me from running inference with the model.

Traceback (most recent call last):
  File "/home/MYDIR/self-instruct/.venv/lib/python3.11/site-packages/onnxruntime_genai/__init__.py", line 11, in <module>
    from onnxruntime_genai.onnxruntime_genai import *
ImportError: libcublasLt.so.11: cannot open shared object file: No such file or directory

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/MYDIR/self-instruct/test/phi-3/phi3-qa.py", line 1, in <module>
    import onnxruntime_genai as og
  File "/home/MYDIR/self-instruct/.venv/lib/python3.11/site-packages/onnxruntime_genai/__init__.py", line 14, in <module>
    from onnxruntime_genai.onnxruntime_genai import *
ImportError: libcublasLt.so.11: cannot open shared object file: No such file or directory

I'm currently having difficulty reinstalling CUDA 11.x, but I will attempt to run the model and report back once it is reinstalled. By the way (although this question deviates from the original inquiry), are there any plans to support CUDA 12.x in onnxruntime-genai? I believe CUDA 12.x support would be helpful.


I am fully aware that Phi-3 is primarily trained on English data and may not perform as well in other languages. However, judging from the performance of the standard 128k model, I feel its Japanese generation capabilities outperform those of Llama-2 and Llama-2-based models fine-tuned on Japanese data. When considering the use of a different tokenizer with this ONNX model, is it possible, for instance, to use the llm-jp/hf-fast-tokenizer-v22b2 tokenizer on Hugging Face, which can be loaded as follows, as the tokenizer for the ONNX model?

from transformers import AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained("llm-jp/hf-fast-tokenizer-v21b3")
natke commented 6 months ago

Hi @CLRafaelR Thank you for the extra context.

Yes, you can try running with that alternate tokenizer. Instead of calling the onnxruntime-genai tokenizer, call the new one and set the input ids of the onnxruntime-genai params to its output:

        params.input_ids = <input_tokens from alternate tokenizer>
        generator = og.Generator(model, params)
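A fuller sketch of that idea (illustration only; it assumes a Hugging Face tokenizer whose vocabulary matches the ONNX model, here the microsoft/Phi-3-mini-128k-instruct tokenizer, since a tokenizer with a different vocabulary would produce IDs the model cannot interpret):

import onnxruntime_genai as og
from transformers import AutoTokenizer

hf_tokenizer = AutoTokenizer.from_pretrained("microsoft/Phi-3-mini-128k-instruct")
model = og.Model(r".\Phi-3-mini-128k-instruct-onnx\cpu_and_mobile\cpu-int4-rtn-block-32")

prompt = "<|user|>こんにちは。データ分析するにはなにをすればいい?<|end|><|assistant|>"
input_tokens = hf_tokenizer.encode(prompt)   # token IDs from the external tokenizer

params = og.GeneratorParams(model)
params.input_ids = input_tokens              # may need to be a numpy int array on some versions
generator = og.Generator(model, params)

generated = []
while not generator.is_done():
    generator.compute_logits()
    generator.generate_next_token()
    generated.append(generator.get_next_tokens()[0])

# Decode with the same external tokenizer, so it also handles the multi-byte characters.
print(hf_tokenizer.decode(generated, skip_special_tokens=True))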

And yes we plan to publish packages with CUDA 12.

Johannes-Stephan commented 6 months ago

Hi @natke

And yes we plan to publish packages with CUDA 12.

Is there a release date, or a place where I can get the information, once the CUDA 12 version is published?

3x0dv5 commented 5 months ago

I followed the steps here https://onnxruntime.ai/docs/install/#install-onnx-runtime-gpu-cuda-12x but at the end I get:

ModuleNotFoundError: No module named 'onnxruntime_genai'
# pip list
nvidia-cublas-cu12           12.1.3.1
nvidia-cuda-cupti-cu12       12.1.105
nvidia-cuda-nvcc-cu12        12.3.107
nvidia-cuda-nvrtc-cu12       12.1.105
nvidia-cuda-runtime-cu12     12.1.105
nvidia-cudnn-cu12            8.9.2.26
nvidia-cufft-cu12            11.0.2.54
nvidia-curand-cu12           10.3.2.106
nvidia-cusolver-cu12         11.4.5.107
nvidia-cusparse-cu12         12.1.0.106
nvidia-nccl-cu12             2.20.5
nvidia-nvjitlink-cu12        12.3.101
nvidia-nvtx-cu12             12.1.105
onnx                         1.16.0
onnxruntime-gpu              1.17.1
onnxruntime-training         1.17.3

So I assume we are close to getting it, right? How can we help?

dkjsnnr1 commented 5 months ago

With CUDA installed and running, I needed very few additional libraries to make it work. It worked quite well for me on a Windows computer in a Conda environment. Maybe this helps you:

https://huggingface.co/microsoft/Phi-3-mini-128k-instruct-onnx/discussions/4#6641da75b98ddf3fe4a55bec

I also got the same error message when I had more than one ONNX library installed. Hope that helps.

natke commented 5 months ago

@Hideki105 were you able to get your Japanese model running successfully?

Hideki105 commented 5 months ago

Hi, @natke https://github.com/natke.

I'm still getting an error when I use \Phi-3-mini-128k-instruct-onnx\cpu_and_mobile\cpu-int4-rtn-block-32.

Hideki

[image: image.png]


natke commented 5 months ago

Hi @Hideki105, I don't see the error in your message. Can you please add more information so that we can see what the error is?

natke commented 5 months ago

@Hideki105, are you still running into issues? Please let us know

natke commented 5 months ago

Closing this issue, as we did not hear back from you. Please re-open if you are still experiencing this issue

hosseinalipour commented 4 months ago

I still get the ImportError: libcublasLt.so.11: cannot open shared object file: No such file or directory error. I never intended to use CUDA 11, but I also can't find a version of onnxruntime-genai-cuda that works with CUDA 12. Is there a table for this repo stating whether it supports CUDA 12 or not?

baijumeswani commented 4 months ago

I still get the ImportError: libcublasLt.so.11: cannot open shared object file: No such file or directory error. I never intended to use CUDA 11, but I also can't find a version of onnxruntime-genai-cuda that works with CUDA 12. Is there a table for this repo stating whether it supports CUDA 12 or not?

Our currently released onnxruntime-genai package only supports CUDA 11.8. We have not released the CUDA 12 package yet; we plan to include it in the next release.

natke commented 4 months ago

@hosseinalipour @3x0dv5 @Johannes-Stephan @CLRafaelR we have published packages supporting CUDA 12. Please install using the following instructions: https://onnxruntime.ai/docs/genai/howto/install.html#cuda-12