kendryte / nncase

Open deep learning compiler stack for Kendryte AI accelerators ✨
Apache License 2.0

Converting fp32 onnx to int8 kmodel failed #1228

Closed cwr250 closed 2 months ago

cwr250 commented 2 months ago

Describe the bug

Unhandled exception. System.InvalidOperationException: Only The Expr Have CheckedType Can Get It's Shape
   at Nncase.IR.Expr.get_CheckedShape()
   at Nncase.Util.ShapeIndex(Expr& input, Int32 index)
   at Nncase.Util.ComputeSplit(Expr input, Int64 outputSize, Int64 axis)
   at Nncase.Importer.OnnxImporter.SplitV13(NodeProto& op)
   at Nncase.Importer.OnnxImporter.VisitSplit(NodeProto& op)
   at Nncase.Importer.OnnxImporter.Visit(NodeProto op)
   at Nncase.Importer.OnnxImporter.ConvertOp()
   at Nncase.BaseImporter.Import()
   at Nncase.Importers.ImportOnnx(Stream onnx, CompileSession compileSession)
   at Nncase.Compiler.Interop.CApi.CompilerImportOnnxModule(IntPtr compilerHandle, IntPtr streamHandle)
Aborted

I was trying to convert an fp32 ONNX model to an int8 kmodel and got the error shown above.

To Reproduce

Command line or scripts to reproduce the behavior:

Original model and script:

import os
import subprocess
import nncase
import numpy as np
from nncase_base_func import read_model_file, model_simplify, parse_model_input_output

# Ensure nncase is in PATH
result = subprocess.run(["pip", "show", "nncase"], capture_output=True)
location = [i.split(": ")[1] for i in result.stdout.decode().split("\n") if i.startswith("Location:")][0]
os.environ["PATH"] = os.environ.get("PATH", "") + os.pathsep + location

# Load and simplify ONNX model
model_path = 'model/model_fp32.onnx'
simplified_model_path = model_simplify(model_path)
model_content = read_model_file(simplified_model_path)

# Parse model input
_, inputs = parse_model_input_output(simplified_model_path)

# Configure compile options
compile_options = nncase.CompileOptions()
compile_options.target = "k230"
compile_options.input_type = "float32"
compile_options.output_type = "float32"
compile_options.quant_type = "uint8"  # We'll quantize to uint8
compile_options.input_layout = "NCHW"
compile_options.dump_ir = True
compile_options.dump_asm = True
compile_options.dump_dir = "tmp/asr_model"

# Create compiler
compiler = nncase.Compiler(compile_options)

# Import options
import_options = nncase.ImportOptions()

# Import ONNX model
compiler.import_onnx(model_content, import_options)

# PTQ options
ptq_options = nncase.PTQTensorOptions()
ptq_options.samples_count = 10  # Adjust based on your needs
ptq_options.calibrate_method = "NoClip"
ptq_options.finetune_weights_method = "NoFineTuneWeights"

# Generate calibration data
def generate_calibration_data(input_shape, samples_count):
    return [np.random.rand(*input_shape).astype(np.float32) for _ in range(samples_count)]

for input_info in inputs:
    ptq_options.set_tensor_data(generate_calibration_data(input_info['shape'], ptq_options.samples_count))

# Use PTQ
compiler.use_ptq(ptq_options)

# Compile model
compiler.compile()

# Generate kmodel
kmodel = compiler.gencode_tobytes()

# Save kmodel
output_path = "model/model_int8.kmodel"
with open(output_path, "wb") as f:
    f.write(kmodel)

print(f"Conversion completed. Quantized KModel saved to {output_path}")

Model used: the larger of the models at https://huggingface.co/lovemefan/SenseVoice-onnx/tree/main.

Environment (please complete the following information):

curioyang commented 2 months ago

@cwr250 This happens because your model has a dynamic shape, but you have not configured the dynamic-shape parameters, so an exception is thrown. Also note that this model may perform poorly on the K230, or may not run at all because of memory limits. The model is very large, and the quantization results may not be ideal.

cwr250 commented 2 months ago

@curioyang Thanks for the info. I'm testing a product prototype, so I need to push this as far as it will go until I hit the limitations you mentioned.

Back to the error: I added the following code but got the same error. How should I configure this correctly?

compile_options = nncase.CompileOptions()
compile_options.target = "k230"
compile_options.input_type = "float32"
compile_options.output_type = "float32"
compile_options.quant_type = "uint8"

# Configure dynamic shape parameters
compile_options.shape_bucket_enable = True
compile_options.shape_bucket_range_info = {"-1": [1, 100]}  
compile_options.shape_bucket_segments_count = 2
compile_options.shape_bucket_fix_var_map = {"batch_size": 1}

cwr250 commented 2 months ago

@curioyang Here is the new script with ShapeBucket options defined, but I still get the same error.

import nncase
import numpy as np
from nncase_base_func import model_simplify, read_model_file, parse_model_input_output

def compile_asr_kmodel(model_path, dump_path, calib_data):
    print("\n----------   Compiling ASR model    ----------")
    print("Simplifying ONNX model...")
    model_file = model_simplify(model_path)

    print("Setting options...")
    import_options = nncase.ImportOptions()
    import_options.input_shape = None

    compile_options = nncase.CompileOptions()
    compile_options.target = "k230"
    compile_options.dump_ir = True
    compile_options.dump_asm = True
    compile_options.dump_dir = dump_path

    # Configure ShapeBucket options for dynamic shape
    shape_bucket_options = nncase.ShapeBucketOptions()
    shape_bucket_options.shape_bucket_enable = True
    shape_bucket_options.shape_bucket_range_info = {"seq_len": [1, 1024], "batch_size": [1, 16]}
    shape_bucket_options.shape_bucket_segments_count = 4
    shape_bucket_options.shape_bucket_fix_var_map = {
        "batch_size": 1,
        "input_query_dim": 3,
        "event_emo_dim": 2,
        "style_query_dim": 1,
    }
    compile_options.shape_bucket_options = shape_bucket_options

    ptq_options = nncase.PTQTensorOptions()
    ptq_options.quant_type = "uint8"
    ptq_options.w_quant_type = "uint8"
    ptq_options.calibrate_method = "Kld"
    ptq_options.finetune_weights_method = "NoFineTuneWeights"
    ptq_options.samples_count = len(calib_data[0])
    ptq_options.set_tensor_data(calib_data)

    print("Compiling...")
    compiler = nncase.Compiler(compile_options)

    model_content = read_model_file(model_file)
    compiler.import_onnx(model_content, import_options)
    compiler.use_ptq(ptq_options)

    compiler.compile()
    kmodel = compiler.gencode_tobytes()

    kmodel_path = f"{dump_path}/asr_model_int8.kmodel"
    with open(kmodel_path, 'wb') as f:
        f.write(kmodel)
    print(f"Compiled kmodel saved to: {kmodel_path}")
    print("----------------end-----------------")
    return kmodel_path

def generate_calibration_data(model_path, num_samples=100, vocab_size=5000):
    _, input_info = parse_model_input_output(model_path)
    calib_data = []

    for info in input_info:
        shape = info['shape']
        dtype = info['dtype']

        samples = []
        for _ in range(num_samples):
            if 'speech' in info['name']:
                seq_len = np.random.randint(1, 1025)
                sample = np.random.rand(1, seq_len, 80).astype(dtype)
            elif 'text' in info['name']:
                sample = np.random.randint(0, vocab_size, (1, 4)).astype(dtype)
            else:
                sample = np.random.rand(*shape).astype(dtype)

            samples.append(sample)

        calib_data.append(samples)

    return calib_data

if __name__ == "__main__":
    model_path = "model/model.onnx"
    dump_path = "model/dump"

    # Generate calibration data
    calib_data = generate_calibration_data(model_path)

    # Compile the model
    kmodel_path = compile_asr_kmodel(model_path, dump_path, calib_data)
    print(f"ASR model successfully converted to int8 kmodel: {kmodel_path}")

curioyang commented 2 months ago

@cwr250 There are some errors in your code. In your model, the input name is "speech" and the dynamic axis is "speech_length":

shape_bucket_options = nncase.ShapeBucketOptions()
shape_bucket_options.shape_bucket_enable = True
shape_bucket_options.shape_bucket_range_info = {"speech_length": [1, 16]}
shape_bucket_options.shape_bucket_segments_count = 1
shape_bucket_options.shape_bucket_fix_var_map = {}
compile_options.shape_bucket_options = shape_bucket_options   

shape_bucket_range_info sets the range of each dynamic axis on the input nodes. That said, it has nothing to do with the error in this issue.
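Because the keys of shape_bucket_range_info have to match the dynamic axis names the model declares, a small check before compiling can catch naming mismatches such as "seq_len" versus "speech_length". This is a sketch only: `check_range_info` is a hypothetical helper, not an nncase API, and the axis names are supplied by hand. It would flag the key mismatch but would not resolve the exception reported in this issue.

```python
def check_range_info(range_info, model_axes):
    """Return a list of problems found in a shape-bucket range mapping.

    range_info: dict of axis name -> [min, max], in the form assigned to
                shape_bucket_range_info in this thread.
    model_axes: iterable of dynamic axis names the model actually declares.
    """
    problems = []
    known = set(model_axes)
    for axis, bounds in range_info.items():
        if axis not in known:
            problems.append(f"unknown axis {axis!r}")
            continue
        if len(bounds) != 2 or bounds[0] > bounds[1]:
            problems.append(f"bad range for {axis!r}: {bounds}")
    return problems


model_axes = ["speech_length"]

# Keys from the earlier attempt do not match the model's axis name.
print(check_range_info({"seq_len": [1, 1024], "batch_size": [1, 16]}, model_axes))

# Keys from the corrected configuration match, so no problems are reported.
print(check_range_info({"speech_length": [1, 16]}, model_axes))
```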

The second input of your model is speech_lengths. Is it used to control the input length? nncase supports dynamic shapes, but only when the variation is in the tensor shapes themselves. Here the value of speech_lengths controls the behavior of the model, which is beyond what nncase can handle.

In summary, nncase does not support this model because of the second input speech_lengths.
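The difference between shape-driven and value-driven behavior can be illustrated in plain Python. This is a toy sketch with made-up functions, not the model's actual computation. In the first function the output size follows from the input's shape alone. In the second it depends on the value stored in a second input, so two calls with identically shaped inputs can produce outputs of different sizes.

```python
def shape_driven(speech):
    """Output length is determined by the input's shape (its frame count)."""
    return [frame * 2 for frame in speech]


def value_driven(speech, speech_lengths):
    """Output length is determined by the value stored in speech_lengths."""
    valid = speech_lengths[0]
    return [frame * 2 for frame in speech[:valid]]


speech = [0.1, 0.2, 0.3, 0.4]  # four frames; same shape in every call below

print(len(shape_driven(speech)))       # 4
print(len(value_driven(speech, [4])))  # 4
print(len(value_driven(speech, [2])))  # 2
```

In the last two calls both inputs have the same shapes, yet the result sizes differ, so shape ranges alone cannot describe the second function.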