migraphx-benchmark / AMDMIGraphX

AMD's graph optimization engine.
https://rocmsoftwareplatform.github.io/AMDMIGraphX/doc/html/
MIT License

Checking high priority models #145

gyulaz-htec closed this issue 11 months ago

gyulaz-htec commented 11 months ago

Yolov5

The model passes, but here are the repro steps to get the ONNX model:

  1. Clone the YOLOv5 repo.
  2. Check export.py for additional requirements.
  3. Generate the model with python3 export.py --weights yolov5s.pt --include onnx
gyulaz-htec commented 11 months ago

Vicuna

vicuna-7b-v1.5 passes. Repro steps:

  1. https://github.com/lm-sys/FastChat points to the Hugging Face weights, so Optimum can be used.
  2. optimum-cli export onnx --model lmsys/vicuna-7b-v1.5 vicuna-7b-v_1_5
     This fails on our available MI210 machines because we run out of RAM during weight processing, but the ONNX model is still written to decoder_model.onnx, which can be compiled.

    The same applies to vicuna-7b-v1.5-16k.
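A rough estimate of why the export runs out of RAM: the weights alone of a model with a nominal 7 billion parameters need about 28 GB in fp32. This back-of-envelope sketch assumes the nominal "7B" count (the exact count differs slightly) and a single fp32 copy of the weights; the exporter's real peak memory was not measured and would be higher if it holds more than one copy at a time.

```python
# Back-of-envelope weight memory for a "7B" model.
# Assumption: the nominal 7e9 parameter count; the exact count differs slightly.
NOMINAL_PARAMS = 7_000_000_000

BYTES_PER_PARAM = {"fp32": 4, "fp16": 2}


def weight_bytes(n_params: int, bytes_per_param: int) -> int:
    """Bytes needed to hold one copy of the weights."""
    return n_params * bytes_per_param


for dtype, size in BYTES_PER_PARAM.items():
    gb = weight_bytes(NOMINAL_PARAMS, size) / 1e9
    print(f"{dtype}: ~{gb:.0f} GB per copy of the weights")
```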

gyulaz-htec commented 11 months ago

LLaMA-2 7B

Hugging Face Llama-2-7b-hf and Llama-2-7b-chat-hf compile; however, the export fails in the same way as for Vicuna. Optimum command to get the model:

optimum-cli export onnx --model meta-llama/Llama-2-7b-hf ./Llama-2-7b-h

We still have to look into llama.cpp: https://github.com/ggerganov/llama.cpp

gyulaz-htec commented 11 months ago

Stable Diffusion 2.1

Optimum command to get stable-diffusion-2-1:

optimum-cli export onnx --model stabilityai/stable-diffusion-2-1 ./stable-diffusion-2-1

The models compile successfully with the following commands:

migraphx-driver compile sd_2-1/vae_decoder/model.onnx --input-dim @latent_sample 2 4 64 64 --gpu
migraphx-driver compile sd_2-1/vae_encoder/model.onnx --input-dim @sample 2 3 512 512 --gpu
migraphx-driver compile sd_2-1/unet/model.onnx --input-dim @sample 2 4 64 64 @timestep 1 @encoder_hidden_states 2 64 1024 --fp16
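When the same models are compiled repeatedly with different shapes, it can help to generate these command lines from a table of input shapes. The sketch below uses a hypothetical helper, driver_compile_cmd, that only assembles the argument list in the same form as the commands above (one --input-dim followed by @name dims groups). It does not check the names against the ONNX model and does not run anything; it is not an official MIGraphX API.

```python
from typing import Dict, List, Sequence


def driver_compile_cmd(model: str,
                       input_dims: Dict[str, Sequence[int]],
                       extra_flags: Sequence[str] = (),
                       fill1: Sequence[str] = ()) -> List[str]:
    """Build a migraphx-driver compile argv (hypothetical helper).

    The flag layout mirrors the commands in this thread.
    """
    cmd = ["migraphx-driver", "compile", model]
    if fill1:
        # Inputs passed to --fill1, as in the GPT-J and Whisper commands.
        cmd += ["--fill1", *fill1]
    if input_dims:
        cmd.append("--input-dim")
        for name, dims in input_dims.items():
            cmd.append("@" + name)
            cmd += [str(d) for d in dims]
    cmd += list(extra_flags)
    return cmd


# Reproduce the UNet command above.
unet_cmd = driver_compile_cmd(
    "sd_2-1/unet/model.onnx",
    {"sample": (2, 4, 64, 64),
     "timestep": (1,),
     "encoder_hidden_states": (2, 64, 1024)},
    extra_flags=["--fp16"],
)
print(" ".join(unet_cmd))
```

To actually compile, the list can be passed to subprocess.run(unet_cmd, check=True) on a machine with migraphx-driver on PATH.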
music-dino commented 11 months ago

Whisper

Optimum command:

optimum-cli export onnx --model openai/whisper-large whisper/

Optimum fails toward the end, but the model is still generated and can be compiled with MIGraphX.

gyulaz-htec commented 11 months ago

GPT-J 6B

Optimum command:

optimum-cli export onnx --model EleutherAI/gpt-j-6B gpt-j/

The model compiles with the following MIGraphX command:

migraphx-driver compile optimum_models/gpt-j/decoder_model.onnx --fill1 input_ids attention_mask --input-dim @input_ids 1 64 --input-dim @attention_mask 1 64

gyulaz-htec commented 11 months ago

Inception v3

The model compiles with migraphx-driver. To generate the ONNX model from the PyTorch Hub model, use this Python script.

gyulaz-htec commented 11 months ago

RetinaNet

The model compiles with migraphx-driver. To generate the ONNX model (with a ResNet50 backbone) from the PyTorch Hub model, use this Python script.

gyulaz-htec commented 11 months ago

MLSR

We checked two of the top-trending SR models on Hugging Face, A2N and AWSRN-BAM. Both come in versions for scales 2, 3 and 4. They are only available as PyTorch models; download and conversion scripts: A2N, AWSRN

Results

```
^
:25:9: note: while in macro instantiation
.single_vload line_base, s_off, mbufs_cnt_A, 2, 1
^
:17:13: note: while in macro instantiation
.load_input_line line_base, s_off, mbufs_cnt_A
^
:1:1: note: while in macro instantiation
.rept input_lines_per_sgpr
^
:5:5: note: while in macro instantiation
.load_input_lines_on_same_sgpr input_lines_per_sgpr, mbufs_cnt_A
^
/tmp/comgr-acc642/input/conv3x3.s:1605:3: note: while in macro instantiation
.load_input linesA, mbufs_cnt_A
^
MIOpen Error: /long_pathname_so_that_rpms_can_package_the_debug_info/src/extlibs/MLOpen/src/hipoc/hipoc_program.cpp:304: Code object build failed. Source: conv3x3.s
terminate called after throwing an instance of 'migraphx::version_2_9_0::exception'
  what(): /code/AMDMIGraphX/src/targets/gpu/include/migraphx/gpu/miopen.hpp:114: find_solution: MIOpen: miopenFindSolutions failed
```

The convolution details:

```
x=[1, 24, 340, 340] w=[24, 24, 3, 3] p=[1, 1, 1, 1] s=[1, 1] d=[1, 1] g=1 workspace_size=99878400
```

This hits the offset limit in the assembly kernel: https://github.com/ROCmSoftwarePlatform/MIOpen/blob/develop/src/kernels/conv3x3.s#L463-L466
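The reported workspace_size can be reproduced from the convolution parameters: it equals exactly the size of an fp32 im2col buffer (C·KH·KW rows by OH·OW columns) for one image. The sketch below assumes that is where the number comes from, that g=1, and that p lists [h_begin, w_begin, h_end, w_end] (symmetric here, so the order does not matter). It only checks the arithmetic and says nothing about the exact offset limit in conv3x3.s.

```python
from typing import Sequence, Tuple


def conv_out_hw(x: Sequence[int], w: Sequence[int], pad: Sequence[int],
                stride: Sequence[int], dilation: Sequence[int]) -> Tuple[int, int]:
    """Output height/width of a 2D convolution (NCHW input, KCHW weights)."""
    _, _, h, wd = x
    _, _, kh, kw = w
    # Assumption: pad is [h_begin, w_begin, h_end, w_end].
    ph = pad[0] + pad[2]
    pw = pad[1] + pad[3]
    oh = (h + ph - dilation[0] * (kh - 1) - 1) // stride[0] + 1
    ow = (wd + pw - dilation[1] * (kw - 1) - 1) // stride[1] + 1
    return oh, ow


def im2col_bytes(x, w, pad, stride, dilation, dtype_bytes: int = 4) -> int:
    """Bytes of an im2col buffer for one image (g=1 and fp32 assumed)."""
    _, c, _, _ = x
    _, _, kh, kw = w
    oh, ow = conv_out_hw(x, w, pad, stride, dilation)
    return c * kh * kw * oh * ow * dtype_bytes


x, w = [1, 24, 340, 340], [24, 24, 3, 3]
p, s, d = [1, 1, 1, 1], [1, 1], [1, 1]
print(conv_out_hw(x, w, p, s, d))   # (340, 340)
print(im2col_bytes(x, w, p, s, d))  # 99878400, the reported workspace_size
```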
kahmed10 commented 11 months ago

Whisper

> Optimum command: optimum-cli export onnx --model openai/whisper-large whisper/
> Optimum fails toward the end, but the model gets generated successfully and can be compiled with MIGraphX.

I did not see this error on my end, and I got both an encoder model and a decoder model. To run the encoder model (using whisper-tiny as an example):

./bin/driver perf /onnx/whisper-tiny/encoder_model.onnx --input-dim @input_features 1 80 3000

To run the decoder model:

./bin/driver perf /onnx/whisper-tiny/decoder_model.onnx --fill1 input_ids --input-dim @input_ids 1 256 @encode_hidden_states 1 256 384
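Where the encoder dimensions come from, assuming the standard Whisper front end: 16 kHz audio padded or trimmed to 30-second windows, a 160-sample hop between log-mel frames, 80 mel bins for whisper-tiny, and a two-convolution stem whose second convolution has stride 2. D_MODEL_TINY = 384 is whisper-tiny's hidden size, matching the last dimension in the decoder command above. The sketch only does shape arithmetic.

```python
SAMPLE_RATE = 16_000   # Hz
CHUNK_SECONDS = 30     # Whisper pads/trims audio to 30 s windows
HOP_LENGTH = 160       # samples between consecutive log-mel frames
N_MELS = 80            # mel bins for whisper-tiny
D_MODEL_TINY = 384     # whisper-tiny hidden size


def conv1d_out_len(length: int, kernel: int = 3, stride: int = 1, padding: int = 1) -> int:
    """Output length of a 1D convolution."""
    return (length + 2 * padding - kernel) // stride + 1


n_frames = CHUNK_SECONDS * SAMPLE_RATE // HOP_LENGTH
input_features = (1, N_MELS, n_frames)
# conv1 (stride 1) keeps the length; conv2 (stride 2) halves it.
enc_positions = conv1d_out_len(conv1d_out_len(n_frames), stride=2)
encoder_hidden_states = (1, enc_positions, D_MODEL_TINY)

print(input_features)         # (1, 80, 3000)
print(encoder_hidden_states)  # (1, 1500, 384)
```

So a full 30 s window yields 1500 encoder positions; the 256 in the decoder command is presumably a shorter test length.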

attila-dusnoki-htec commented 10 months ago

Stable Diffusion 2.1

These models fail with the latest develop branch on MI200.

With reshape lazy enabled:

attila-dusnoki-htec commented 10 months ago

LLaMA-2 7B

> Hugging Face Llama-2-7b-hf and Llama-2-7b-chat-hf compile; however, the export fails in the same way as for Vicuna. Optimum command to get the model: optimum-cli export onnx --model meta-llama/Llama-2-7b-hf ./Llama-2-7b-h
> We still have to look into https://github.com/ggerganov/llama.cpp

Ignore this: since it is a decoder, it generates tokens one by one, so it makes sense to use the {1, 1} shape. ~It compiles without any arguments. But with that, input_ids and attn_mask will be {1, 1}.~

~Changing it to e.g. {1, 4096} (the largest supported size) will fail with~ ~migraphx-driver compile model_zoo/llama2-7b-hf/decoder_model.onnx --input-dim @input_dims 1 4096 @attention_mask 1 4096~

~operator: MatMul~ ~/code/AMDMIGraphX/src/include/migraphx/op/dot.hpp:93: compute_shape: DOT: static inner dimensions do not match: {1, 32, 1, 4096} x {1, 32, 1, 128}~
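A minimal sketch of why a decoder is driven with {1, 1} inputs during generation, assuming a KV-cache (with-past) decoder: the first call sees the whole prompt, and every later call feeds only the newly generated token while the cache carries the earlier positions, so only the attention mask keeps growing. toy_decoder_step is a hypothetical stand-in for the compiled model; it returns a deterministic token instead of logits so the loop runs without any ML library.

```python
from typing import List, Tuple

Shape = Tuple[int, int]


def toy_decoder_step(input_ids: List[List[int]], past_len: int) -> int:
    """Hypothetical stand-in for one call of a KV-cache decoder.

    A real model returns logits; this toy returns a deterministic
    "next token" so the loop can run without any ML library.
    """
    return (input_ids[0][-1] + past_len + 1) % 100


def greedy_generate(prompt: List[int], max_new_tokens: int) -> Tuple[List[int], List[Tuple[Shape, Shape]]]:
    tokens = list(prompt)
    shapes = []                  # (input_ids shape, attention_mask shape) per call
    past_len = 0                 # positions already held in the KV cache
    step_ids = [list(prompt)]    # first call: the whole prompt
    for _ in range(max_new_tokens):
        cur = len(step_ids[0])
        shapes.append(((1, cur), (1, past_len + cur)))  # mask covers cache + new tokens
        next_tok = toy_decoder_step(step_ids, past_len)
        past_len += cur
        tokens.append(next_tok)
        step_ids = [[next_tok]]  # later calls: only the new token, shape (1, 1)
    return tokens, shapes


tokens, shapes = greedy_generate([1, 2, 3], max_new_tokens=4)
for ids_shape, mask_shape in shapes:
    print(ids_shape, mask_shape)
```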

The microsoft version fails with the following:

migraphx-driver read 7B_float32/ONNX/LlamaV2_7B_float32.onnx --input-dim @x 1 2048 4096 @k_cache 1 32 2048 32 128 @v_cache 1 32 2048 32 128 @pos 1 @attn_mask 1 2048 2048

operator: Slice
/code/AMDMIGraphX/src/include/migraphx/check_shapes.hpp:157: only_dims: SLICE: inputs (starts, ends, and input_axes): Only 1d supported
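For scale, the input shapes passed to migraphx-driver for the Microsoft model are large. A sketch of the per-input memory, assuming every input is fp32 (this is the float32 export) and reading k_cache/v_cache as (batch, layers, positions, heads, head_dim); that reading is an interpretation of the dims, not something the error message confirms.

```python
from math import prod
from typing import Dict, Tuple

BYTES_FP32 = 4  # assumption: all inputs are float32

# Input shapes copied from the migraphx-driver command above.
INPUT_DIMS: Dict[str, Tuple[int, ...]] = {
    "x": (1, 2048, 4096),
    "k_cache": (1, 32, 2048, 32, 128),
    "v_cache": (1, 32, 2048, 32, 128),
    "pos": (1,),
    "attn_mask": (1, 2048, 2048),
}


def tensor_bytes(shape: Tuple[int, ...], dtype_bytes: int = BYTES_FP32) -> int:
    """Bytes for a dense tensor of the given shape."""
    return prod(shape) * dtype_bytes


for name, shape in INPUT_DIMS.items():
    print(f"{name:<10} {str(shape):<24} {tensor_bytes(shape) / 2**30:8.4f} GiB")
```

Under the fp32 assumption, each cache alone is exactly 1 GiB at these sizes.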

The ONNX Runtime-converted version:

migraphx-driver compile llama2-7b-hf-ort/rank_0_Llama-2-7b-hf_decoder_merged_model_fp32_opt.onnx --input-dim @input_dims 1 4096 @attention_mask 1 4096

operator: Add
/code/AMDMIGraphX/src/common.cpp:48: operator(): COMPUTE_BROADCASTLEN: shape {1, 1, 1, 4096} and {1, 1, 1, 2} mismatch!
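The Add failure is a broadcasting mismatch: ONNX Add uses NumPy-style (multidirectional) broadcasting, where shapes are aligned from the trailing dimension and each pair of dims must be equal or one of them must be 1. A minimal sketch of that rule, reproducing the rejected pair from the log; broadcast_shape is a hypothetical helper, not MIGraphX's own implementation.

```python
from itertools import zip_longest
from typing import Sequence, Tuple


def broadcast_shape(a: Sequence[int], b: Sequence[int]) -> Tuple[int, ...]:
    """Multidirectional (NumPy-style) broadcast of two static shapes."""
    out = []
    # Walk from the trailing dimension; a missing leading dim acts like 1.
    for da, db in zip_longest(reversed(a), reversed(b), fillvalue=1):
        if da == db or db == 1:
            out.append(da)
        elif da == 1:
            out.append(db)
        else:
            raise ValueError(f"shape {list(a)} and {list(b)} mismatch: {da} vs {db}")
    return tuple(reversed(out))


print(broadcast_shape((1, 1, 1, 4096), (1, 32, 1, 1)))  # (1, 32, 1, 4096)
try:
    broadcast_shape((1, 1, 1, 4096), (1, 1, 1, 2))
except ValueError as err:
    print(err)  # the same pair the log reports as a mismatch
```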