gyulaz-htec commented 11 months ago

[x] Resnet50: https://zenodo.org/record/4735647/files/resnet50_v1.onnx
[x] BERT: https://zenodo.org/record/3733910/files/model.onnx
[x] Inception v3: pytorch microbenchmarking
[x] RetinaNet: https://github.com/yhenon/pytorch-retinanet
[x] Vicuna: https://github.com/lm-sys/FastChat
[x] YoloV5: https://github.com/ultralytics/yolov5
[x] Whisper: https://huggingface.co/openai/whisper-large
[x] LLaMA-2 7B (4 bit quantized cpp): https://github.com/ggerganov/llama.cpp
[x] Stable Diffusion 2.1 : https://huggingface.co/stabilityai/stable-diffusion-2-1
[x] GPT-J 6B: https://huggingface.co/EleutherAI/gpt-j-6b
[x] MLSR

gyulaz-htec commented 11 months ago

Yolov5

The model passes. For reference, the repro steps to get the ONNX model:

- Clone the repo
- Check export.py for additional requirements
- Generate the model with python3 export.py --weights yolov5s.pt --include onnx

gyulaz-htec commented 11 months ago

Vicuna

vicuna-7b-v1.5 passes. Repro steps:

https://github.com/lm-sys/FastChat points to Hugging Face, so optimum can be used:

optimum-cli export onnx --model lmsys/vicuna-7b-v1.5 vicuna-7b-v_1_5

This fails on our available MI210 machines because we run out of RAM during weight processing, but the ONNX file is still generated as decoder_model.onnx, which can be compiled.

The same applies to vicuna-7b-v1.5-16k
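The "export fails late, but the file is already there" situation can be checked mechanically. The following is a minimal sketch, not part of the original workflow: `export_and_check` is a hypothetical helper that runs any export command, tolerates a non-zero exit code, and reports whether the expected artifact exists. The demo uses a stand-in command instead of optimum-cli so it runs anywhere.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def export_and_check(cmd, output_dir, artifact="decoder_model.onnx"):
    """Run an export command and report whether the artifact exists afterwards.

    A non-zero exit code is tolerated on purpose: the export may crash late
    (for example when running out of RAM) after the ONNX file was written.
    Returns (exit_code, artifact_present).
    """
    result = subprocess.run(cmd, check=False)
    artifact_path = Path(output_dir) / artifact
    return result.returncode, artifact_path.is_file()


if __name__ == "__main__":
    # Stand-in for the real export: writes the file, then exits with code 1.
    with tempfile.TemporaryDirectory() as out_dir:
        fake_export = [
            sys.executable,
            "-c",
            "import pathlib, sys; "
            "pathlib.Path(sys.argv[1], 'decoder_model.onnx').write_bytes(b''); "
            "sys.exit(1)",
            out_dir,
        ]
        code, found = export_and_check(fake_export, out_dir)
        print(f"exit code: {code}, decoder_model.onnx present: {found}")
```

For the real case, the command list would be `["optimum-cli", "export", "onnx", "--model", "lmsys/vicuna-7b-v1.5", "vicuna-7b-v_1_5"]` with `vicuna-7b-v_1_5` as the output directory. Whether a file left behind by a crashed export is complete still has to be confirmed by compiling it.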

gyulaz-htec commented 11 months ago

LLaMA-2 7B

The Hugging Face Llama-2-7b-hf and Llama-2-7b-chat-hf models compile; however, the export fails in the same way as for Vicuna.

Optimum command to get the model:

optimum-cli export onnx --model meta-llama/Llama-2-7b-hf ./Llama-2-7b-h

We still have to look into https://github.com/ggerganov/llama.cpp

gyulaz-htec commented 11 months ago

Stable Diffusion 2.1

Optimum command to get stable-diffusion-2-1:

optimum-cli export onnx --model stabilityai/stable-diffusion-2-1 ./stable-diffusion-2-1

The models successfully compile with the following commands:

migraphx-driver compile sd_2-1/vae_decoder/model.onnx --input-dim @latent_sample 2 4 64 64 --gpu
migraphx-driver compile sd_2-1/vae_encoder/model.onnx --input-dim @sample 2 3 512 512 --gpu
migraphx-driver compile sd_2-1/unet/model.onnx --input-dim @sample 2 4 64 64 @timestep 1 @encoder_hidden_states 2 64 1024 --fp16
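When several models need different input shapes, it can help to generate the driver command lines instead of typing them. The sketch below is an illustration only: `build_compile_cmd` is a hypothetical helper, and it uses just the flags that already appear in this thread (`compile`, `--input-dim` with `@name` groups, and pass-through extras such as `--gpu` or `--fp16`).

```python
import shlex


def build_compile_cmd(model_path, input_dims, extra_flags=()):
    """Build a migraphx-driver compile command as an argument list.

    input_dims maps an input name to its dimensions; all inputs are emitted
    after a single --input-dim flag, each introduced by @name, mirroring the
    UNet command in this thread.
    """
    cmd = ["migraphx-driver", "compile", model_path]
    if input_dims:
        cmd.append("--input-dim")
        for name, dims in input_dims.items():
            cmd.append(f"@{name}")
            cmd.extend(str(d) for d in dims)
    cmd.extend(extra_flags)
    return cmd


if __name__ == "__main__":
    unet_cmd = build_compile_cmd(
        "sd_2-1/unet/model.onnx",
        {
            "sample": [2, 4, 64, 64],
            "timestep": [1],
            "encoder_hidden_states": [2, 64, 1024],
        },
        extra_flags=["--fp16"],
    )
    # Print a shell-ready string; pass the list itself to subprocess.run to execute.
    print(shlex.join(unet_cmd))
```

Returning a list rather than a string avoids quoting problems if a model path contains spaces.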

music-dino commented 11 months ago

Whisper

Optimum command:

optimum-cli export onnx --model openai/whisper-large whisper/

Optimum fails toward the end, but the model gets generated successfully and can be compiled with MIGraphX.

gyulaz-htec commented 11 months ago

GPT-J 6B

Optimum command:

optimum-cli export onnx --model EleutherAI/gpt-j-6B gpt-j/

The model compiles with the following migraphx command:

migraphx-driver compile optimum_models/gpt-j/decoder_model.onnx --fill1 input_ids attention_mask --input-dim @input_ids 1 64 --input-dim @attention_mask 1 64

gyulaz-htec commented 11 months ago

Inception v3

The model compiles with migraphx-driver. To generate the ONNX model from the PyTorch Hub model, use this python script

gyulaz-htec commented 11 months ago

RetinaNet

The model compiles with migraphx-driver. To generate the ONNX model (with ResNet50 backbone) from the PyTorch Hub model, use this python script

gyulaz-htec commented 11 months ago

MLSR

We checked two of the top trending SR models from Hugging Face: A2N and AWSRN-BAM. Both have scale 2, 3 and 4 versions. They are only available as PyTorch models; the download and conversion scripts: A2N, AWSRN

Results

[ ] AWSRN-BAM models - failing with: migraphx-driver: /code/AMDMIGraphX/src/targets/gpu/lowering.cpp:76: void migraphx::gpu::miopen_apply::check_shape(shape, instruction_ref): Assertion 'x == i->get_shape()' failed. Which comes from here
- The failing instruction (i) is a reshape with shape (1, 3, 170, 170)
- x is non-standard, i is standard
- The model compiles when add_reshape_lazy_op is disabled
- lowering log
[x] A2N scale 2 - passing
[x] A2N scale 3 - passing

[ ] A2N scale 4 - Fails with:

```
MIOpen(HIP): Error [Do] 'amd_comgr_do_action(kind, handle, in.GetHandle(), out.GetHandle())' AMD_COMGR_ACTION_ASSEMBLE_SOURCE_TO_RELOCATABLE: ERROR (1)
MIOpen(HIP): Error [BuildAsm] comgr status = ERROR (1)
MIOpen(HIP): Warning [BuildAsm] warning: argument unused during compilation: '-nogpulib' [-Wunused-command-line-argument]
<instantiation>:3:13: error: Error: Immediate offset is too large for buffer_load instruction
        .error "Error: Immediate offset is too large for buffer_load instruction"
        ^
<instantiation>:25:9: note: while in macro instantiation
    .single_vload line_base, s_off, mbufs_cnt_A, 2, 1
    ^
<instantiation>:17:13: note: while in macro instantiation
        .load_input_line line_base, s_off, mbufs_cnt_A
        ^
<instantiation>:1:1: note: while in macro instantiation
.rept input_lines_per_sgpr
^
<instantiation>:5:5: note: while in macro instantiation
.load_input_lines_on_same_sgpr input_lines_per_sgpr, mbufs_cnt_A
^
/tmp/comgr-acc642/input/conv3x3.s:1605:3: note: while in macro instantiation
.load_input linesA, mbufs_cnt_A
^
<instantiation>:31:1: error: unmatched .ifs or .elses

^
<instantiation>:25:9: note: while in macro instantiation
    .single_vload line_base, s_off, mbufs_cnt_A, 2, 1
    ^
<instantiation>:17:13: note: while in macro instantiation
        .load_input_line line_base, s_off, mbufs_cnt_A
        ^
<instantiation>:1:1: note: while in macro instantiation
.rept input_lines_per_sgpr
^
<instantiation>:5:5: note: while in macro instantiation
.load_input_lines_on_same_sgpr input_lines_per_sgpr, mbufs_cnt_A
^
/tmp/comgr-acc642/input/conv3x3.s:1605:3: note: while in macro instantiation
.load_input linesA, mbufs_cnt_A
^
MIOpen Error: /long_pathname_so_that_rpms_can_package_the_debug_info/src/extlibs/MLOpen/src/hipoc/hipoc_program.cpp:304: Code object build failed. Source: conv3x3.s
terminate called after throwing an instance of 'migraphx::version_2_9_0::exception'
  what(): /code/AMDMIGraphX/src/targets/gpu/include/migraphx/gpu/miopen.hpp:114: find_solution: MIOpen: miopenFindSolutions failed
```

The convolution details:

```
x=[1, 24, 340, 340] w=[24, 24, 3, 3] p=[1, 1, 1, 1] s=[1, 1] d=[1, 1] g=1 workspace_size=99878400
```

Which hits the offset limit: https://github.com/ROCmSoftwarePlatform/MIOpen/blob/develop/src/kernels/conv3x3.s#L463-L466
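As a rough size check on the convolution above, the reported numbers can be related with simple arithmetic. This sketch assumes fp32 data (4 bytes per element), which the thread does not state. It only shows that the reported workspace_size equals nine times the input tensor size in bytes, which would be consistent with a 3x3 kernel. It is not a diagnosis of the assembler error itself.

```python
from math import prod

# Values copied from the convolution details in this thread.
X_LENS = [1, 24, 340, 340]
WORKSPACE_SIZE = 99878400

# Assumption (not stated in the thread): fp32, i.e. 4 bytes per element.
BYTES_PER_ELEMENT = 4


def tensor_bytes(lens, bytes_per_element=BYTES_PER_ELEMENT):
    """Size in bytes of a densely stored tensor with the given dimensions."""
    return prod(lens) * bytes_per_element


if __name__ == "__main__":
    input_bytes = tensor_bytes(X_LENS)
    print(f"input tensor: {input_bytes} bytes")
    print(f"workspace / input: {WORKSPACE_SIZE / input_bytes}")
```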

kahmed10 commented 11 months ago

Whisper

Optimum command:

optimum-cli export onnx --model openai/whisper-large whisper/

Optimum fails toward the end, but the model gets generated successfully and can be compiled with MIGraphX.

I did not see this error on my end, and I got both an encoder model and a decoder model.

To run the encoder model (using whisper-tiny as an example):

./bin/driver perf /onnx/whisper-tiny/encoder_model.onnx --input-dim @input_features 1 80 3000

To run the decoder model:

./bin/driver perf /onnx/whisper-tiny/decoder_model.onnx --fill1 input_ids --input-dim @input_ids 1 256 @encode_hidden_states 1 256 384

attila-dusnoki-htec commented 10 months ago

Stable Diffusion 2.1

These models fail with the latest develop on MI200

With reshape lazy enabled:

- Text Encoder, UNet, VAE-Decoder fail with:
  src/include/migraphx/op/reshape_lazy.hpp:238: static_compute_shape: reshape_lazy on axis that is not packed.

Without reshape lazy:

- Text Encoder compiles
- VAE-Decoder, UNet fail (the ref version compiles) with:
  check_shapes.hpp:296: packed_layouts: gpu::convolution: Shapes are not packed with correct layout
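Both errors above are about whether a tensor is "packed". The following is a minimal sketch of the idea in terms of dimensions and strides, under the assumption that "packed" means the elements occupy memory without gaps and "standard" means plain row-major strides. It is not MIGraphX's implementation, and the function names are made up for illustration.

```python
from math import prod


def standard_strides(lens):
    """Row-major (contiguous) strides for the given dimensions."""
    strides = [1] * len(lens)
    for i in range(len(lens) - 2, -1, -1):
        strides[i] = strides[i + 1] * lens[i + 1]
    return strides


def element_space(lens, strides):
    """Number of element slots spanned from the first to the last element."""
    return 1 + sum((n - 1) * s for n, s in zip(lens, strides))


def is_packed(lens, strides):
    """True when the memory span has no gaps or overlaps."""
    return element_space(lens, strides) == prod(lens)


def is_standard(lens, strides):
    """True when the strides are exactly the row-major ones."""
    return list(strides) == standard_strides(lens)


if __name__ == "__main__":
    lens = [2, 4, 64, 64]
    nchw = standard_strides(lens)        # row-major layout
    nhwc = [16384, 1, 256, 4]            # same data, channels-last layout
    print("NCHW:", nchw, is_packed(lens, nchw), is_standard(lens, nchw))
    print("NHWC:", nhwc, is_packed(lens, nhwc), is_standard(lens, nhwc))

    # A slice that keeps the parent's strides leaves gaps, so it is not packed.
    sliced_lens = [2, 4, 64, 32]
    print("slice:", is_packed(sliced_lens, nchw))
```

In this picture a channels-last tensor is packed but not standard, while a sliced view is neither, which is the kind of input a lazy reshape cannot reinterpret without copying.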

attila-dusnoki-htec commented 10 months ago

LLaMA-2 7B

The Hugging Face Llama-2-7b-hf and Llama-2-7b-chat-hf models compile; however, the export fails in the same way as for Vicuna.

Optimum command to get the model:

optimum-cli export onnx --model meta-llama/Llama-2-7b-hf ./Llama-2-7b-h

We still have to look into https://github.com/ggerganov/llama.cpp

Ignore the struck-through text below: since this is a decoder, it generates the output one-by-one, so it makes sense to use the {1, 1} shape.

~It compiles without any arguments. But with that input_ids and attn_mask will be {1, 1}.~

~Changing it to e.g. {1, 4096} (the largest supported size) will fail with~

~migraphx-driver compile model_zoo/llama2-7b-hf/decoder_model.onnx --input-dim @input_dims 1 4096 @attention_mask 1 4096~

~operator: MatMul~

~/code/AMDMIGraphX/src/include/migraphx/op/dot.hpp:93: compute_shape: DOT: static inner dimensions do not match: {1, 32, 1, 4096} x {1, 32, 1, 128}~

The Microsoft version fails with the following:

migraphx-driver read 7B_float32/ONNX/LlamaV2_7B_float32.onnx --input-dim @x 1 2048 4096 @k_cache 1 32 2048 32 128 @v_cache 1 32 2048 32 128 @pos 1 @attn_mask 1 2048 2048

operator: Slice
/code/AMDMIGraphX/src/include/migraphx/check_shapes.hpp:157: only_dims: SLICE: inputs (starts, ends, and input_axes): Only 1d supported

The onnxruntime converted version:

migraphx-driver compile llama2-7b-hf-ort/rank_0_Llama-2-7b-hf_decoder_merged_model_fp32_opt.onnx --input-dim @input_dims 1 4096 @attention_mask 1 4096

operator: Add
/code/AMDMIGraphX/src/common.cpp:48: operator(): COMPUTE_BROADCASTLEN: shape {1, 1, 1, 4096} and {1, 1, 1, 2} mismatch!
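The Add failure above is a broadcasting mismatch. The sketch below illustrates the usual multidirectional broadcasting rule (the one ONNX uses for element-wise operators): dimensions are compared right-aligned and must be equal or 1. It is a plain-Python illustration and not the MIGraphX code in common.cpp; `broadcast_lens` is a made-up name.

```python
def broadcast_lens(a, b):
    """Return the broadcast result of two shapes, or raise ValueError.

    Shapes are right-aligned; each pair of dimensions must be equal,
    or one of them must be 1.
    """
    rank = max(len(a), len(b))
    a = [1] * (rank - len(a)) + list(a)
    b = [1] * (rank - len(b)) + list(b)
    out = []
    for x, y in zip(a, b):
        if x == y or y == 1:
            out.append(x)
        elif x == 1:
            out.append(y)
        else:
            raise ValueError(f"shape {a} and {b} mismatch")
    return out


if __name__ == "__main__":
    # A trailing dimension of 1 broadcasts against 4096.
    print(broadcast_lens([1, 1, 1, 4096], [1, 1, 1, 1]))

    # The shapes from the error: 4096 vs 2 in the last axis cannot broadcast.
    try:
        broadcast_lens([1, 1, 1, 4096], [1, 1, 1, 2])
    except ValueError as err:
        print("error:", err)
```

A last-axis length of 2 next to 4096 suggests that one operand was built for a different sequence length than the one passed via --input-dim, though the thread does not confirm the cause.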

migraphx-benchmark / AMDMIGraphX

Checking high priority models #145
