The model passes, but I will leave the repro steps to get the ONNX model here:
python3 export.py --weights yolov5s.pt --include onnx
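The compile invocation for this model is not recorded in the thread; a hypothetical one, reusing flags that appear elsewhere in this issue (the output filename yolov5s.onnx is an assumption):
migraphx-driver compile yolov5s.onnx --gpu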
vicuna-7b-v1.5 passes. Repro steps:
optimum-cli export onnx --model lmsys/vicuna-7b-v1.5 vicuna-7b-v_1_5
This fails on our available MI210 machines because we run out of RAM during weight processing, but the ONNX file is still generated as decoder_model.onnx, which can be compiled.
The same applies to vicuna-7b-v1.5-16k
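No compile command is recorded for the generated decoder; a hypothetical invocation, modeled on the gpt-j command later in this thread (the 1x64 input shapes are assumptions):
migraphx-driver compile vicuna-7b-v_1_5/decoder_model.onnx --fill1 input_ids attention_mask --input-dim @input_ids 1 64 --input-dim @attention_mask 1 64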
Huggingface Llama-2-7b-hf and Llama-2-7b-chat-hf compile; however, the export fails similarly to vicuna.
Optimum command to get the model: optimum-cli export onnx --model meta-llama/Llama-2-7b-hf ./Llama-2-7b-h
We still have to look into https://github.com/ggerganov/llama.cpp
Optimum command to get stable-diffusion-2-1: optimum-cli export onnx --model stabilityai/stable-diffusion-2-1 ./stable-diffusion-2-1
The models successfully compile with the following commands:
migraphx-driver compile sd_2-1/vae_decoder/model.onnx --input-dim @latent_sample 2 4 64 64 --gpu
migraphx-driver compile sd_2-1/vae_encoder/model.onnx --input-dim @sample 2 3 512 512 --gpu
migraphx-driver compile sd_2-1/unet/model.onnx --input-dim @sample 2 4 64 64 @timestep 1 @encoder_hidden_states 2 64 1024 --fp16
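No command is recorded above for the text encoder; a hypothetical invocation, assuming the standard 77-token CLIP sequence length (the shape is an assumption):
migraphx-driver compile sd_2-1/text_encoder/model.onnx --fill1 input_ids --input-dim @input_ids 2 77 --gpu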
Optimum command: optimum-cli export onnx --model openai/whisper-large whisper/
Optimum fails toward the end, but the model gets generated successfully and can be compiled with MIGraphX.
Optimum command: optimum-cli export onnx --model EleutherAI/gpt-j-6B gpt-j/
The model compiles with the following migraphx command: migraphx-driver compile optimum_models/gpt-j/decoder_model.onnx --fill1 input_ids attention_mask --input-dim @input_ids 1 64 --input-dim @attention_mask 1 64
The model compiles with migraphx-driver. To generate the ONNX model from the PyTorch Hub model, use this Python script.
The model compiles with migraphx-driver. To generate the ONNX model (with a ResNet50 backbone) from the PyTorch Hub model, use this Python script; a sketch of such a script is included below.
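The Python scripts referenced in the two items above are not preserved in this thread; below is a minimal sketch of such an export, assuming a torch.hub model (deeplabv3_resnet50 and the input shape are placeholder choices, not necessarily the models referenced above):

```python
# Minimal torch.hub -> ONNX export sketch. The hub model name and the
# input shape are placeholders; substitute the actual model and its input.
import torch

model = torch.hub.load("pytorch/vision", "deeplabv3_resnet50", pretrained=True)
model.eval()

dummy = torch.randn(1, 3, 512, 512)  # assumed input shape
torch.onnx.export(
    model,
    dummy,
    "model.onnx",
    input_names=["input"],
    output_names=["output"],
    opset_version=13,
)
```

The resulting model.onnx can then be compiled the same way as the other models here, e.g. migraphx-driver compile model.onnx --gpu.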
We checked two of the top trending super-resolution models from Hugging Face, A2N and AWSRN-BAM. Both have 2x, 3x and 4x scale versions. They are only available as PyTorch models; the download and conversion scripts: A2N, AWSRN.
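The linked conversion scripts are not reproduced here; a minimal sketch of what the conversion could look like, assuming the community super-image package hosts these checkpoints (the package, class, and repo names are assumptions):

```python
# Hypothetical A2N download-and-convert sketch. The super_image package,
# A2nModel class, repo id, and input size are assumptions, not the
# scripts linked above.
import torch
from super_image import A2nModel

model = A2nModel.from_pretrained("eugenesiow/a2n", scale=2)
model.eval()

dummy = torch.randn(1, 3, 64, 64)  # assumed low-resolution input
torch.onnx.export(model, dummy, "a2n_x2.onnx",
                  input_names=["lr"], output_names=["sr"],
                  opset_version=13)
```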
Results
migraphx-driver: /code/AMDMIGraphX/src/targets/gpu/lowering.cpp:76: void migraphx::gpu::miopen_apply::check_shape(shape, instruction_ref): Assertion 'x == i->get_shape()' failed.
Which comes from here: `i` is a reshape with shape (1, 3, 170, 170). `x` is non-standard, `i` is standard. `add_reshape_lazy_op` is disabled.
MIOpen(HIP): Error [Do] 'amd_comgr_do_action(kind, handle, in.GetHandle(), out.GetHandle())' AMD_COMGR_ACTION_ASSEMBLE_SOURCE_TO_RELOCATABLE: ERROR (1)
MIOpen(HIP): Error [BuildAsm] comgr status = ERROR (1)
MIOpen(HIP): Warning [BuildAsm] warning: argument unused during compilation: '-nogpulib' [-Wunused-command-line-argument]
<instantiation>:3:13: error: Error: Immediate offset is too large for buffer_load instruction
.error "Error: Immediate offset is too large for buffer_load instruction"
^
<instantiation>:25:9: note: while in macro instantiation
.single_vload line_base, s_off, mbufs_cnt_A, 2, 1
^
<instantiation>:17:13: note: while in macro instantiation
.load_input_line line_base, s_off, mbufs_cnt_A
^
<instantiation>:1:1: note: while in macro instantiation
.rept input_lines_per_sgpr
^
<instantiation>:5:5: note: while in macro instantiation
.load_input_lines_on_same_sgpr input_lines_per_sgpr, mbufs_cnt_A
^
/tmp/comgr-acc642/input/conv3x3.s:1605:3: note: while in macro instantiation
.load_input linesA, mbufs_cnt_A
^
<instantiation>:31:1: error: unmatched .ifs or .elses
^
> Whisper
> Optimum command: optimum-cli export onnx --model openai/whisper-large whisper/
> Optimum fails toward the end, but the model gets generated successfully and can be compiled with MIGraphX.
Did not see this error on my end, and I got both an encoder and a decoder model. To run the encoder model (using whisper-tiny as an example):
./bin/driver perf /onnx/whisper-tiny/encoder_model.onnx --input-dim @input_features 1 80 3000
To run the decoder model:
./bin/driver perf /onnx/whisper-tiny/decoder_model.onnx --fill1 input_ids --input-dim @input_ids 1 256 @encode_hidden_states 1 256 384
These models fail with the latest develop branch on MI200.
With reshape lazy enabled, Text Encoder, UNet, and VAE-Decoder fail with:
src/include/migraphx/op/reshape_lazy.hpp:238: static_compute_shape: reshape_lazy on axis that is not packed.
Without reshape lazy, Text Encoder compiles, while VAE-Decoder and UNet (the ref version compiles) fail with:
check_shapes.hpp:296: packed_layouts: gpu::convolution: Shapes are not packed with correct layout
LLaMA-2 7B
> Huggingface Llama-2-7b-hf and Llama-2-7b-chat-hf compile; however, the export fails similarly to vicuna. Optimum command to get the model:
> optimum-cli export onnx --model meta-llama/Llama-2-7b-hf ./Llama-2-7b-h
> We still have to look into https://github.com/ggerganov/llama.cpp
Ignore this: since it is a decoder, it generates tokens one-by-one, so it makes sense to use a {1, 1} shape.
~~It compiles without any arguments, but with that `input_ids` and `attn_mask` will be {1, 1}.~~
~~Changing it to e.g. {1, 4096} (the largest supported size) will fail with:~~
~~migraphx-driver compile model_zoo/llama2-7b-hf/decoder_model.onnx --input-dim @input_dims 1 4096 @attention_mask 1 4096~~
~~operator: MatMul~~
~~/code/AMDMIGraphX/src/include/migraphx/op/dot.hpp:93: compute_shape: DOT: static inner dimensions do not match: {1, 32, 1, 4096} x {1, 32, 1, 128}~~
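For reference, the struck-through "compiles without any arguments" case corresponds to a plain invocation like the following, which leaves both inputs at the default {1, 1} shape:
migraphx-driver compile model_zoo/llama2-7b-hf/decoder_model.onnx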
The Microsoft version fails with the following:
migraphx-driver read 7B_float32/ONNX/LlamaV2_7B_float32.onnx --input-dim @x 1 2048 4096 @k_cache 1 32 2048 32 128 @v_cache 1 32 2048 32 128 @pos 1 @attn_mask 1 2048 2048
operator: Slice
/code/AMDMIGraphX/src/include/migraphx/check_shapes.hpp:157: only_dims: SLICE: inputs (starts, ends, and input_axes): Only 1d supported
The onnxruntime-converted version:
migraphx-driver compile llama2-7b-hf-ort/rank_0_Llama-2-7b-hf_decoder_merged_model_fp32_opt.onnx --input-dim @input_dims 1 4096 @attention_mask 1 4096
operator: Add
/code/AMDMIGraphX/src/common.cpp:48: operator(): COMPUTE_BROADCASTLEN: shape {1, 1, 1, 4096} and {1, 1, 1, 2} mismatch!