Check microsoft Llama-2-Onnx repo

migraphx-benchmark / AMDMIGraphX

AMD's graph optimization engine.

https://rocmsoftwareplatform.github.io/AMDMIGraphX/doc/html/

MIT License


Check microsoft Llama-2-Onnx repo #148

Open gyulaz-htec opened 8 months ago

gyulaz-htec commented 8 months ago

We got a request to check the https://github.com/microsoft/Llama-2-Onnx repository. Accessing the Llama 2 model requires permission; the details are described in the repo's README.

There are two Python examples that we should try with MIGraphX (see the repo's README for details). In these examples we have to replace the ONNX Runtime calls with MIGraphX API calls.

attila-dusnoki-htec commented 8 months ago

To test it with MIGraphX we can update these two apps:

https://github.com/microsoft/Llama-2-Onnx/tree/main/MinimumExample
- Replace ORT with MGX
https://github.com/microsoft/Llama-2-Onnx/tree/main/ChatApp
- There is an inference part, which requires the same changes
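As a rough illustration of what the ORT to MGX swap could look like, here is a minimal sketch. The helper name `run_with_migraphx` and its `mgx` parameter are hypothetical (the parameter exists only so the module can be injected), and the calls assume the MIGraphX Python API (`parse_onnx`, `get_target`, `compile`, `argument`, `run`); it is not the code from either example app.

```python
def run_with_migraphx(onnx_path, feeds, mgx=None, target_name="gpu"):
    """Parse, compile and run an ONNX model with MIGraphX.

    `feeds` maps input names to buffer-like objects (e.g. NumPy arrays),
    similar to the dict passed to an ONNX Runtime session's run() call.
    `mgx` is a hypothetical injection point; by default the real
    `migraphx` module is imported lazily.
    """
    if mgx is None:
        import migraphx as mgx  # requires a ROCm/MIGraphX installation

    prog = mgx.parse_onnx(onnx_path)           # replaces creating an ORT session
    prog.compile(mgx.get_target(target_name))  # compile for the chosen target
    # Wrap every input buffer in a MIGraphX argument, keyed by input name.
    params = {name: mgx.argument(value) for name, value in feeds.items()}
    return prog.run(params)                    # replaces session.run(...)
```

With MIGraphX installed, the results can typically be converted back with `numpy.array(result)` before they are fed into the rest of the example's token loop.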

attila-dusnoki-htec commented 8 months ago

Testing 7B_float32/ONNX/LlamaV2_7B_float32.onnx

Both without and with explicit input dims (@x 1 2048 4096 @k_cache 1 32 2048 32 128 @v_cache 1 32 2048 32 128 @pos 1 @attn_mask 1 2048 2048), it runs into /code/AMDMIGraphX/src/onnx/parse_slice.cpp:155: construct_slice_desc: PARSE_SLICE: steps and variable starts and ends is not supported
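For reproducing this from Python rather than the driver, the input dims listed above can be turned into a name-to-shape mapping. This is a small sketch; `parse_input_dims` is a hypothetical helper, not part of MIGraphX.

```python
def parse_input_dims(spec):
    """Turn '@name d0 d1 ... @other d0 ...' into {'name': [d0, d1, ...], ...}."""
    dims = {}
    current = None
    for token in spec.split():
        if token.startswith("@"):
            current = token[1:]      # start a new input name
            dims[current] = []
        elif current is None:
            raise ValueError("dimension given before any @name: " + token)
        else:
            dims[current].append(int(token))
    return dims


# The dims used in the test above.
SPEC = ("@x 1 2048 4096 @k_cache 1 32 2048 32 128 "
        "@v_cache 1 32 2048 32 128 @pos 1 @attn_mask 1 2048 2048")
INPUT_DIMS = parse_input_dims(SPEC)
```

Assuming the `map_input_dims` keyword of the Python `migraphx.parse_onnx`, the result could then be passed as `migraphx.parse_onnx(path, map_input_dims=INPUT_DIMS)`.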

attila-dusnoki-htec commented 8 months ago

With the latest develop, the above issue is bypassed, because the steps are the default (1).

Now it stops at the only_dims(1) check:

Inputs.size = 3
inputs[0]   = [1, 2048, 1, 64]
inputs[1]   = [1, 1]
inputs[2]   = [1, 1]
axes        = [1]

attila-dusnoki-htec commented 8 months ago

If we skip that check, it runs into this failure: /code/AMDMIGraphX/src/common.cpp:83: operator(): COMPUTE_BROADCASTED_DYN_DIMS: dynamic shapes {[ 1, 1, {} ], [ 0, 2048, {} ], [ 1, 1, {} ], [ 64, 64, {} ]} and {[ 1, 1, {} ], [ 2048, 2048, {} ], [ 32, 32, {} ], [ 64, 64, {} ]} mismatch!
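To see which axis triggers the mismatch, the broadcast rule can be modelled in a few lines. This is a simplified sketch, assuming each dynamic dimension is a (min, max) pair and that two dimensions broadcast only when they are equal or one of them is a fixed 1; it is not MIGraphX's actual implementation.

```python
def broadcast_dyn_dims(s0, s1):
    """Broadcast two lists of (min, max) dynamic dimensions, right-aligned."""
    if len(s0) > len(s1):
        s0, s1 = s1, s0
    offset = len(s1) - len(s0)
    out = list(s1)
    for i, (a, b) in enumerate(zip(s0, s1[offset:])):
        if a == b or b == (1, 1):
            out[offset + i] = a
        elif a == (1, 1):
            out[offset + i] = b
        else:
            raise ValueError(
                f"dynamic dims {a} and {b} mismatch at axis {offset + i}")
    return out


# The two shapes from the error message, as (min, max) pairs.
lhs = [(1, 1), (0, 2048), (1, 1), (64, 64)]
rhs = [(1, 1), (2048, 2048), (32, 32), (64, 64)]
```

Under this model, axis 1 is the problem: the range (0, 2048) is neither equal to the fixed (2048, 2048) nor a fixed 1, so `broadcast_dyn_dims(lhs, rhs)` raises. If that axis were the fixed (2048, 2048) instead, the remaining axes would broadcast to `rhs`.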