intel / xFasterTransformer

Apache License 2.0

[API] Add LLaMA attention API. #378

Closed changqi1 closed 2 months ago

changqi1 commented 2 months ago
$ ./ut/layers_attention_test
[==========] Running 2 tests from 1 test case.
[----------] Global test environment set-up.
[----------] 2 tests from AttentionLLaMA
[ RUN      ] AttentionLLaMA.bfloat16_t
>> create context: 4096 128
>> create llama_attention_key: 0x7f8fe6804040_0x7f8fe6808040_0x7f8fe6808040_0x7f8fe2803040_1_128_32_32
[ RUNTIME  ] XFT::invokeAttentionLLaMA 0.153258 sec
[ RUNTIME  ] XFT::invokeAttentionLLaMA 0.004665 sec
[ RUNTIME  ] XFT::invokeAttentionLLaMA 0.001864 sec
[       OK ] AttentionLLaMA.bfloat16_t (755 ms)
[ RUN      ] AttentionLLaMA.float16_t
>> create context: 4096 128
>> create llama_attention_key: 0x7f8fe6804040_0x7f8fe6808040_0x7f8fe6808040_0x7f8fe2803040_2_128_32_32
[ RUNTIME  ] XFT::invokeAttentionLLaMA 0.119574 sec
[ RUNTIME  ] XFT::invokeAttentionLLaMA 0.046373 sec
[ RUNTIME  ] XFT::invokeAttentionLLaMA 0.039210 sec
[       OK ] AttentionLLaMA.float16_t (1601 ms)
[----------] 2 tests from AttentionLLaMA (2356 ms total)

[----------] Global test environment tear-down
[==========] 2 tests from 1 test case ran. (2357 ms total)
[  PASSED  ] 2 tests.
changqi1 commented 2 months ago

@pujiang2018 @Duyi-Wang Done. The attention APIs do not include the LayerNorm step; LayerNorm will be added in the Decoder Layer API.