Release/v1.18 : Update number of heads in MQA for Falcon

quic / efficient-transformers

This library empowers users to seamlessly port pretrained models and checkpoints on the HuggingFace (HF) hub (developed using HF transformers library) into inference-ready formats that run efficiently on Qualcomm Cloud AI 100 accelerators.

https://quic.github.io/efficient-transformers/

Release/v1.18 : Update number of heads in MQA for Falcon #125

Status: Closed (quic-mamta closed 1 week ago)

quic-mamta commented 1 week ago

Update the number of heads used in multi-query attention (MQA) for Falcon. This fixes the Falcon 40B export issue.
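For context, here is a minimal sketch of how the key/value head count can differ between Falcon variants, which is the kind of mismatch that can break an export. It is not the change made in this PR. `FalconAttnConfig` and `effective_kv_heads` are hypothetical names; the field names follow the public HF Falcon config as I understand it, and the example values for Falcon 7B and 40B are assumptions taken from their published configs.

```python
from dataclasses import dataclass


@dataclass
class FalconAttnConfig:
    """Hypothetical stand-in for the attention-related fields of a Falcon config."""
    num_attention_heads: int
    num_kv_heads: int
    multi_query: bool
    new_decoder_architecture: bool


def effective_kv_heads(cfg: FalconAttnConfig) -> int:
    """Return the number of key/value heads the attention layer would use.

    Assumed convention:
      - new decoder architecture (e.g. Falcon 40B): grouped KV heads from num_kv_heads
      - classic multi-query attention (e.g. Falcon 7B): a single shared KV head
      - otherwise: one KV head per query head
    """
    if cfg.new_decoder_architecture:
        return cfg.num_kv_heads
    if cfg.multi_query:
        return 1
    return cfg.num_attention_heads


# Assumed example values, not taken from this PR.
falcon_7b = FalconAttnConfig(71, 71, multi_query=True, new_decoder_architecture=False)
falcon_40b = FalconAttnConfig(128, 8, multi_query=True, new_decoder_architecture=True)

kv_7b = effective_kv_heads(falcon_7b)
kv_40b = effective_kv_heads(falcon_40b)
print(kv_7b, kv_40b)
```

Hardcoding a single KV head for every model with `multi_query=True` would be wrong for the 40B-style layout, since KV cache shapes in an exported graph depend on this count.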

quic-mamta commented 1 week ago

Not required