Closed. yh-yao closed this issue 4 months ago.
Hello yh-yao, there is no fixed rule mapping the SDK to a specific transformers version. The only concern is that Qualcomm's patches are written against a specific transformers version. In general, we only patch src/transformers/modeling_outputs.py and the relevant modeling_xxx.py, so you can apply those patches manually and then use any transformers version.
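To make the manual approach above concrete, here is a minimal sketch of overlaying patched modeling files onto an installed transformers tree. The `sdk_patches/` layout, the helper name `apply_patches`, and the demo against a throwaway directory are all assumptions for illustration; in real use you would point it at the patched files that ship with the Qualcomm SDK and at `os.path.dirname(transformers.__file__)`.

```python
import os
import shutil
import tempfile

def apply_patches(patch_dir, transformers_dir, rel_paths):
    """Copy each patched file from patch_dir into the transformers install tree."""
    for rel in rel_paths:
        src = os.path.join(patch_dir, rel)
        dst = os.path.join(transformers_dir, rel)
        os.makedirs(os.path.dirname(dst), exist_ok=True)
        shutil.copy(src, dst)

# Demo against a throwaway directory; in real use transformers_dir would be
# os.path.dirname(transformers.__file__).
work = tempfile.mkdtemp()
patch_dir = os.path.join(work, "sdk_patches")
tf_dir = os.path.join(work, "site-packages", "transformers")

# Typical targets per the reply above: modeling_outputs.py plus the
# model-specific modeling_xxx.py (llama here, as an example).
rel_paths = [
    "modeling_outputs.py",
    os.path.join("models", "llama", "modeling_llama.py"),
]
for rel in rel_paths:
    src = os.path.join(patch_dir, rel)
    os.makedirs(os.path.dirname(src), exist_ok=True)
    with open(src, "w") as f:
        f.write("# patched by Qualcomm SDK\n")

apply_patches(patch_dir, tf_dir, rel_paths)
print(all(os.path.exists(os.path.join(tf_dir, r)) for r in rel_paths))  # True
```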
@dapengsmith I am trying to serve llama3-8b, and manually updating the patching code looks tricky. Since all of us want the Qualcomm SDK to be used by more people, could you help with updating the patch code?
https://github.com/quic/efficient-transformers is now available for your LLM execution needs on Qualcomm AI 100 accelerators.
The current SDK only supports transformers==4.32.0. Newer versions of the transformers library add several useful features (e.g. tokenizer.apply_chat_template, streamers). Upgrading the library would save a lot of time otherwise spent reimplementing those features in the SDK repo.
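For context on what apply_chat_template provides, here is a rough, illustrative reimplementation of the idea: it turns a list of role/content messages into a single prompt string. This is not the transformers API (the real method renders the model's own Jinja chat template), and the Llama-3-style markers below are an assumption for demonstration only.

```python
# Illustrative sketch only: a minimal stand-in for what
# tokenizer.apply_chat_template does in newer transformers releases.
# The Llama-3-style special tokens here are an assumption.

def apply_chat_template(messages, add_generation_prompt=True):
    """Format a list of {"role", "content"} dicts into one prompt string."""
    parts = ["<|begin_of_text|>"]
    for msg in messages:
        parts.append(
            f"<|start_header_id|>{msg['role']}<|end_header_id|>\n\n"
            f"{msg['content']}<|eot_id|>"
        )
    if add_generation_prompt:
        # Leave the prompt open for the model's reply.
        parts.append("<|start_header_id|>assistant<|end_header_id|>\n\n")
    return "".join(parts)

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hello!"},
]
prompt = apply_chat_template(messages)
print(prompt)
```

With the real library, the equivalent one-liner on a chat-capable tokenizer is `tokenizer.apply_chat_template(messages, add_generation_prompt=True, tokenize=False)`, which is exactly the kind of boilerplate the SDK would get for free by upgrading.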