microsoft / MInference

MInference speeds up long-context LLMs' inference by computing attention with approximate and dynamic sparsity, reducing pre-filling latency by up to 10x on an A100 while maintaining accuracy.
https://aka.ms/MInference
MIT License

[Feature Request]: Could you please support microsoft/Phi-3-medium-128k-instruct? Thank you! #26

Open cckao opened 1 month ago

cckao commented 1 month ago

Is your feature request related to a problem? Please describe.

Could you please support microsoft/Phi-3-medium-128k-instruct? Thank you!

I tried to use `MInference("minference", "microsoft/Phi-3-mini-128k-instruct")` to patch the Phi-3-medium-128k-instruct model, but got the following error:

TypeError: forward() missing 1 required positional argument: 'position_ids'
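For reference, here is a minimal reproduction sketch of what I'm doing, assuming the patch-then-generate flow shown in the MInference README; the prompt and generation parameters are just placeholders:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from minference import MInference

model_name = "microsoft/Phi-3-medium-128k-instruct"
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
)

# Patch using the mini variant's config, since medium is not yet supported.
minference_patch = MInference("minference", "microsoft/Phi-3-mini-128k-instruct")
model = minference_patch(model)

inputs = tokenizer("Hello, world!", return_tensors="pt").to(model.device)
# The TypeError above is raised from the patched attention forward here.
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```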

Describe the solution you'd like

No response

Additional context

No response

iofu728 commented 1 month ago

Hi @cckao, thanks for your support. We'll support Phi-3-medium-128k soon.