microsoft / MInference
To speed up long-context LLM inference, MInference approximates the attention with dynamic sparse computation, reducing pre-filling latency by up to 10x on an A100 while maintaining accuracy.
https://aka.ms/MInference
MIT License
723 stars · 29 forks
PreRelease: v0.1.0 #2
Closed: iofu728 closed this issue 3 months ago
iofu728 commented 3 months ago
release the MInference pip package;
add experiment details;
add examples (two illustrative sketches follow this list);
add a GitHub Action.
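For the pip package and examples items, here is a minimal sketch of how the released package might be used, following the patch-style API described in the project README; the exact import path, class name, and arguments are assumptions, so check https://aka.ms/MInference for the final interface.

```python
# Sketch of expected usage after `pip install minference` (API assumed from the README).
from transformers import AutoModelForCausalLM, AutoTokenizer
from minference import MInference  # assumed import path

model_name = "gradientai/Llama-3-8B-Instruct-262k"  # hypothetical long-context model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto")

# Patch the model so pre-filling runs MInference's dynamic sparse attention.
minference_patch = MInference("minference", model_name)
model = minference_patch(model)

prompt = "..."  # a long-context prompt
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```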
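The repository description above mentions approximating attention with dynamic sparse computation during pre-filling. The following is a toy NumPy sketch of that general idea, not the MInference implementation: cheaply estimate which key blocks matter for each query block, then compute exact attention only on the selected blocks.

```python
# Toy dynamic block-sparse attention (illustrative only, not MInference's kernels).
import numpy as np

def dynamic_block_sparse_attention(q, k, v, block=4, topk=2):
    """q, k, v: (seq_len, dim) arrays; seq_len must be divisible by block."""
    n, d = q.shape
    nb = n // block
    # Cheap importance estimate: scores between mean-pooled query/key blocks.
    qb = q.reshape(nb, block, d).mean(axis=1)  # (nb, d)
    kb = k.reshape(nb, block, d).mean(axis=1)  # (nb, d)
    est = qb @ kb.T                            # (nb, nb) block-level scores
    out = np.zeros_like(q)
    for i in range(nb):
        # Keep only the top-k most promising key blocks for this query block.
        keep = np.argsort(est[i])[-topk:]
        idx = np.concatenate([np.arange(j * block, (j + 1) * block) for j in keep])
        qi = q[i * block:(i + 1) * block]
        scores = qi @ k[idx].T / np.sqrt(d)    # exact scores on kept blocks only
        w = np.exp(scores - scores.max(axis=1, keepdims=True))
        w /= w.sum(axis=1, keepdims=True)
        out[i * block:(i + 1) * block] = w @ v[idx]
    return out

rng = np.random.default_rng(0)
q, k, v = (rng.standard_normal((16, 8)) for _ in range(3))
print(dynamic_block_sparse_attention(q, k, v).shape)  # (16, 8)
```

The cost saving comes from the inner loop touching only `topk` key blocks per query block instead of all of them, which is where the pre-filling speedup in the description originates.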