version 0.2.1 iteration plan

Estimated Release Date: 3/19
Release Manager: @suiguoxin
Schedule:

Features

[ ] P1 Experiment: target compression ratio vs. real compression ratio on specific data
[ ] P1 Word-level compression #4: a granularity above token level and below sentence level; list the different mappings and design the interface
[ ] P1 Support more / faster engines #41, including llama_cpp, FasterTransformer, vLLM. ETA: TBD
- [ ] survey which engines to support
[ ] P2 Documentation and examples
- [ ] Supported models and experiment results (with compressor throughput), to be added once a faster engine is supported
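
For the target vs. real compression ratio experiment listed under Features, the comparison could be computed roughly as follows. This is only a sketch under stated assumptions: the ratio is taken to be original token count divided by compressed token count, tokens are counted by whitespace splitting rather than with the model tokenizer, and the helper names `compression_ratio` and `ratio_gap` are hypothetical and not part of the project.

```python
from typing import Callable, List


def compression_ratio(
    original: str,
    compressed: str,
    tokenize: Callable[[str], List[str]] = str.split,
) -> float:
    """Real ratio, assumed here to mean original tokens / compressed tokens."""
    n_orig = len(tokenize(original))
    n_comp = len(tokenize(compressed))
    if n_comp == 0:
        raise ValueError("compressed text has no tokens")
    return n_orig / n_comp


def ratio_gap(target: float, original: str, compressed: str) -> float:
    """Relative deviation of the real ratio from the target ratio.

    Negative means the text was compressed less than requested.
    """
    real = compression_ratio(original, compressed)
    return (real - target) / target


if __name__ == "__main__":
    original = "a b c d e f g h"   # 8 whitespace tokens
    compressed = "a c e g"         # 4 whitespace tokens
    print(compression_ratio(original, compressed))
    print(ratio_gap(4.0, original, compressed))
```

Swapping `tokenize` for the tokenizer of the engine under test would make the numbers comparable with the token budget the compressor actually uses.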