mit-han-lab / TinyChatEngine

TinyChatEngine: On-Device LLM Inference Library
https://mit-han-lab.github.io/TinyChatEngine/
MIT License
634 stars, 59 forks

Add llama2 and clean up codebase #17

Closed: meenchen closed this 11 months ago

meenchen commented 11 months ago

This PR includes the following changes:

  1. Add llama2 support.
  2. Clean up the model download script and docs.
  3. Refine the generate function.
  4. Fix capitalization in file names.
  5. Remove unused files and docs.
  6. Add the token bin and vocab JSON files by default.
  7. Improve performance of the bmm op to reduce latency on long sequences.
  8. Remove the metal-cpp source from the codebase.
  9. Move source and header files related to nn modules into separate directories.
  10. Clean up the Metal kernels.
  11. Reorganize the kernel code structure.