microsoft / T-MAC

Low-bit LLM inference on CPU with lookup table

Document for cross-compile for low-end devices #18

Closed · ljy6j13 closed this issue 2 months ago

ljy6j13 commented 2 months ago

I followed the guidance in the README to install T-MAC on an S905D3 development board with 4 GB of RAM, but the process was lengthy and difficult. During the installation of TVM (executed automatically by `pip install -e .`), many source files need to be compiled, which took more than 2 hours. Some source files are complex enough that the compiler consumed more RAM than my device provides, so the compile processes were killed.

Is there any way to deploy llama.cpp powered by T-MAC on an embedded device (Linux/Android) without installing large components like python3 and TVM on that device?

### Tasks
- [x] Android cross-compilation guidance
kaleid-liner commented 2 months ago

Steps 0 through 3 can all be done on the host device. Then you can cross-compile llama.cpp for your target device. Check #12 for an example of how to deploy onto Android.

The cross-compile process is currently undocumented and not yet fully tested. We will soon add support for it to `run_pipeline.py`.
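
For reference, the standard llama.cpp Android cross-compile flow looks roughly like the sketch below. This is not the documented T-MAC procedure (see #12 for T-MAC-specific flags); the `$ANDROID_NDK` path and the built binary name are assumptions that depend on your setup and llama.cpp version.

```bash
# Sketch only: cross-compile llama.cpp for Android on the host machine,
# assuming the Android NDK is installed and $ANDROID_NDK points to it.
cmake -B build-android \
  -DCMAKE_TOOLCHAIN_FILE="$ANDROID_NDK/build/cmake/android.toolchain.cmake" \
  -DANDROID_ABI=arm64-v8a \
  -DANDROID_PLATFORM=android-23 \
  -DCMAKE_BUILD_TYPE=Release
cmake --build build-android -j

# Push the binary to the device and run it there
# (the binary name varies across llama.cpp versions).
adb push build-android/bin/main /data/local/tmp/
```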

kaleid-liner commented 2 months ago

I have added commits to cross-compile for Android. I'm not familiar with the S905D3, but you can use the Android flow as an example of how to cross-compile for another device.
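
For a non-Android Linux board like the S905D3 (an aarch64 SoC), the analogous approach is a plain GCC cross-toolchain instead of the NDK. A minimal sketch, assuming the `aarch64-linux-gnu` toolchain is installed on the host; any T-MAC-specific flags from the Android example would still apply:

```bash
# Sketch only: cross-compile for a generic aarch64 Linux target,
# assuming the aarch64-linux-gnu GCC cross-compiler is on the host PATH.
cmake -B build-aarch64 \
  -DCMAKE_SYSTEM_NAME=Linux \
  -DCMAKE_SYSTEM_PROCESSOR=aarch64 \
  -DCMAKE_C_COMPILER=aarch64-linux-gnu-gcc \
  -DCMAKE_CXX_COMPILER=aarch64-linux-gnu-g++ \
  -DCMAKE_BUILD_TYPE=Release
cmake --build build-aarch64 -j
# Copy the resulting binaries to the board (e.g. with scp);
# neither Python nor TVM is needed on the target device.
```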

Please test whether it works on Android and open a new issue if you encounter any problems. I will close this issue for now.