saic-fi / MobileQuant

[EMNLP Findings 2024] MobileQuant: Mobile-friendly Quantization for On-device Language Models

Support for Snapdragon devices other than 8 Gen 3 #5

Closed tom-pollak closed 1 month ago

tom-pollak commented 1 month ago

The current code and instructions are specific to the Snapdragon 8 Gen 3 NPU. What modifications would be necessary to support other Snapdragon devices? Specifically:

  1. Are there significant changes required in the code?
  2. Which Snapdragon generations might be compatible with adjustments?

Thanks for your help!

Abhishek8394 commented 1 month ago

I'm currently working on getting Snapdragon 8 Gen 2 working. I will post an update when it works. Which chip are you looking to get support for?

fwtan commented 1 month ago

Thanks @Abhishek8394 for spending time on this! I added 8gen2 support in the latest commit: https://github.com/saic-fi/MobileQuant/commit/084b62c0e41b9ebf2800cc0aecb6cf9ed231bad1

@tom-pollak the code has been tested only on 8gen3 and 8gen2, which are the devices I have access to. Unfortunately, there is no plan to support more devices at the moment.

Most of the important modifications were in the assets folder and in device/export.py. They involved adding the config files and changing the chipset id (8650 -> 8550) and the hexagon id (v75 -> v73).
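
To illustrate the mapping described above, here is a minimal sketch of a device table keyed by the --device_type value. The names DeviceTarget, DEVICE_TARGETS, and resolve_device are hypothetical and are not taken from the repository; only the chipset and hexagon ids come from this comment, and the actual export.py may organize this differently.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DeviceTarget:
    """Hypothetical container for the per-device settings named in this thread."""
    chipset_id: int    # chipset id, e.g. 8650 for 8gen3
    hexagon_arch: str  # hexagon id, e.g. "v75" for 8gen3


# Only the two devices discussed in this thread; ids are those quoted above.
DEVICE_TARGETS = {
    "8gen3": DeviceTarget(chipset_id=8650, hexagon_arch="v75"),
    "8gen2": DeviceTarget(chipset_id=8550, hexagon_arch="v73"),
}


def resolve_device(device_type: str) -> DeviceTarget:
    """Look up the settings for a --device_type value; raise ValueError if unknown."""
    try:
        return DEVICE_TARGETS[device_type]
    except KeyError:
        supported = ", ".join(sorted(DEVICE_TARGETS))
        raise ValueError(
            f"unsupported device_type {device_type!r}; supported: {supported}"
        ) from None


if __name__ == "__main__":
    target = resolve_device("8gen2")
    print(target.chipset_id, target.hexagon_arch)
```

Under this layout, supporting another chip would mean adding one more table entry plus the matching config files, assuming the rest of the toolchain accepts that chipset and hexagon id; that has not been tested here.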

There were also API changes in the capp code; the ones you will most likely care about are in capp/src/qnn_context.cpp.

To export a QNN model compatible with 8gen2, pass --device_type 8gen2 to the export.py script:

CUDA_VISIBLE_DEVICES=0 python device/export.py --hf_path ${HF_PATH} \
    --kv_cache --per_channel --use_conv --quant_config 4 8 32 \
    --quant_encoding results/sim_${HF_NAME}_calibration/gen/model_gen_transfered.encodings \
    --kv_encoding results/sim_${HF_NAME}_calibration/gen/model_gen_kv_cache.encodings \
    --device_type 8gen2

To build the app:

make aarch64-android-8gen2

To run the app:

adb shell "cd /data/local/tmp/llm_8gen3_demo && LD_LIBRARY_PATH=. ./simple_app llama-1.1b-mobilequant-w8a8-s1024-e60-8gen3 8gen2"

fwtan commented 1 month ago

Feel free to reopen the issue if there are any further questions.