nbasyl / LLM-FP4

The official implementation of the EMNLP 2023 paper LLM-FP4

Quantization time is too long #10

Open qxpBlog opened 2 months ago

qxpBlog commented 2 months ago

@nbasyl Sorry, when I use the following script, the quantization time is over ten days:

MODEL_ADDR=huggyllama/llama-7b
# export each variable on its own line so both persist for the python call
export HF_ENDPOINT=https://hf-mirror.com
export CUDA_VISIBLE_DEVICES=0,1,2,3
python main.py \
    --model hf-causal-experimental \
    --model_args pretrained=$MODEL_ADDR,use_accelerate=True \
    --tasks arc_challenge,arc_easy,boolq,hellaswag,openbookqa,piqa,winogrande \
    --device cuda \
    --batch_size auto \
    --no_cache \
    --num_fewshot 0 \
    --quant_config 'FPQ_config_llama' \
    --qbits 4 4 4 2 2 2 \
    --calib_size 32 \
    --search_round 3 \
    --search_intervals 0.01 1.2 100

How can I solve it?
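
For scale, here is a rough cost model. It rests on a guess from the flag names, not on anything confirmed by the repo: that --search_intervals 0.01 1.2 100 sweeps a 100-point grid per quantized layer, and that every grid point is re-evaluated on every calibration sample in every search round.

    # Back-of-the-envelope estimate of the FPQ search cost.
    # Assumptions (NOT confirmed by the LLM-FP4 code): each grid point is
    # evaluated on every calibration sample, for every layer, in every round.
    search_round = 3    # --search_round
    grid_points = 100   # third value of --search_intervals
    calib_size = 32     # --calib_size
    num_layers = 32     # transformer blocks in LLaMA-7B

    evals = search_round * grid_points * calib_size * num_layers
    print(f"~{evals:,} per-layer evaluations")  # prints ~307,200

If that assumption holds, wall-clock time scales linearly with each of these flags, so one cheap sanity check is to shrink them (for example a 20-point grid, one search round, and a calibration size of 8) and confirm the runtime drops proportionally before committing to the full search.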

wsl448 commented 2 months ago

Did you solve it? I also have the same problem.