nbasyl / LLM-FP4

The official implementation of the EMNLP 2023 paper LLM-FP4

Quantization time is too long #10

Open qxpBlog opened 2 months ago

qxpBlog commented 2 months ago

@nbasyl Sorry, when I use the following script, the quantization time is over ten days:

MODEL_ADDR=huggyllama/llama-7b
# export each variable on its own line so both persist for the python call
export HF_ENDPOINT=https://hf-mirror.com
export CUDA_VISIBLE_DEVICES=0,1,2,3
python main.py \
    --model hf-causal-experimental \
    --model_args pretrained=$MODEL_ADDR,use_accelerate=True \
    --tasks arc_challenge,arc_easy,boolq,hellaswag,openbookqa,piqa,winogrande \
    --device cuda \
    --batch_size auto \
    --no_cache \
    --num_fewshot 0 \
    --quant_config 'FPQ_config_llama' \
    --qbits 4 4 4 2 2 2 \
    --calib_size 32 \
    --search_round 3 \
    --search_intervals 0.01 1.2 100

How can I solve it?
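
For scale, here is a rough cost model. It rests on a guess from the flag names, not on anything confirmed by the repo: that --search_intervals 0.01 1.2 100 sweeps a 100-point grid per quantized layer, and that every grid point is re-evaluated on every calibration sample in every search round.

    # Back-of-the-envelope estimate of the FPQ search cost.
    # Assumptions (NOT confirmed by the LLM-FP4 code): each grid point is
    # evaluated on every calibration sample, for every layer, in every round.
    search_round = 3    # --search_round
    grid_points = 100   # third value of --search_intervals
    calib_size = 32     # --calib_size
    num_layers = 32     # transformer blocks in LLaMA-7B

    evals = search_round * grid_points * calib_size * num_layers
    print(f"~{evals:,} per-layer evaluations")  # prints ~307,200

If that assumption holds, wall-clock time scales linearly with each of these flags, so one cheap sanity check is to shrink them (for example a 20-point grid, one search round, and a calibration size of 8) and confirm the runtime drops proportionally before committing to the full search.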

wsl448 commented 2 months ago

Did you solve it? I also have the same problem.