qxpBlog opened this issue 2 months ago
@nbasyl Sorry, when I use the following script, the quantization takes over ten days:
export MODEL_ADDR=huggyllama/llama-7b
export HF_ENDPOINT=https://hf-mirror.com
export CUDA_VISIBLE_DEVICES=0,1,2,3
python main.py \
  --model hf-causal-experimental \
  --model_args pretrained=$MODEL_ADDR,use_accelerate=True \
  --tasks arc_challenge,arc_easy,boolq,hellaswag,openbookqa,piqa,winogrande \
  --device cuda \
  --batch_size auto \
  --no_cache \
  --num_fewshot 0 \
  --quant_config 'FPQ_config_llama' \
  --qbits 4 4 4 2 2 2 \
  --calib_size 32 \
  --search_round 3 \
  --search_intervals 0.01 1.2 100
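For scale, here is a rough back-of-envelope sketch of the search workload these flags could imply. It is only an estimate: the reading of --search_intervals as "min, max, number of candidate scales per round", and the per-layer tensor count, are assumptions on my side, not taken from the repo.

# Assumed interpretation of the flags above (not confirmed against main.py):
# --search_intervals MIN MAX STEPS -> STEPS candidate scaling factors swept
#   per quantized tensor per round; --search_round -> number of sweep rounds;
# --calib_size -> calibration samples forwarded for every candidate.
calib_size = 32        # --calib_size
search_rounds = 3      # --search_round
candidates = 100       # third value of --search_intervals (assumed step count)
num_layers = 32        # LLaMA-7B transformer blocks
tensors_per_layer = 7  # q/k/v/o + gate/up/down projections (assumed)

evaluations = search_rounds * candidates * num_layers * tensors_per_layer
calib_forwards = evaluations * calib_size
print(f"candidate evaluations: {evaluations:,}")
print(f"calibration forwards (worst case): {calib_forwards:,}")
# The product shrinks linearly with --calib_size, --search_round, or the
# interval step count, which is what makes the search dominate the runtime.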
How can I solve it?
Did you solve it? I have the same problem.