juncongmoo / pyllama

LLaMA: Open and Efficient Foundation Language Models
GNU General Public License v3.0
2.8k stars 312 forks

Readme Should Have Inference Command to use for Quantization in Text #72

Open chigkim opened 1 year ago

chigkim commented 1 year ago

Could you put the actual text of the command to run inference with quantization? I cannot see the image because I'm blind and use a screen reader. The Readme says "With quantization, you can run LLaMA with a 4GB memory GPU," and then it shows two pictures. Thanks!

sskorol commented 1 year ago
```shell
python3 quant_infer.py --wbits 4 --load pyllama-7B4b.pt --text "The meaning of life is" --max_length 24 --cuda cuda:0
```
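For screen-reader users, here is the same command with each flag annotated. The annotations below are my reading of the flag names and the README's description, not verified against `quant_infer.py`'s argument parser:

```shell
# Quantized LLaMA inference, as posted above.
# Flag annotations (assumptions, inferred from the flag names):
#   --wbits 4        -> use 4-bit quantized weights
#   --load ...       -> quantized checkpoint file to load
#   --text ...       -> prompt to complete
#   --max_length 24  -> upper limit on generated tokens
#   --cuda cuda:0    -> GPU device to run on
python3 quant_infer.py --wbits 4 --load pyllama-7B4b.pt \
  --text "The meaning of life is" --max_length 24 --cuda cuda:0
```

Running this requires the quantized checkpoint (`pyllama-7B4b.pt`) to exist locally and a CUDA-capable GPU with roughly 4 GB of memory, per the README's claim.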