Open webpolis opened 1 year ago
I get the same weird output, even though the evaluation process completed for me.
python quant_infer.py --wbits 4 --load pyllama-7B4b.pt --text "what are the planets of the milkyway ?" --max_length 24 --cuda cuda:0
🦙: what are the planets of the milkyway? dress Albhttps SEpoispois AlbēattanRef osc Int
** GPU/CPU/Latency Profiling ** ...
Any clue?
Hi
I tried converting to both 4 bits and 2 bits, but in every case inference outputs strange characters:
I followed the instructions in the README.md and ran the quantization this way:
The evaluation process couldn't complete because of a lack of GPU memory, but the quantized version was saved successfully.
Does anyone have any advice?