Closed: sat0r1r1 closed this issue 6 months ago
Same configuration with NVLink:

```
neuron@neuron:~$ nvidia-smi topo -m
        GPU0    GPU1    CPU Affinity    NUMA Affinity   GPU NUMA ID
GPU0     X      NV4                                     N/A
GPU1    NV4      X                                      N/A
```
I'm also interested...
A process ending with simply "Killed" usually means you've run out of system memory. Quantizing 70B+ models in particular can require a large amount of system RAM. Maybe try increasing swap space?
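On bare-metal Linux, a swap file is the quickest way to add that headroom. A minimal sketch; the 64G size and the /swapfile path are placeholders, not values from this thread:

```
# Create and enable a 64 GiB swap file (size/path are just examples)
sudo fallocate -l 64G /swapfile
sudo chmod 600 /swapfile   # swap files must not be world-readable
sudo mkswap /swapfile      # format it as swap
sudo swapon /swapfile      # enable it for the current session
swapon --show              # verify it is active
```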
Thanks for the reply. I think the problem has been solved: I had previously set WSL to use at most 8GB of system memory. I'll change the settings and run it again. Thank you very much!
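For anyone hitting the same WSL2 cap: the limit lives in %UserProfile%\.wslconfig on the Windows side. A sketch with placeholder sizes (memory= and swap= are the real keys; the numbers are assumptions for a large-RAM host):

```
# %UserProfile%\.wslconfig -- run `wsl --shutdown` afterwards so it takes effect
[wsl2]
memory=96GB   # raise the RAM cap (WSL2 defaults to a fraction of host RAM)
swap=64GB     # optionally enlarge the swap file too
```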
Yes, it is possible to convert a 120B model with a 3090. I have done this with DiscoLM. I only have 32GB of RAM, and I don't think I even had swap enabled. Also, I just used the default settings.
Hi, it's my first time using this.
I followed exllamav2/doc/convert.md and encountered problems while trying to quantize the miquella-120b model.
English is not my native language, so maybe I misunderstood. Is it possible to quantize a 120B model on dual 3090s with 128GB of RAM?
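For reference, the standard convert.py invocation from convert.md looks like this; the paths and the 3.0 bits-per-weight target below are placeholder examples, not values from this thread:

```
# -i: input FP16 model dir, -o: working dir for temp files,
# -cf: output dir for the quantized model, -b: target bits per weight
python convert.py -i /models/miquella-120b -o /tmp/exl2-work \
    -cf /models/miquella-120b-exl2 -b 3.0
```

Per the replies above, a run that dies with just "Killed" is typically an out-of-system-RAM kill, so system RAM and swap matter more than VRAM here.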