rosario-purple closed this issue 7 months ago
Thank you @rosario-purple, I will have a look into it. Could you share which GPU you are using and how much RAM you have? cc @SunMarc in case you have an idea.
@fxmarty This server has 8x A100 80 GB GPUs and 1 TB of main RAM.
Hi @rosario-purple, I've opened a PR that should solve this issue. The hard freeze is caused by the processing taking too much time: the Mixtral tokenizer is not fast enough when tokenizing the whole wikitext2 dataset.
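To illustrate the point about tokenization cost, here is a minimal sketch of one possible workaround: calibrate on a small, bounded subset of texts instead of the whole wikitext2 corpus. This is not necessarily what the PR does. The helper names `sample_calibration_texts` and `quantize_with_subset`, the sample sizes, and the `model_id` argument are hypothetical (the thread does not show the original script or checkpoint). It relies on `GPTQConfig` accepting a list of strings for `dataset`.

```python
import random
from typing import List


def sample_calibration_texts(
    texts: List[str],
    n_samples: int = 128,
    max_chars: int = 8192,
    seed: int = 0,
) -> List[str]:
    """Pick a bounded random subset of non-empty texts, each cut to max_chars.

    Hypothetical helper: keeps the amount of text handed to the tokenizer
    small, so tokenization cost no longer scales with the whole corpus.
    """
    rng = random.Random(seed)
    non_empty = [t for t in texts if t.strip()]
    chosen = rng.sample(non_empty, min(n_samples, len(non_empty)))
    return [t[:max_chars] for t in chosen]


def quantize_with_subset(model_id: str):
    """Sketch of GPTQ quantization on a sampled calibration set.

    Assumptions: transformers, optimum, auto-gptq and datasets are installed,
    and model_id points at the checkpoint to quantize. Not called here
    because it downloads the model and needs GPUs.
    """
    # Imported lazily so the helper above works without these packages.
    from datasets import load_dataset
    from transformers import AutoModelForCausalLM, AutoTokenizer, GPTQConfig

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    raw = load_dataset("wikitext", "wikitext-2-raw-v1", split="train")
    calibration = sample_calibration_texts(raw["text"])

    # Passing a list of strings instead of dataset="wikitext2" means only
    # the sampled texts are tokenized.
    config = GPTQConfig(bits=4, dataset=calibration, tokenizer=tokenizer)
    return AutoModelForCausalLM.from_pretrained(
        model_id, quantization_config=config, device_map="auto"
    )
```

Whether a subset this small gives acceptable quantization quality is an open question; the sample count and truncation length would need tuning for a real run.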
@SunMarc Thank you!
System Info
Who can help?
@philschmid
Reproduction (minimal, reproducible, runnable)
Running this Python code to quantize Mixtral hard-freezes Python (it never completes and does not respond to Ctrl-C; the only way to stop it is kill -9):
Expected behavior
The quantization should not freeze Python.