turboderp / exllamav2

A fast inference library for running LLMs locally on modern consumer-class GPUs
MIT License

orig_func Quantization error #573

Open Masterjp123 opened 1 month ago

Masterjp123 commented 1 month ago

I was trying to quantize an L3 8B model using a Jupyter notebook I cooked up, and I got this error:

"-- Resuming job",
      "!! Note: Overriding options with settings from existing job",
      "-- Input: /workspace/L3-8B-Lunar-Stheno",
      "-- Output: /workspace/quants",
      "-- Calibration dataset: /workspace/exllamav2/0000.parquet, 100 / 16 rows, 2048 tokens per sample\n",
      "-- Target bits per weight: 5.5 (decoder), 6 (head)",
      "-- Max shard size: 8192 MB",
      "-- Token embeddings (measurement)...",
      "Traceback (most recent call last):",
      "File \"/workspace/exllamav2/convert.py\", line 1, in <module>",
      "import exllamav2.conversion.convert_exl2",
      "File \"/workspace/exllamav2/exllamav2/conversion/convert_exl2.py\", line 252, in <module>",
      "embeddings(job, save_job, model)\n",
      "File \"/workspace/exllamav2/exllamav2/conversion/measure.py\", line 81, in embeddings",
      "module.load()",
      "TypeError: _DecoratorContextManager.__call__() missing 1 required positional argument: 'orig_func'"

I have no clue what orig_func means; I checked the docs and found nothing. Could someone please help me fix this, or at least tell me what orig_func means?

turboderp commented 1 month ago

The orig_func error relates to the @torch.inference_mode decorator used on that function, so something's screwy with either your Python or PyTorch version. What versions are you using?
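
A minimal sketch of the failure mode, assuming the decorator is applied bare (no parentheses), which only newer PyTorch accepts (2.1+, if I remember right):

```python
import torch

class Module:
    # Bare decoration: PyTorch >= 2.1 recognizes this and wraps the
    # function, but on 2.0.x it instead constructs an inference_mode
    # *instance* with mode=load, so `load` stops being a function.
    @torch.inference_mode
    def load(self):
        pass

Module().load()
# On torch 2.0.1 this raises:
#   TypeError: _DecoratorContextManager.__call__() missing 1 required
#   positional argument: 'orig_func'
# because calling the instance hits _DecoratorContextManager.__call__,
# which expects to receive the function it should wrap.

# Calling the decorator instead works on both old and new versions:
class FixedModule:
    @torch.inference_mode()
    def load(self):
        pass
```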

Masterjp123 commented 1 month ago

I think I was using the pytorch:2.0.1-py3.10-cuda11.8.0-devel-ubuntu22.04 image, since that's what RunPod said I was running.

turboderp commented 1 month ago

I'm actually not sure if Torch 2.0.1 is still supported. I know there are wheels for it and the wheel builds, but I haven't tested it in a while since it's very old.
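
In the meantime it's worth confirming what's actually installed in the pod rather than going by the image tag; a quick check using stock PyTorch introspection (nothing exllamav2-specific):

```python
import sys
import torch

print(sys.version)         # Python interpreter version
print(torch.__version__)   # PyTorch version actually installed
print(torch.version.cuda)  # CUDA version this PyTorch build targets
```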

I'll need to investigate, I guess.