The enforcer was running successfully on the CPU build of llama-cpp-python. After installing the cuBLAS build for acceleration, it stopped producing output; I saw the same behavior with the Metal build. The LLM responds to prompts without any issues when the format enforcer is disabled, but produces blank output when the enforcer is enabled.
I ran the samples from this repo and the issue persists.
This is the problematic line in llama.py: it calls logits_processor(self._input_ids[:idx], logits) in the Metal version but logits_processor(self._input_ids[:idx + 1], logits) in the CPU version.
Make sure it passes self._input_ids[:idx + 1] to the logits_processor.
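A minimal sketch of why the off-by-one breaks the enforcer (variable names here are illustrative, not the actual llama.py code): a format enforcer's logits processor decides which tokens to allow based on everything generated so far, so slicing with `[:idx]` instead of `[:idx + 1]` hides the most recently sampled token from it.

```python
# Tokens generated so far; idx points at the position of the newest token.
input_ids = [1, 5, 9]
idx = 2

# Buggy call (the Metal/cuBLAS path): the newest token (9) is dropped,
# so the enforcer reasons from a stale prefix and masks the wrong tokens.
buggy_view = input_ids[:idx]        # [1, 5]

# Correct call (the CPU path): the full generated prefix is visible.
fixed_view = input_ids[:idx + 1]    # [1, 5, 9]
```

With the stale prefix, the enforcer can end up allowing only tokens that are invalid at the true position, which typically forces immediate termination and the blank output described above.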