The enforcer was running successfully on the CPU build of llama-cpp-python. After installing the cuBLAS build for acceleration, it stopped producing output; I saw the same behavior with the Metal build. The LLM responds to prompts without any issues when the format enforcer is disabled, but produces blank output when the enforcer is enabled.
I ran the samples from this repo and the issue persists.
This is the problematic line in llama.py: it calls logits_processor(self._input_ids[:idx], logits) in the Metal version but logits_processor(self._input_ids[:idx + 1], logits) in the CPU version.
Make sure it passes self._input_ids[:idx + 1] to the logits_processor.
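A minimal sketch of why the off-by-one breaks the enforcer (variable names here are illustrative, not the actual llama.py code): a format enforcer's logits processor decides which tokens to allow based on everything generated so far, so slicing with `[:idx]` instead of `[:idx + 1]` hides the most recently sampled token from it.

```python
# Tokens generated so far; idx points at the position of the newest token.
input_ids = [1, 5, 9]
idx = 2

# Buggy call (the Metal/cuBLAS path): the newest token (9) is dropped,
# so the enforcer reasons from a stale prefix and masks the wrong tokens.
buggy_view = input_ids[:idx]        # [1, 5]

# Correct call (the CPU path): the full generated prefix is visible.
fixed_view = input_ids[:idx + 1]    # [1, 5, 9]
```

With the stale prefix, the enforcer can end up allowing only tokens that are invalid at the true position, which typically forces immediate termination and the blank output described above.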