noamgat / lm-format-enforcer

Enforce the output format (JSON Schema, Regex etc) of a language model
MIT License

'PreTrainedTokenizerFast' object has no attribute 'tokenizer' #113

Closed schwjustin closed 2 weeks ago

schwjustin commented 3 weeks ago

When I use this code, I get the error 'PreTrainedTokenizerFast' object has no attribute 'tokenizer':

import tensorrt_llm
from tensorrt_llm.runtime import ModelRunner
from transformers import AutoTokenizer

...

self.tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)

...

from lmformatenforcer import JsonSchemaParser
from lmformatenforcer.integrations.trtllm import (
  build_trtllm_logits_processor,
  build_trtlmm_tokenizer_data
)

schema = {
  "type": "object",
  "properties": {
    "answer": {
      "type": "string",
      "title": "Answer"
    },
  },
  "required": ["answer"]
}

parser = JsonSchemaParser(schema)

tokenizer_data = build_trtlmm_tokenizer_data(tokenizer)

logits_processor = build_trtllm_logits_processor(tokenizer_data, parser)

schwjustin commented 2 weeks ago

Any ideas how to get this working?

noamgat commented 2 weeks ago

I tried applying a fix for a similar issue from a different integration. Can you try it?

pip install git+https://github.com/noamgat/lm-format-enforcer.git@bugfix/trtllm-types

schwjustin commented 2 weeks ago

When I add pip install git+https://github.com/noamgat/lm-format-enforcer.git@bugfix/trtllm-types and try to run, I get this error, and my code never gets to execute:

`Building image im-yl8Bxa0rQFHd4TicxFGIre

=> Step 0: FROM base

=> Step 1: RUN python -m pip install git+https://github.com/noamgat/lm-format-enforcer.git@bugfix/trtllm-types lm-format-enforcer tensorrt_llm==0.10.0.dev2024042300 --extra-index-url https://pypi.nvidia.com --pre
Looking in indexes: http://pypi-mirror.modal.local:5555/simple, https://pypi.nvidia.com
Collecting git+https://github.com/noamgat/lm-format-enforcer.git@bugfix/trtllm-types
  Cloning https://github.com/noamgat/lm-format-enforcer.git (to revision bugfix/trtllm-types) to /tmp/pip-req-build-7lzdakej
  Running command git clone --filter=blob:none --quiet https://github.com/noamgat/lm-format-enforcer.git /tmp/pip-req-build-7lzdakej
  Running command git checkout -b bugfix/trtllm-types --track origin/bugfix/trtllm-types
  Switched to a new branch 'bugfix/trtllm-types'
  Branch 'bugfix/trtllm-types' set up to track remote branch 'bugfix/trtllm-types' from 'origin'.
  Resolved https://github.com/noamgat/lm-format-enforcer.git to commit ff5cd86a9f007340ae73e043c419c23ba1438075
  Installing build dependencies: started
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/local/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/pkg/modal/_container_entrypoint.py", line 622, in
    main(container_args, client)
  File "/pkg/modal/_container_entrypoint.py", line 542, in main
    metadata: Message = container_app.object_handle_metadata[object_id]
KeyError: 'im-hmf2kfuwV3nTWZPoMiK6rP'`

I tried pip installing only this branch, and also installing both this branch and lm-format-enforcer. I get this error either way.

noamgat commented 2 weeks ago

It looks like the problem is something in your build setup. An alternative way to check the fix is to install the normal package from pip, add your own copy of trtllm.py with the contents of this file: https://github.com/noamgat/lm-format-enforcer/blob/ff5cd86a9f007340ae73e043c419c23ba1438075/lmformatenforcer/integrations/trtllm.py and then import from that copy instead of from lmformatenforcer.integrations.trtllm.
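
One way to make that local copy without editing by hand is sketched below. This is only an illustration, not part of the library: the helper names `blob_to_raw_url` and `save_local_copy` and the file name `trtllm_patched.py` are hypothetical, and the sketch assumes the ref in the URL is a commit hash (a branch name containing slashes would not parse correctly).

```python
from urllib.parse import urlparse
from urllib.request import urlopen

# Blob URL of the pinned file quoted in the comment above.
BLOB_URL = (
    "https://github.com/noamgat/lm-format-enforcer/blob/"
    "ff5cd86a9f007340ae73e043c419c23ba1438075/"
    "lmformatenforcer/integrations/trtllm.py"
)


def blob_to_raw_url(blob_url: str) -> str:
    """Convert a github.com blob URL to its raw.githubusercontent.com form."""
    parts = urlparse(blob_url).path.strip("/").split("/")
    # Expected path shape: owner / repo / "blob" / ref / file path...
    if len(parts) < 5 or parts[2] != "blob":
        raise ValueError(f"not a GitHub blob URL: {blob_url}")
    owner, repo, _, ref, *file_path = parts
    return (
        f"https://raw.githubusercontent.com/{owner}/{repo}/{ref}/"
        + "/".join(file_path)
    )


def save_local_copy(blob_url: str, dest: str = "trtllm_patched.py") -> str:
    """Download the file and write it to dest (hypothetical file name)."""
    with urlopen(blob_to_raw_url(blob_url)) as resp:
        source = resp.read().decode("utf-8")
    with open(dest, "w", encoding="utf-8") as fh:
        fh.write(source)
    return dest
```

After calling save_local_copy(BLOB_URL) once, with the normal lm-format-enforcer package still installed from pip, the import in the original snippet would become from trtllm_patched import build_trtllm_logits_processor, build_trtlmm_tokenizer_data.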

schwjustin commented 2 weeks ago

That worked! Thank you