huggingface / text-generation-inference

Large Language Model Text Generation Inference
http://hf.co/docs/text-generation-inference
Apache License 2.0

AttributeError: 'Idefics2ForConditionalGeneration' object has no attribute 'model' #2362

Open · komninoschatzipapas opened this issue 3 months ago

komninoschatzipapas commented 3 months ago

System Info

1x L40 node on Runpod, running the latest huggingface/text-generation-inference:latest docker image.
Command: --model-id HuggingFaceM4/idefics2-8b --port 8080 --max-input-length 3000 --max-total-tokens 4096 --max-batch-prefill-tokens 4096 --speculate 3 --lora-adapters orionsoftware/rater-adapter-v0.0.1

Reproduction

I'm trying to deploy an idefics2 LoRA using the huggingface/text-generation-inference:latest docker image.

The command I'm running is --model-id HuggingFaceM4/idefics2-8b --port 8080 --max-input-length 3000 --max-total-tokens 4096 --max-batch-prefill-tokens 4096 --speculate 3 --lora-adapters orionsoftware/rater-adapter-v0.0.1

I also have a correct HF token to access orionsoftware/rater-adapter-v0.0.1.
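
A quick way to double-check that access (a minimal sketch, assuming the token is exported as HF_TOKEN; the file names in the comment are just the usual peft outputs, not something specific to this repo):

```python
import os
from huggingface_hub import list_repo_files

# Confirm the (private) adapter repo is reachable with the configured token.
files = list_repo_files(
    "orionsoftware/rater-adapter-v0.0.1",
    token=os.environ.get("HF_TOKEN"),
)
print(files)  # typically includes adapter_config.json and adapter_model.safetensors
```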

Everything works without the --lora-adapters orionsoftware/rater-adapter-v0.0.1 part, but once I add the LoRA adapter, startup fails with this error:

2024-08-07T14:53:12.382183Z  INFO text_generation_launcher: Loading adapter weights into model: orionsoftware/rater-adapter-v0.0.1
2024-08-07T14:53:12.526533Z ERROR text_generation_launcher: Error when initializing model
Traceback (most recent call last):
  File "/opt/conda/bin/text-generation-server", line 8, in <module>
    sys.exit(app())
  File "/opt/conda/lib/python3.10/site-packages/typer/main.py", line 311, in __call__
    return get_command(self)(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/typer/core.py", line 778, in main
    return _main(
  File "/opt/conda/lib/python3.10/site-packages/typer/core.py", line 216, in _main
    rv = self.invoke(ctx)
  File "/opt/conda/lib/python3.10/site-packages/click/core.py", line 1688, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/opt/conda/lib/python3.10/site-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/opt/conda/lib/python3.10/site-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/typer/main.py", line 683, in wrapper
    return callback(**use_params)  # type: ignore
  File "/opt/conda/lib/python3.10/site-packages/text_generation_server/cli.py", line 109, in serve
    server.serve(
  File "/opt/conda/lib/python3.10/site-packages/text_generation_server/server.py", line 274, in serve
    asyncio.run(
  File "/opt/conda/lib/python3.10/asyncio/runners.py", line 44, in run
    return loop.run_until_complete(main)
  File "/opt/conda/lib/python3.10/asyncio/base_events.py", line 636, in run_until_complete
    self.run_forever()
  File "/opt/conda/lib/python3.10/asyncio/base_events.py", line 603, in run_forever
    self._run_once()
  File "/opt/conda/lib/python3.10/asyncio/base_events.py", line 1909, in _run_once
    handle._run()
  File "/opt/conda/lib/python3.10/asyncio/events.py", line 80, in _run
    self._context.run(self._callback, *self._args)
> File "/opt/conda/lib/python3.10/site-packages/text_generation_server/server.py", line 229, in serve_inner
    model = get_model_with_lora_adapters(
  File "/opt/conda/lib/python3.10/site-packages/text_generation_server/models/__init__.py", line 1216, in get_model_with_lora_adapters
    1 if layer_name == "lm_head" else len(model.model.model.layers)
  File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1729, in __getattr__
    raise AttributeError(f"'{type(self).__name__}' object has no attribute '{name}'")
AttributeError: 'Idefics2ForConditionalGeneration' object has no attribute 'model'
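
From the traceback, the failing line in get_model_with_lora_adapters counts decoder layers via model.model.model.layers, which assumes a text-only layout (server wrapper -> CausalLM-style module -> .model.layers). Idefics2ForConditionalGeneration apparently doesn't expose a model attribute at all, so the lookup raises. A minimal toy sketch of that mismatch (class names and the text_model attribute below are illustrative stand-ins, not TGI's actual implementation):

```python
import torch.nn as nn

# Layout the adapter-loading code expects:
#   wrapper.model              -> CausalLM-style module
#   wrapper.model.model        -> decoder backbone
#   wrapper.model.model.layers -> list of transformer blocks
class ToyCausalLM(nn.Module):
    def __init__(self):
        super().__init__()
        self.model = nn.Module()
        self.model.layers = nn.ModuleList([nn.Identity() for _ in range(4)])

# Hypothetical VLM-style layout mirroring the error: the language model is
# stored under `text_model`, so there is no `.model` attribute to reach into.
class ToyVlmForConditionalGeneration(nn.Module):
    def __init__(self):
        super().__init__()
        self.text_model = ToyCausalLM()

class Wrapper:
    """Stand-in for the server-side model wrapper; `.model` is the nn.Module."""
    def __init__(self, module):
        self.model = module

ok = Wrapper(ToyCausalLM())
print(len(ok.model.model.layers))  # 4 -- the lookup the loader performs

broken = Wrapper(ToyVlmForConditionalGeneration())
try:
    len(broken.model.model.layers)  # same lookup as the failing line above
except AttributeError as err:
    print(err)  # 'ToyVlmForConditionalGeneration' object has no attribute 'model'
```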

This is on a 1xL40 node on Runpod.

orionsoftware/rater-adapter-v0.0.1 was trained using the transformers Trainer and looks like this: [screenshot attached in the original issue]

I'm curious as to what I'm doing wrong. Unfortunately, my weak Python skills prevent me from debugging this further.
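
In the meantime, one workaround I may try is merging the adapter into the base weights offline and serving the merged checkpoint without --lora-adapters (a rough, untested sketch; the output folder name is a placeholder and the HF token is assumed to be configured):

```python
import torch
from peft import PeftModel
from transformers import AutoModelForVision2Seq, AutoProcessor

# Load the base model, apply the adapter, and fold the LoRA weights in.
base = AutoModelForVision2Seq.from_pretrained(
    "HuggingFaceM4/idefics2-8b", torch_dtype=torch.bfloat16
)
merged = PeftModel.from_pretrained(
    base, "orionsoftware/rater-adapter-v0.0.1"
).merge_and_unload()

# Save merged weights plus the processor so the folder (or a Hub repo pushed
# from it) can be served with --model-id and no --lora-adapters flag.
out_dir = "idefics2-8b-rater-merged"  # placeholder path
merged.save_pretrained(out_dir)
AutoProcessor.from_pretrained("HuggingFaceM4/idefics2-8b").save_pretrained(out_dir)
```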

Expected behavior

The model should start and serve requests with the LoRA adapter loaded, without errors.
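
Concretely, once the launcher is up I'd expect a request like the following to route through the adapter (a sketch based on my reading of TGI's multi-LoRA docs, which describe an adapter_id request parameter; the prompt and image URL are placeholders):

```python
import requests

# Placeholder prompt/image; idefics2 prompts in TGI reference images with
# markdown-style ![](url) syntax.
payload = {
    "inputs": "![](https://example.com/sample.png) Rate this image.",
    "parameters": {
        "max_new_tokens": 64,
        "adapter_id": "orionsoftware/rater-adapter-v0.0.1",
    },
}
resp = requests.post("http://localhost:8080/generate", json=payload, timeout=120)
print(resp.json())
```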

komninoschatzipapas commented 3 months ago

EDIT: Fixed error logs

killian-mahe commented 2 months ago

I'm getting the same error with google/paligemma-3b-ft-docvqa-896. I'm trying to deploy the model with an adapter on GKE.

Here is the command line I used: text-generation-launcher --port 8000 --hostname 0.0.0.0 --model-id google/paligemma-3b-ft-docvqa-896 --lora-adapters ArkeaIAF/paligemma-3b-table2html-lora
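
The common factor seems to be the module layout: like idefics2, this model class doesn't keep its language model under a top-level model attribute. A quick way to inspect that layout with the plain transformers implementation (a sketch, run outside TGI; it assumes the gated repo's config is accessible, and the exact child names may differ across transformers versions):

```python
import torch
from transformers import AutoConfig, PaliGemmaForConditionalGeneration

# Build the model skeleton from the config only (no weights downloaded)
# so we can look at its top-level submodules.
config = AutoConfig.from_pretrained("google/paligemma-3b-ft-docvqa-896")
with torch.device("meta"):
    model = PaliGemmaForConditionalGeneration(config)

print([name for name, _ in model.named_children()])
# If there is no 'model' entry here (e.g. only vision_tower,
# multi_modal_projector, language_model), the loader's
# model.model.model.layers lookup cannot succeed.
print(hasattr(model, "model"))
```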