huggingface / text-generation-inference

Large Language Model Text Generation Inference
http://hf.co/docs/text-generation-inference
Apache License 2.0

Encoder-decoder models like BERT2BERT are not being loaded #1479

Closed giyaseddin closed 8 months ago

giyaseddin commented 9 months ago

System Info

I am trying to run an encoder-decoder (seq2seq) model with TGI. I read the docs here: https://huggingface.co/docs/transformers/model_doc/bert-generation

However, it seems like something is wrong with the support for seq2seq models. I would really appreciate any pointers in case I have misunderstood or missed something in the usage.

Information

Tasks

Reproduction

BERT2BERT model is configured in the following docker-compose file

version: '3.8'

services:
  tgi:
    image: ghcr.io/huggingface/text-generation-inference:1.3.4
    command:
      - "--model-id"
      - mrm8488/bert2bert_shared-spanish-finetuned-summarization
      - "--max-input-length"
      - "512"
      - "--max-total-tokens"
      - "1024"
      - "--disable-custom-kernels"  # disable custom CUDA kernels (CPU-only run)

Example model: bert2bert_shared-spanish-finetuned-summarization
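For reference, this is a minimal sketch of the /generate request body I am sending; the field names mirror the GenerateParameters shown in the router error log below, and the input text is just a placeholder:

```python
import json

# Hypothetical /generate payload; parameter names follow the
# GenerateParameters struct printed in the router error output.
payload = {
    "inputs": "Texto en español a resumir...",  # placeholder input
    "parameters": {
        "max_new_tokens": 100,
        "return_full_text": False,
        "details": True,
    },
}
body = json.dumps(payload)
print(body)
```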

From the OpenAPI Swagger docs, a regular /generate call yields this error:

2024-01-24T22:04:42.959286Z ERROR text_generation_launcher: Method Decode encountered an error.
Traceback (most recent call last):
  File "/opt/conda/bin/text-generation-server", line 8, in <module>
    sys.exit(app())
  File "/opt/conda/lib/python3.10/site-packages/typer/main.py", line 311, in __call__
    return get_command(self)(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/typer/core.py", line 778, in main
    return _main(
  File "/opt/conda/lib/python3.10/site-packages/typer/core.py", line 216, in _main
    rv = self.invoke(ctx)
  File "/opt/conda/lib/python3.10/site-packages/click/core.py", line 1688, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/opt/conda/lib/python3.10/site-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/opt/conda/lib/python3.10/site-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/typer/main.py", line 683, in wrapper
    return callback(**use_params)  # type: ignore
  File "/opt/conda/lib/python3.10/site-packages/text_generation_server/cli.py", line 89, in serve
    server.serve(
  File "/opt/conda/lib/python3.10/site-packages/text_generation_server/server.py", line 235, in serve
    asyncio.run(
  File "/opt/conda/lib/python3.10/asyncio/runners.py", line 44, in run
    return loop.run_until_complete(main)
  File "/opt/conda/lib/python3.10/asyncio/base_events.py", line 636, in run_until_complete
    self.run_forever()
  File "/opt/conda/lib/python3.10/asyncio/base_events.py", line 603, in run_forever
    self._run_once()
  File "/opt/conda/lib/python3.10/asyncio/base_events.py", line 1909, in _run_once
    handle._run()
  File "/opt/conda/lib/python3.10/asyncio/events.py", line 80, in _run
    self._context.run(self._callback, *self._args)
  File "/opt/conda/lib/python3.10/site-packages/grpc_interceptor/server.py", line 165, in invoke_intercept_method
    return await self.intercept(
> File "/opt/conda/lib/python3.10/site-packages/text_generation_server/interceptor.py", line 21, in intercept
    return await response
  File "/opt/conda/lib/python3.10/site-packages/opentelemetry/instrumentation/grpc/_aio_server.py", line 82, in _unary_interceptor
    raise error
  File "/opt/conda/lib/python3.10/site-packages/opentelemetry/instrumentation/grpc/_aio_server.py", line 73, in _unary_interceptor
    return await behavior(request_or_iterator, context)
  File "/opt/conda/lib/python3.10/site-packages/text_generation_server/server.py", line 152, in Decode
    generations, next_batch, timings = self.model.generate_token(batch)
  File "/opt/conda/lib/python3.10/contextlib.py", line 79, in inner
    return func(*args, **kwds)
  File "/opt/conda/lib/python3.10/site-packages/text_generation_server/models/seq2seq_lm.py", line 634, in generate_token
    logits, encoder_last_hidden_state, past = self.forward(
  File "/opt/conda/lib/python3.10/site-packages/text_generation_server/models/seq2seq_lm.py", line 599, in forward
    outputs = self.model.forward(
  File "/opt/conda/lib/python3.10/site-packages/transformers/models/encoder_decoder/modeling_encoder_decoder.py", line 658, in forward
    encoder_last_hidden_state=encoder_outputs.last_hidden_state,
AttributeError: 'list' object has no attribute 'last_hidden_state'

2024-01-24T22:04:42.960004Z ERROR batch{batch_size=1}:decode:decode{size=1}:decode{size=1}: text_generation_client: router/client/src/lib.rs:33: Server error: 'list' object has no attribute 'last_hidden_state'
2024-01-24T22:04:42.962413Z ERROR compat_generate{default_return_full_text=false}:generate{parameters=GenerateParameters { best_of: None, temperature: None, repetition_penalty: None, top_k: None, top_p: None, typical_p: None, do_sample: false, max_new_tokens: Some(100), return_full_text: Some(false), stop: [], truncate: None, watermark: false, details: true, decoder_input_details: false, seed: None, top_n_tokens: None }}:generate:generate_stream:infer:send_error: text_generation_router::infer: router/src/infer.rs:608: Request failed during generation: Server error: 'list' object has no attribute 'last_hidden_state'
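The root cause can be reproduced without TGI or transformers: the failing line in modeling_encoder_decoder.py accesses `.last_hidden_state` on the encoder outputs, which works when they are a ModelOutput-like object but crashes when they arrive as a plain list. A minimal stdlib sketch (the `BaseModelOutput` namedtuple here is a stand-in, not the real transformers class):

```python
from collections import namedtuple

# Stand-in for transformers' ModelOutput type (hypothetical, for
# illustration only).
BaseModelOutput = namedtuple("BaseModelOutput", ["last_hidden_state"])

def read_encoder_state(encoder_outputs):
    # Mirrors the failing attribute access in modeling_encoder_decoder.py:
    # the code assumes a ModelOutput-like object, not a plain Python list.
    return encoder_outputs.last_hidden_state

# Works when the encoder returns a ModelOutput-like object...
hidden = read_encoder_state(BaseModelOutput(last_hidden_state=[[0.1, 0.2]]))

# ...but a bare list (the tuple/list return path) reproduces the crash.
try:
    read_encoder_state([[[0.1, 0.2]]])
except AttributeError as err:
    print(err)  # 'list' object has no attribute 'last_hidden_state'
```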

Expected behavior

It is supposed to load and generate output just like Llama, Mistral, T5, etc. models do.

Narsil commented 9 months ago

t5 is currently supported officially by TGI; this model falls back on the transformers implementation, which we cannot guarantee will work 100% of the time.

We do depend on transformers internals (which may vary from model to model, like `last_hidden_state` here).
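One way the list-vs-ModelOutput mismatch could be papered over, purely as a hypothetical sketch (this is not TGI code, and it assumes element 0 of a list-style encoder output holds the hidden states):

```python
from types import SimpleNamespace

def as_model_output(encoder_outputs):
    """Coerce a list/tuple-style encoder output into an object exposing
    .last_hidden_state, assuming element 0 holds the hidden states.
    Hypothetical shim for illustration only."""
    if isinstance(encoder_outputs, (list, tuple)):
        return SimpleNamespace(last_hidden_state=encoder_outputs[0])
    return encoder_outputs  # already ModelOutput-like; pass through

out = as_model_output([[0.1, 0.2]])
print(out.last_hidden_state)  # [0.1, 0.2]
```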

github-actions[bot] commented 8 months ago

This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days.