Closed tfcoe closed 4 months ago
@tfcoe Are you setting splitting=False on the processor? Try setting this flag to True before saving it and see if it works.
Yes! Rodrigo that's solved it. In hindsight this was very obvious 🤦 thanks so much!
You're welcome. I also came across this error and it took me a while to figure it out.
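For reference, the fix above can also be applied to an already-saved checkpoint by editing the processor config on disk. This is a minimal sketch, assuming the flag is serialized as `do_image_splitting` in `preprocessor_config.json` (both the key name and the `enable_image_splitting` helper are illustrative assumptions, not something stated in this thread; check your own file before relying on it):

```python
import json
from pathlib import Path


def enable_image_splitting(model_dir: str) -> dict:
    """Flip the image-splitting flag in a saved processor config.

    ASSUMPTION: the Idefics2 image processor serializes the flag as
    `do_image_splitting` inside `preprocessor_config.json`; verify the
    key name against your own checkpoint.
    """
    cfg_path = Path(model_dir) / "preprocessor_config.json"
    cfg = json.loads(cfg_path.read_text())
    cfg["do_image_splitting"] = True  # the fix suggested in the thread
    cfg_path.write_text(json.dumps(cfg, indent=2))
    return cfg
```

If you still have the processor in memory, setting the attribute there and calling `save_pretrained` on the output directory before serving should be equivalent.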
System Info
ghcr.io/huggingface/text-generation-inference:2.0.3
Information
Tasks
Reproduction
```
docker run --gpus all -p 9000:80 \
  -v $volume:/data \
  ghcr.io/huggingface/text-generation-inference:2.0.3 \
  --model-id $model
```
```
added_tokens.json
config.json
generation_config.json
model-00001-of-00004.safetensors
model-00002-of-00004.safetensors
model-00003-of-00004.safetensors
model-00004-of-00004.safetensors
model.safetensors.index.json
preprocessor_config.json
processor_config.json
special_tokens_map.json
tokenizer.json
tokenizer.model
tokenizer_config.json
version.txt
```
Load model

```python
qlora_model = Idefics2ForConditionalGeneration.from_pretrained(
    config["qlora_model_path"],
    torch_dtype=torch.float16,
    device_map="auto",
)
qlora_model.eval()
```
```
2024-05-23T23:05:40.351753Z  INFO text_generation_launcher: Default `max_input_tokens` to 4095
2024-05-23T23:05:40.351756Z  INFO text_generation_launcher: Default `max_total_tokens` to 4096
2024-05-23T23:05:40.351758Z  INFO text_generation_launcher: Default `max_batch_prefill_tokens` to 4145
2024-05-23T23:05:40.351761Z  INFO text_generation_launcher: Using default cuda graphs [1, 2, 4, 8, 16, 32]
2024-05-23T23:05:40.351841Z  INFO download: text_generation_launcher: Starting download process.
2024-05-23T23:05:43.390788Z  INFO text_generation_launcher: Files are already present on the host. Skipping download.
2024-05-23T23:05:44.056309Z  INFO download: text_generation_launcher: Successfully downloaded weights.
2024-05-23T23:05:44.056478Z  INFO shard-manager: text_generation_launcher: Starting shard rank=0
2024-05-23T23:05:51.174995Z  INFO text_generation_launcher: Server started at unix:///tmp/text-generation-server-0
2024-05-23T23:05:51.264243Z  INFO shard-manager: text_generation_launcher: Shard ready in 7.207144986s rank=0
2024-05-23T23:05:51.363988Z  INFO text_generation_launcher: Starting Webserver
2024-05-23T23:05:51.451386Z  WARN tokenizers::tokenizer::serialization: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokenizers-0.19.1/src/tokenizer/serialization.rs:159: Warning: Token '' was expected to have ID '32000' but was given ID 'None'
2024-05-23T23:05:51.451435Z  WARN tokenizers::tokenizer::serialization: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokenizers-0.19.1/src/tokenizer/serialization.rs:159: Warning: Token '' was expected to have ID '32001' but was given ID 'None'
2024-05-23T23:05:51.451438Z  WARN tokenizers::tokenizer::serialization: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokenizers-0.19.1/src/tokenizer/serialization.rs:159: Warning: Token '' was expected to have ID '32002' but was given ID 'None'
2024-05-23T23:05:51.451864Z  INFO text_generation_router: router/src/main.rs:289: Using config Some(Idefics2(Idefics2))
2024-05-23T23:05:51.451880Z  WARN text_generation_router: router/src/main.rs:298: no pipeline tag found for model /data
2024-05-23T23:05:51.455116Z  INFO text_generation_router: router/src/main.rs:317: Warming up model
2024-05-23T23:05:52.283020Z ERROR text_generation_launcher: Method Warmup encountered an error.
```
```
Traceback (most recent call last):
  File "/opt/conda/bin/text-generation-server", line 8, in <module>
    sys.exit(app())
  File "/opt/conda/lib/python3.10/site-packages/typer/main.py", line 311, in __call__
    return get_command(self)(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/typer/core.py", line 778, in main
    return _main(
  File "/opt/conda/lib/python3.10/site-packages/typer/core.py", line 216, in _main
    rv = self.invoke(ctx)
  File "/opt/conda/lib/python3.10/site-packages/click/core.py", line 1688, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/opt/conda/lib/python3.10/site-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/opt/conda/lib/python3.10/site-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/typer/main.py", line 683, in wrapper
    return callback(**use_params)  # type: ignore
  File "/opt/conda/lib/python3.10/site-packages/text_generation_server/cli.py", line 90, in serve
    server.serve(
  File "/opt/conda/lib/python3.10/site-packages/text_generation_server/server.py", line 253, in serve
    asyncio.run(
  File "/opt/conda/lib/python3.10/asyncio/runners.py", line 44, in run
    return loop.run_until_complete(main)
  File "/opt/conda/lib/python3.10/asyncio/base_events.py", line 636, in run_until_complete
    self.run_forever()
  File "/opt/conda/lib/python3.10/asyncio/base_events.py", line 603, in run_forever
    self._run_once()
  File "/opt/conda/lib/python3.10/asyncio/base_events.py", line 1909, in _run_once
    handle._run()
  File "/opt/conda/lib/python3.10/asyncio/events.py", line 80, in _run
    self._context.run(self._callback, *self._args)
  File "/opt/conda/lib/python3.10/site-packages/grpc_interceptor/server.py", line 165, in invoke_intercept_method
    return await self.intercept(
```
```
2024-05-23T23:05:52.398327Z ERROR warmup{max_input_length=4095 max_prefill_tokens=4145 max_total_tokens=4096 max_batch_size=None}:warmup: text_generation_client: router/client/src/lib.rs:33: Server error: CANCELLED
Error: Warmup(Generation("CANCELLED"))
2024-05-23T23:05:52.500691Z ERROR text_generation_launcher: Webserver Crashed
2024-05-23T23:05:52.500724Z  INFO text_generation_launcher: Shutting down shards
2024-05-23T23:05:52.565565Z  INFO shard-manager: text_generation_launcher: Terminating shard rank=0
2024-05-23T23:05:52.565742Z  INFO shard-manager: text_generation_launcher: Waiting for shard to gracefully shutdown rank=0
2024-05-23T23:05:52.866152Z  INFO shard-manager: text_generation_launcher: shard terminated rank=0
Error: WebserverFailed
```
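The `was given ID 'None'` warnings during startup suggest the serialized tokenizer files disagree about the added special tokens. A minimal sketch of how one might cross-check them locally, assuming the usual layout where `added_tokens.json` maps token strings to IDs and `tokenizer.json` lists them under an `added_tokens` array (the helper name and this layout are assumptions based on the directory listing above, not confirmed by the thread):

```python
import json
from pathlib import Path


def find_missing_added_tokens(model_dir: str) -> list[str]:
    """Report added tokens whose IDs are absent or wrong in tokenizer.json.

    Every entry in added_tokens.json should also appear, with the same
    ID, in the `added_tokens` section of tokenizer.json; tokens that do
    not are returned.
    """
    root = Path(model_dir)
    # added_tokens.json: {"<token>": id, ...}
    added = json.loads((root / "added_tokens.json").read_text())
    tok = json.loads((root / "tokenizer.json").read_text())
    # tokenizer.json added_tokens: [{"content": "<token>", "id": id, ...}, ...]
    known = {t["content"]: t["id"] for t in tok.get("added_tokens", [])}
    return [token for token, idx in added.items() if known.get(token) != idx]
```

An empty result means the two files agree; any token it returns would trigger exactly this kind of serialization warning when the router loads the tokenizer.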