huggingface / text-generation-inference

Large Language Model Text Generation Inference
http://hf.co/docs/text-generation-inference
Apache License 2.0

Process hangs in local run #1826

Open Hojun-Son opened 2 weeks ago

Hojun-Son commented 2 weeks ago
(text-generation-inference) root@C.10294313:~/tgi_test/text-generation-inference$ text-generation-launcher

2024-04-29T11:11:11.331114Z  INFO text_generation_launcher: Args { model_id: "bigscience/bloom-560m", revision: None, validation_workers: 2, sharded: None, num_shard: None, quantize: None, speculate: None, dtype: None, trust_remote_code: false, max_concurrent_requests: 128, max_best_of: 2, max_stop_sequences: 4, max_top_n_tokens: 5, max_input_tokens: None, max_input_length: None, max_total_tokens: None, waiting_served_ratio: 1.2, max_batch_prefill_tokens: None, max_batch_total_tokens: None, max_waiting_tokens: 20, max_batch_size: None, cuda_graphs: None, hostname: "0.0.0.0", port: 3000, shard_uds_path: "/tmp/text-generation-server", master_addr: "localhost", master_port: 29500, huggingface_hub_cache: None, weights_cache_override: None, disable_custom_kernels: false, cuda_memory_fraction: 1.0, rope_scaling: None, rope_factor: None, json_output: false, otlp_endpoint: None, cors_allow_origin: [], watermark_gamma: None, watermark_delta: None, ngrok: false, ngrok_authtoken: None, ngrok_edge: None, tokenizer_config_path: None, disable_grammar_support: false, env: false, max_client_batch_size: 4 }
2024-04-29T11:11:11.331492Z  INFO text_generation_launcher: Default `max_input_tokens` to 4095
2024-04-29T11:11:11.331507Z  INFO text_generation_launcher: Default `max_total_tokens` to 4096
2024-04-29T11:11:11.331516Z  INFO text_generation_launcher: Default `max_batch_prefill_tokens` to 4145
2024-04-29T11:11:11.331530Z  INFO text_generation_launcher: Using default cuda graphs [1, 2, 4, 8, 16, 32]
2024-04-29T11:11:11.331725Z  INFO download: text_generation_launcher: Starting download process.
2024-04-29T11:11:15.228645Z  INFO text_generation_launcher: Download file: model.safetensors

I tried to run text-generation-inference locally, but the process hangs after the last log line above. What is usually the cause of this problem? For reference, all of the args are at their defaults.

drbh commented 2 weeks ago

Hi @Hojun-Son, I just ran the same command and was able to start a server, so this may be a latent networking issue while downloading the model.
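
If the download is the stall point, one way to rule it out is to fetch the weights outside the launcher first. A minimal sketch, assuming huggingface_hub is installed in the same environment (the hf_transfer extra and the HF_HUB_ENABLE_HF_TRANSFER variable are optional download accelerators, not TGI requirements):

# Pre-fetch the model independently of text-generation-launcher.
# If this also stalls, the problem is network/Hub access rather than TGI.
pip install "huggingface_hub[hf_transfer]"
HF_HUB_ENABLE_HF_TRANSFER=1 huggingface-cli download bigscience/bloom-560m

If the download completes, the launcher should be able to reuse the cached files instead of re-downloading them.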

Also, please make sure to specify the model id and any other non-default parameters explicitly at startup.

I'd recommend downloading a model first via

text-generation-server download-weights HuggingFaceM4/idefics2-8b

and then running it via

text-generation-launcher --model-id HuggingFaceM4/idefics2-8b
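
Once the launcher reports the server is ready, a quick smoke test against the /generate route (request shape as in the TGI README; the prompt and token count are just examples, and port 3000 is the default shown in the launcher args above):

# Run from a second shell after the server is up.
curl 127.0.0.1:3000/generate \
    -X POST \
    -H 'Content-Type: application/json' \
    -d '{"inputs":"What is Deep Learning?","parameters":{"max_new_tokens":20}}'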

I hope these commands work for you.