runpod-workers / worker-vllm

The RunPod worker template for serving our large language model endpoints. Powered by vLLM.

trust_remote_code Setting Not Applied in runpod/worker-v1-vllm:stable-cuda12.1.0 #91

Closed. Juhong-Namgung closed this issue 4 weeks ago.

Juhong-Namgung commented 1 month ago

I tried to run the new version of Worker vLLM: runpod/worker-v1-vllm:stable-cuda12.1.0

  1. Worker vLLM v1.1 with vLLM 0.5.3 is now available under the stable tags. Use the image tag runpod/worker-v1-vllm:stable-cuda12.1.0.

Despite setting trust_remote_code to True (1), the setting does not appear to be applied, as shown by the error below. (Note: a Llama 3.1 model runs correctly, but the trust_remote_code=True setting is still not applied.)

My Configuration and Error:

[Screenshot: worker-vllm endpoint configuration]
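The configuration screenshot is not reproduced here. Roughly, the relevant endpoint environment variables were as follows (variable names as documented in the worker-vllm README; the model name is taken from the log below, the TRUST_REMOTE_CODE value from the description above, other screenshot values not shown):

MODEL_NAME=deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct
TRUST_REMOTE_CODE=1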

2024-08-04T23:00:59.760760070-07:00 /usr/local/lib/python3.10/dist-packages/paramiko/pkey.py:100: CryptographyDeprecationWarning: TripleDES has been moved to cryptography.hazmat.decrepit.ciphers.algorithms.TripleDES and will be removed from this module in 48.0.0.
2024-08-04T23:00:59.760824883-07:00   "cipher": algorithms.TripleDES,
2024-08-04T23:00:59.771969595-07:00 /usr/local/lib/python3.10/dist-packages/paramiko/transport.py:259: CryptographyDeprecationWarning: TripleDES has been moved to cryptography.hazmat.decrepit.ciphers.algorithms.TripleDES and will be removed from this module in 48.0.0.
2024-08-04T23:00:59.771988522-07:00   "class": algorithms.TripleDES,
2024-08-04T23:01:02.522603826-07:00 config.py           :58   2024-08-05 06:01:02,522 PyTorch version 2.3.1 available.
2024-08-04T23:01:05.051583895-07:00 engine.py           :24   2024-08-05 06:01:05,051 Engine args: AsyncEngineArgs(model='deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct', served_model_name=None, tokenizer='deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct', skip_tokenizer_init=False, tokenizer_mode='auto', trust_remote_code=False, download_dir=None, load_format='auto', dtype='auto', kv_cache_dtype='auto', quantization_param_path=None, seed=0, max_model_len=None, worker_use_ray=False, distributed_executor_backend=None, pipeline_parallel_size=1, tensor_parallel_size=1, max_parallel_loading_workers=None, block_size=16, enable_prefix_caching=False, disable_sliding_window=False, use_v2_block_manager=False, swap_space=4, cpu_offload_gb=0, gpu_memory_utilization=0.9, max_num_batched_tokens=None, max_num_seqs=256, max_logprobs=20, disable_log_stats=True, revision=None, code_revision=None, rope_scaling=None, rope_theta=None, tokenizer_revision=None, quantization=None, enforce_eager=False, max_context_len_to_capture=None, max_seq_len_to_capture=8192, disable_custom_all_reduce=False, tokenizer_pool_size=0, tokenizer_pool_type='ray', tokenizer_pool_extra_config=None, enable_lora=False, max_loras=1, max_lora_rank=16, enable_prompt_adapter=False, max_prompt_adapters=1, max_prompt_adapter_token=0, fully_sharded_loras=False, lora_extra_vocab_size=256, long_lora_scaling_factors=None, lora_dtype='auto', max_cpu_loras=None, device='auto', ray_workers_use_nsight=False, num_gpu_blocks_override=None, num_lookahead_slots=0, model_loader_extra_config=None, ignore_patterns=None, preemption_mode=None, scheduler_delay_factor=0.0, enable_chunked_prefill=None, guided_decoding_backend='outlines', speculative_model=None, speculative_draft_tensor_parallel_size=None, num_speculative_tokens=None, speculative_max_model_len=None, speculative_disable_by_batch_size=None, ngram_prompt_lookup_max=None, ngram_prompt_lookup_min=None, spec_decoding_acceptance_method='rejection_sampler', typical_acceptance_sampler_posterior_threshold=None, typical_acceptance_sampler_posterior_alpha=None, qlora_adapter_name_or_path=None, disable_logprobs_during_spec_decoding=None, otlp_traces_endpoint=None, engine_use_ray=False, disable_log_requests=True)
2024-08-04T23:01:06.197464693-07:00 engine.py           :113  2024-08-05 06:01:06,196 Error initializing vLLM engine: Failed to load the model config. If the model is a custom model not yet available in the HuggingFace transformers library, consider setting `trust_remote_code=True` in LLM or using the `--trust-remote-code` flag in the CLI.
2024-08-04T23:01:06.197548851-07:00 tokenizer_name_or_path: deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct, tokenizer_revision: None, trust_remote_code: False
2024-08-04T23:01:06.199489519-07:00 Traceback (most recent call last):
2024-08-04T23:01:06.199500973-07:00   File "/usr/local/lib/python3.10/dist-packages/vllm/transformers_utils/config.py", line 44, in get_config
2024-08-04T23:01:06.199510401-07:00     config = AutoConfig.from_pretrained(
2024-08-04T23:01:06.199515569-07:00   File "/usr/local/lib/python3.10/dist-packages/transformers/models/auto/configuration_auto.py", line 975, in from_pretrained
2024-08-04T23:01:06.199521715-07:00     trust_remote_code = resolve_trust_remote_code(
2024-08-04T23:01:06.199527931-07:00   File "/usr/local/lib/python3.10/dist-packages/transformers/dynamic_module_utils.py", line 640, in resolve_trust_remote_code
2024-08-04T23:01:06.199535474-07:00     raise ValueError(
2024-08-04T23:01:06.199558172-07:00 ValueError: Loading deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct requires you to execute the configuration file in that repo on your local machine. Make sure you have read the code there to avoid malicious use, then set the option `trust_remote_code=True` to remove this error.
2024-08-04T23:01:06.199568579-07:00 
2024-08-04T23:01:06.199574794-07:00 The above exception was the direct cause of the following exception:
2024-08-04T23:01:06.199580871-07:00 
2024-08-04T23:01:06.199587017-07:00 Traceback (most recent call last):
2024-08-04T23:01:06.199593651-07:00   File "/src/handler.py", line 6, in <module>
2024-08-04T23:01:06.199599797-07:00     vllm_engine = vLLMEngine()
2024-08-04T23:01:06.199605454-07:00   File "/src/engine.py", line 28, in __init__
2024-08-04T23:01:06.199612997-07:00     self.llm = self._initialize_llm() if engine is None else engine.llm
2024-08-04T23:01:06.199619702-07:00   File "/src/engine.py", line 114, in _initialize_llm
2024-08-04T23:01:06.199625429-07:00     raise e
2024-08-04T23:01:06.199631505-07:00   File "/src/engine.py", line 108, in _initialize_llm
2024-08-04T23:01:06.199636673-07:00     engine = AsyncLLMEngine.from_engine_args(self.engine_args)
2024-08-04T23:01:06.199641492-07:00   File "/usr/local/lib/python3.10/dist-packages/vllm/engine/async_llm_engine.py", line 457, in from_engine_args
2024-08-04T23:01:06.199648057-07:00     engine_config = engine_args.create_engine_config()
2024-08-04T23:01:06.199658045-07:00   File "/usr/local/lib/python3.10/dist-packages/vllm/engine/arg_utils.py", line 699, in create_engine_config
2024-08-04T23:01:06.199665099-07:00     model_config = ModelConfig(
2024-08-04T23:01:06.199671664-07:00   File "/usr/local/lib/python3.10/dist-packages/vllm/config.py", line 152, in __init__
2024-08-04T23:01:06.199676902-07:00     self.hf_config = get_config(self.model, trust_remote_code, revision,
2024-08-04T23:01:06.199683117-07:00   File "/usr/local/lib/python3.10/dist-packages/vllm/transformers_utils/config.py", line 57, in get_config
2024-08-04T23:01:06.199688286-07:00     raise RuntimeError(err_msg) from e
2024-08-04T23:01:06.199695340-07:00 RuntimeError: Failed to load the model config. If the model is a custom model not yet available in the HuggingFace transformers library, consider setting `trust_remote_code=True` in LLM or using the `--trust-remote-code` flag in the CLI.
TheAlexPG commented 1 month ago

I have the same problem. [screenshot]

TheAlexPG commented 1 month ago

I tried different settings, but it isn't working for me. [screenshot]

TheAlexPG commented 1 month ago

I also tried setting TRUST_REMOTE_CODE = 1, but it doesn't work.

scriptcoded commented 1 month ago

I stumbled upon the same problem. Looking through the code, it turns out the documentation is incorrect: set the environment variable to the string true manually (rather than 1) and it is passed through correctly.

https://github.com/runpod-workers/worker-vllm/blob/17a2d844ec45f2d0ff948cb2dd657411af08d631/src/engine_args.py#L25
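The reported behaviour is consistent with env parsing along these lines (a paraphrased sketch, not the exact code at the linked line): only the literal string "true" maps to True, so TRUST_REMOTE_CODE=1 silently falls back to False.

import os

# Hypothetical paraphrase of the parsing: any value other than the
# string "true" (e.g. "1") ends up as False.
trust_remote_code = os.environ.get("TRUST_REMOTE_CODE", "false").lower() == "true"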

pandyamarut commented 4 weeks ago

Fixed, Closing this.