Closed: GeauxEric closed this issue 6 months ago
🚀 The feature, motivation and pitch

Currently, the generate method supports inference based on prompt_token_ids:

def generate(
    self,
    prompts: Optional[Union[str, List[str]]] = None,
    sampling_params: Optional[SamplingParams] = None,
    prompt_token_ids: Optional[List[List[int]]] = None,
    use_tqdm: bool = True,
    lora_request: Optional[LoRARequest] = None,
) -> List[RequestOutput]:

That means the tokenizer is optional to the LLM engine. However, initializing an LLM engine always calls _init_tokenizer, which effectively makes the tokenizer required: the engine cannot be initialized without a valid tokenizer argument.

In our application, we would love to use the LLM's powerful engine for inference, but we want to keep the tokenizer as a separate service.
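For concreteness, here is a minimal sketch of the token-ID-only call path described above; the model name and token IDs are placeholders, and the LLM constructor currently still loads a tokenizer internally:

from vllm import LLM, SamplingParams

# Constructing the engine still initializes a tokenizer under the hood,
# which is exactly what this issue would like to make optional.
llm = LLM(model="facebook/opt-125m")

sampling_params = SamplingParams(temperature=0.0, max_tokens=16)

# No prompt strings are passed; the request is driven purely by token IDs,
# so the input side needs no tokenization inside the engine.
outputs = llm.generate(
    prompt_token_ids=[[2, 100, 200, 300]],  # placeholder token IDs
    sampling_params=sampling_params,
)

for out in outputs:
    # Generated token IDs are available directly on the output; the decoded
    # text is produced by the engine's own tokenizer during detokenization.
    print(out.outputs[0].token_ids)
    print(out.outputs[0].text)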
Alternatives

No response

Additional context

No response

I think the main blocker is that the tokenizer is also used during decode. See #3635.
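To make the request concrete, a rough sketch of the desired separation follows. The skip_tokenizer_init flag is hypothetical (a placeholder name for the requested knob, not an argument taken from this issue), and a Hugging Face tokenizer stands in for the external tokenizer service:

from transformers import AutoTokenizer
from vllm import LLM, SamplingParams

# Hypothetical flag: the engine would skip loading its own tokenizer entirely.
llm = LLM(model="facebook/opt-125m", skip_tokenizer_init=True)

# The separate tokenizer service handles the input side...
tokenizer = AutoTokenizer.from_pretrained("facebook/opt-125m")
prompt_ids = tokenizer("Hello, world!")["input_ids"]

outputs = llm.generate(
    prompt_token_ids=[prompt_ids],
    sampling_params=SamplingParams(max_tokens=16),
)

# ...and the output side as well, since (per the comment above) the engine
# would otherwise need its own tokenizer to produce text during decode.
generated_text = tokenizer.decode(list(outputs[0].outputs[0].token_ids))
print(generated_text)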