Closed Strive-for-excellence closed 1 month ago
Thanks for your PR @Strive-for-excellence! As of now, I believe this is not planned, but if we see significant activity on your issue showing that it is important for others, we'd be happy to take a deeper look.
This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days.
Feature request
Recently, multimodal large models based on the Transformer architecture have been emerging rapidly. Could text-generation-inference provide some support for them? One feasible approach would be for text-generation-inference to handle only the inference part, leaving encoding and decoding to the user: a request would send input token IDs and receive generated token IDs, rather than text. Examples of such projects include multimodal models like ChatTTS (https://github.com/2noise/ChatTTS) and GPT-SoVITS (https://github.com/RVC-Boss/GPT-SoVITS).
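As a rough illustration of the proposed tokens-in/tokens-out flow, a request payload might look like the sketch below. The endpoint name `/generate_tokens`, the field names, and the token ID values are all assumptions for illustration only, not an existing text-generation-inference API:

```python
import json

# Hypothetical payload: the client runs its own (possibly multimodal)
# tokenizer and sends raw token IDs; the server would return generated
# token IDs instead of decoded text. All names here are illustrative.
payload = {
    "input_ids": [151644, 8948, 1699],   # token IDs from the client's own tokenizer
    "parameters": {"max_new_tokens": 64},
}

# Body the client would POST to a hypothetical /generate_tokens endpoint;
# the response would carry an "output_ids" list the client decodes itself.
body = json.dumps(payload)
print(body)
```

Keeping the server purely ID-based would let it serve models whose vocabularies cover non-text modalities (audio codes, image patches) without TGI needing to know how those IDs are encoded or decoded.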
Motivation
none
Your contribution
none