Closed Iven2132 closed 2 months ago
Hi, the warning message you're seeing comes from Hugging Face. It may or may not matter for your application, but it is unrelated to Modal itself.
In the example code you shared, you don't appear to prompt the model, which would explain why you're not seeing any output.
I'm going to close this, since this tracker is for bugs in the Modal client library and there doesn't appear to be one here. Feel free to reach out on Slack if you have more questions about how to use Modal!
I've been trying to deploy the new LLaVA-NeXT with Sglang on Modal, but I'm not sure why I keep getting the "Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained." message for a very long time while nothing else happens.
How can I serve the model so it doesn't have to reload on every request? I just want to load the model once in my start_engine and then use the generate function to get outputs. I think this would be much faster.
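On Modal, the usual way to load a model once per container (rather than once per request) is a container lifecycle hook: put the heavy load in a method decorated with `@modal.enter()` on an `@app.cls(...)` class, and Modal runs it a single time when the container starts; subsequent `@modal.method()` calls reuse the loaded state. The dependency-free sketch below (a hypothetical `Engine` class standing in for an sglang runtime, not Modal's actual API) just illustrates that load-once / generate-many pattern:

```python
# Minimal sketch of the load-once pattern. `Engine` is hypothetical and
# stands in for an sglang runtime; no Modal or sglang dependency.
class Engine:
    def __init__(self):
        self.model = None
        self.load_count = 0  # tracks how many times the expensive load ran

    def start_engine(self):
        """Load the model once; later calls are no-ops."""
        if self.model is None:
            self.load_count += 1
            # expensive weight loading would happen here
            self.model = lambda prompt: f"response to: {prompt}"

    def generate(self, prompt):
        self.start_engine()  # cheap after the first call
        return self.model(prompt)


engine = Engine()
engine.generate("describe this image")
engine.generate("and this one")
print(engine.load_count)  # the model was loaded only once
```

In a real Modal app, `start_engine` would become the `@modal.enter()` method and `generate` a `@modal.method()`, so the load cost is paid once per container rather than once per call.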