sshh12 / multi_token

Embed arbitrary modalities (images, audio, documents, etc) into large language models.
Apache License 2.0

there's nothing in my output #7

Closed guilh00009 closed 5 months ago

guilh00009 commented 5 months ago

{'output': ''}

sshh12 commented 5 months ago

What's your input? and the model you are running?

guilh00009 commented 5 months ago

I tried a Mistral model finetuned on a dataset, but I think it has the same weights.

guilh00009 commented 5 months ago

This is my model: Guilherme34/Samantha-v2

guilh00009 commented 5 months ago

and this is my inference code:

import requests

response = requests.post(
    "https://d032-34-87-126-184.ngrok-free.app/generate",
    json={
        "messages": [{"role": "user", "content": "What are things I should be cautious about when I visit this place? "}],
        "images": ["https://github.com/sshh12/multi_token/raw/main/.demo/llava-view.jpg"],
    },
)

result = response.json()
print(result)
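For anyone hitting the same empty-output symptom, a minimal stdlib-only sketch of the same call with explicit error handling can help tell a failed request apart from a model that genuinely returned {'output': ''}. The URL and JSON schema come from the snippet above; the helper names build_payload and generate are hypothetical, not part of this repo:

```python
import json
import urllib.request

def build_payload(prompt, image_urls):
    # Same JSON schema as the requests.post call in the thread above.
    return {
        "messages": [{"role": "user", "content": prompt}],
        "images": image_urls,
    }

def generate(server_url, prompt, image_urls):
    # urlopen raises on HTTP errors (4xx/5xx), so a bad endpoint or
    # server failure surfaces as an exception rather than a silent
    # empty result.
    req = urllib.request.Request(
        f"{server_url}/generate",
        data=json.dumps(build_payload(prompt, image_urls)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=120) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

If generate raises, the problem is transport or server-side; if it returns an empty output, the model itself produced nothing (e.g. because it was never trained for multimodal input).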

sshh12 commented 5 months ago

Ah, did you train it using the scripts in this repo? It looks like you may have used something different: https://huggingface.co/Guilherme34/Samantha-v2/tree/main

If this is just a text model, you'd need to finetune it on image data first to be able to use it multimodally.

guilh00009 commented 5 months ago

oh okay, sorry for that lol, thank you

sshh12 commented 5 months ago

Haha np