kanishkanarch opened this issue 2 months ago
I second this issue; I would also like to know how to get the output in a human-readable format.
Hi @maoki109 and @kanishkanarch, Mistral Common does not perform inference; it primarily handles the tokenization before the data is fed to the model. Essentially, it is the preliminary step before sending the request to the actual model: your text input and images are tokenized/encoded, ready to be used as input to the model. It is therefore completely normal that your outputs are hard to read, since they are token IDs only. What you usually do next is feed them to a model, which can be hosted with, for example, Mistral Inference. There is a code example in this Hugging Face Space that uses both Mistral Common and Mistral Inference to host and communicate with the model.
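If you just want something human readable at that stage, you can decode the token IDs straight back into text to see the templated prompt that will be fed to the model. A minimal sketch, assuming a local tekken.json tokenizer file (the path is illustrative):

```python
from mistral_common.tokens.tokenizers.mistral import MistralTokenizer
from mistral_common.protocol.instruct.messages import UserMessage
from mistral_common.protocol.instruct.request import ChatCompletionRequest

tokenizer = MistralTokenizer.from_file("./tekken.json")  # illustrative path

request = ChatCompletionRequest(messages=[UserMessage(content="Hello!")])
encoded = tokenizer.encode_chat_completion(request)

print(encoded.tokens)  # integer token IDs: this is all the model will see
# Decoding reproduces the templated prompt text. There is no model answer
# here, because Mistral Common never runs the model itself.
print(tokenizer.instruct_tokenizer.tokenizer.decode(encoded.tokens))
```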
I see, thanks @pandora-s-git! I misunderstood.
Does anyone know where I can find documentation on how to use the `tokens` and `images` outputs from Mistral Common/Pixtral in Python inference? Similar to the Instruction Following example below, but with multimodal inputs.
```python
from mistral_inference.transformer import Transformer
from mistral_inference.generate import generate

from mistral_common.tokens.tokenizers.mistral import MistralTokenizer
from mistral_common.protocol.instruct.messages import UserMessage
from mistral_common.protocol.instruct.request import ChatCompletionRequest

tokenizer = MistralTokenizer.from_file("./mistral-nemo-instruct-v0.1/tekken.json")  # change to extracted tokenizer file
model = Transformer.from_folder("./mistral-nemo-instruct-v0.1")  # change to extracted model dir

prompt = "How expensive would it be to ask a window cleaner to clean all windows in Paris. Make a reasonable guess in US Dollar."

completion_request = ChatCompletionRequest(messages=[UserMessage(content=prompt)])
tokens = tokenizer.encode_chat_completion(completion_request).tokens

out_tokens, _ = generate([tokens], model, max_tokens=1024, temperature=0.35, eos_id=tokenizer.instruct_tokenizer.tokenizer.eos_id)
result = tokenizer.instruct_tokenizer.tokenizer.decode(out_tokens[0])

print(result)
```
You can take a look at the source code of the space I've mentioned previously 🙌
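For reference, a multimodal version of the snippet above would look roughly like this. It is a sketch modelled on the Pixtral usage pattern from Mistral Inference; the image URL and paths are placeholders, and you should double-check the exact `generate(..., images=...)` signature against the version you have installed:

```python
from mistral_inference.transformer import Transformer
from mistral_inference.generate import generate

from mistral_common.tokens.tokenizers.mistral import MistralTokenizer
from mistral_common.protocol.instruct.messages import ImageURLChunk, TextChunk, UserMessage
from mistral_common.protocol.instruct.request import ChatCompletionRequest

tokenizer = MistralTokenizer.from_file("./pixtral-12b/tekken.json")  # change to extracted tokenizer file
model = Transformer.from_folder("./pixtral-12b")  # change to extracted model dir

url = "https://example.com/some-image.png"  # placeholder image URL
prompt = "Describe the image."

completion_request = ChatCompletionRequest(
    messages=[UserMessage(content=[ImageURLChunk(image_url=url), TextChunk(text=prompt)])]
)

# encode_chat_completion returns both the token IDs and the preprocessed image data
encoded = tokenizer.encode_chat_completion(completion_request)
tokens = encoded.tokens
images = encoded.images

# The images are passed alongside the tokens; the nesting mirrors [tokens].
out_tokens, _ = generate(
    [tokens],
    model,
    images=[images],
    max_tokens=256,
    temperature=0.35,
    eos_id=tokenizer.instruct_tokenizer.tokenizer.eos_id,
)
result = tokenizer.instruct_tokenizer.tokenizer.decode(out_tokens[0])
print(result)
```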
Thanks a lot, @pandora-s-git. The online API works perfectly fine.
I also tried the 'API code' but got the following error. The docs say that one needs to pass the Hugging Face token when working locally, but where do I pass it?
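For reference, these are the standard `huggingface_hub` ways I know of to pass a token locally (generic Hub usage, nothing Mistral-specific; the repo id below is illustrative):

```python
# Option 1: environment variable, set before any Hub call:
#   export HF_TOKEN="hf_..."

# Option 2: programmatic login:
from huggingface_hub import login, snapshot_download

login(token="hf_...")  # replace with your own token

# Authenticated downloads then work as usual (repo id is illustrative):
local_dir = snapshot_download(repo_id="mistralai/Pixtral-12B-2409")
```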
### Python -VV

### Pip Freeze

### Reproduction Steps

### Expected Behavior
The Pixtral model should output some form of visualizable/interactive data, or additional code snippets of how to use the output tokens.
### Additional Context
The `mistral_common.multimodal` module doesn't seem to have any function to make sense of the data output by the tokenizer, if I didn't overlook anything. I tried to open the output image(s), but they must have some `read` function according to the selected `open` function below.

TL;DR: I have no clue how to use the output image.
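A minimal way to inspect what the `images` output actually contains (a sketch; I'm assuming each entry is a preprocessed array meant for the model rather than an image file, which would explain why there is no `read`):

```python
from mistral_common.tokens.tokenizers.mistral import MistralTokenizer
from mistral_common.protocol.instruct.messages import ImageURLChunk, TextChunk, UserMessage
from mistral_common.protocol.instruct.request import ChatCompletionRequest

tokenizer = MistralTokenizer.from_file("./pixtral-12b/tekken.json")  # illustrative path

request = ChatCompletionRequest(
    messages=[
        UserMessage(
            content=[
                ImageURLChunk(image_url="https://example.com/some-image.png"),  # placeholder
                TextChunk(text="Describe the image."),
            ]
        )
    ]
)
encoded = tokenizer.encode_chat_completion(request)

# Print what the entries actually are instead of trying to open() them.
for i, img in enumerate(encoded.images):
    print(i, type(img), getattr(img, "shape", None), getattr(img, "dtype", None))
```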
### Suggested Solutions
Suggestions: