mistralai / mistral-common

Apache License 2.0

How to use Pixtral tokens & outputs? #46

Open kanishkanarch opened 2 months ago

kanishkanarch commented 2 months ago

Python -VV

Python 3.11.7 (main, Dec 15 2023, 18:12:31) [GCC 11.2.0]

Pip Freeze

kanishk@anarch[~/mistral] > pip freeze
annotated-types==0.7.0
appdirs==1.4.4
asttokens==2.4.1
attrs==24.2.0
certifi==2024.8.30
charset-normalizer==3.3.2
cityscapesScripts==2.2.2
coloredlogs==15.0.1
contourpy==1.2.0
cycler==0.12.1
decorator==5.1.1
executing==2.0.1
filelock==3.13.1
fonttools==4.49.0
fsspec==2024.2.0
graphviz==0.20.3
huggingface-hub==0.24.6
humanfriendly==10.0
idna==3.8
ipython==8.22.1
jedi==0.19.1
Jinja2==3.1.3
jsonschema==4.23.0
jsonschema-specifications==2023.12.1
keyboard==0.13.5
kiwisolver==1.4.5
MarkupSafe==2.1.5
matplotlib==3.8.3
matplotlib-inline==0.1.6
mistral_common==1.4.0
mplcyberpunk==0.7.1
mpmath==1.3.0
networkx==3.2.1
numpy==1.26.4
nvidia-cublas-cu12==12.1.3.1
nvidia-cuda-cupti-cu12==12.1.105
nvidia-cuda-nvrtc-cu12==12.1.105
nvidia-cuda-runtime-cu12==12.1.105
nvidia-cudnn-cu12==8.9.2.26
nvidia-cufft-cu12==11.0.2.54
nvidia-curand-cu12==10.3.2.106
nvidia-cusolver-cu12==11.4.5.107
nvidia-cusparse-cu12==12.1.0.106
nvidia-nccl-cu12==2.19.3
nvidia-nvjitlink-cu12==12.3.101
nvidia-nvtx-cu12==12.1.105
opencv-python==4.9.0.80
packaging==23.2
parso==0.8.3
pexpect==4.9.0
pillow==10.4.0
progressbar==2.5
prompt-toolkit==3.0.43
ptyprocess==0.7.0
pure-eval==0.2.2
pydantic==2.9.1
pydantic_core==2.23.3
pygame==2.5.2
Pygments==2.17.2
pyparsing==3.1.2
pyquaternion==0.9.9
python-dateutil==2.9.0.post0
PyYAML==6.0.2
pyzmq==23.2.1
qbstyles==0.1.4
referencing==0.35.1
regex==2024.7.24
requests==2.32.3
rpds-py==0.20.0
sentencepiece==0.2.0
six==1.16.0
stack-data==0.6.3
sympy==1.12
tiktoken==0.7.0
torch==2.2.1
tqdm==4.66.2
traitlets==5.14.1
triton==2.2.0
typing==3.7.4.3
typing_extensions==4.12.2
urllib3==2.2.2
wcwidth==0.2.13
XPlaneApi==0.0.6
xplaneconnect @ file:///home/kanishk/X-Plane%2010/Resources/plugins/XPlaneConnect/XPlaneConnect
zmq==0.0.0

Reproduction Steps

  1. Run any one of the example code snippets given in the release documentation.

Expected Behavior

The Pixtral model should output some form of visualizable/interactive data, or additional code snippets of how to use the output tokens.

Additional Context

The mistral_common.multimodal module doesn't seem to have any function to make sense of the data output by the tokenizer, unless I overlooked something. I tried to open the output image(s), but they must need some read function, judging by the selected open function below.

TL;DR: I have no clue how to use the output images.


Suggested Solutions


  1. Addition of modules to interact with multimodal data
  2. WebUI API, like Gradio
maoki109 commented 2 months ago

I second this issue; I would also like to know how to get the output in a human-readable format.

pandora-s-git commented 2 months ago

Hi @maoki109 and @kanishkanarch, Mistral Common does not perform inference; it primarily handles the tokenization before the data is fed to the model. Essentially, it is the preliminary step before sending the request to the actual model: your text input and images are tokenized/encoded, ready to be used as model input. It is therefore completely normal that your outputs are hard to interpret, since they are only token IDs. Usually, you then feed them to the model, which can be hosted with, for example, Mistral Inference. There is example code in this Hugging Face space that uses both Mistral Common and Mistral Inference to host and communicate with the model -> space

maoki109 commented 1 month ago

I see, thanks @pandora-s-git! I misunderstood.

Does anyone know where I can find documentation on how to use the tokens and images outputs from Mistral Common/Pixtral in Python inference? Similar to the Instruction Following example below, but with multimodal inputs.

```python
from mistral_inference.transformer import Transformer
from mistral_inference.generate import generate

from mistral_common.tokens.tokenizers.mistral import MistralTokenizer
from mistral_common.protocol.instruct.messages import UserMessage
from mistral_common.protocol.instruct.request import ChatCompletionRequest

tokenizer = MistralTokenizer.from_file("./mistral-nemo-instruct-v0.1/tekken.json")  # change to extracted tokenizer file
model = Transformer.from_folder("./mistral-nemo-instruct-v0.1")  # change to extracted model dir

prompt = "How expensive would it be to ask a window cleaner to clean all windows in Paris. Make a reasonable guess in US Dollar."

completion_request = ChatCompletionRequest(messages=[UserMessage(content=prompt)])

tokens = tokenizer.encode_chat_completion(completion_request).tokens

out_tokens, _ = generate([tokens], model, max_tokens=1024, temperature=0.35, eos_id=tokenizer.instruct_tokenizer.tokenizer.eos_id)
result = tokenizer.instruct_tokenizer.tokenizer.decode(out_tokens[0])

print(result)
```
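For multimodal inputs, a hedged adaptation of the snippet above is sketched below, following the same encode/generate/decode pattern. The model path and image URL are placeholders, and it assumes mistral_inference >= 1.4 with Pixtral weights already downloaded; imports are deferred inside the function so the sketch can be defined without the heavy dependencies installed:

```python
def run_pixtral(model_path: str, image_url: str, prompt: str) -> str:
    """Tokenize a text+image request with mistral_common, run it through
    Pixtral via mistral_inference, and decode the reply to a string."""
    from mistral_inference.transformer import Transformer
    from mistral_inference.generate import generate
    from mistral_common.tokens.tokenizers.mistral import MistralTokenizer
    from mistral_common.protocol.instruct.messages import UserMessage, TextChunk, ImageURLChunk
    from mistral_common.protocol.instruct.request import ChatCompletionRequest

    tokenizer = MistralTokenizer.from_file(f"{model_path}/tekken.json")
    model = Transformer.from_folder(model_path)

    request = ChatCompletionRequest(
        messages=[UserMessage(content=[ImageURLChunk(image_url=image_url), TextChunk(text=prompt)])]
    )
    encoded = tokenizer.encode_chat_completion(request)

    # The preprocessed image arrays ride alongside the token IDs into generate()
    out_tokens, _ = generate(
        [encoded.tokens], model, images=[encoded.images],
        max_tokens=256, temperature=0.35,
        eos_id=tokenizer.instruct_tokenizer.tokenizer.eos_id,
    )
    return tokenizer.instruct_tokenizer.tokenizer.decode(out_tokens[0])


# Example call (path and URL are placeholders; requires the full Pixtral weights on disk):
# print(run_pixtral("./pixtral-12b-240910", "https://example.com/cat.png", "Describe the image."))
```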
pandora-s-git commented 1 month ago

You can take a look at the source code of the space I've mentioned previously 🙌

kanishkanarch commented 1 month ago

Thanks a lot, @pandora-s-git. The online API works perfectly fine.


I also tried the 'API code' but got the following error. The docs say that one needs to pass the "face tokens" (the Hugging Face token, presumably) when working locally, but where do I pass them?

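If "face tokens" does mean a Hugging Face access token, one common way to supply it is when downloading the weights with huggingface_hub. A minimal sketch, assuming the `mistralai/Pixtral-12B-2409` repo id and that your token is in `hf_token`:

```python
from huggingface_hub import snapshot_download


def download_pixtral(local_dir: str, hf_token: str) -> str:
    """Download Pixtral weights to local_dir; the token authorizes access
    to gated repos and is passed directly to snapshot_download."""
    return snapshot_download(
        repo_id="mistralai/Pixtral-12B-2409",  # assumption: the Pixtral repo id
        local_dir=local_dir,
        token=hf_token,  # your Hugging Face access token goes here
    )
```

Alternatively, running `huggingface-cli login` once stores the token so that library calls pick it up without passing it explicitly.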