Llava-NeXT processor inconsistencies - unexpected spaces

sunildkumar commented 1 month ago

System Info

transformers version: 4.42.4
Platform: Linux-6.2.0-39-generic-x86_64-with-glibc2.35
Python version: 3.11.9
Huggingface_hub version: 0.23.4
Safetensors version: 0.4.3
Accelerate version: 0.29.3
Accelerate config: - compute_environment: LOCAL_MACHINE
- distributed_type: MULTI_GPU
- mixed_precision: fp16
- use_cpu: False
- debug: False
- num_processes: 4
- machine_rank: 0
- num_machines: 1
- gpu_ids: 0,1,2,3
- rdzv_backend: static
- same_network: True
- main_training_function: main
- enable_cpu_affinity: False
- downcast_bf16: no
- tpu_use_cluster: False
- tpu_use_sudo: False
- tpu_env: []
PyTorch version (GPU?): 2.3.1+cu121 (True)
Tensorflow version (GPU?): not installed (NA)
Flax version (CPU?/GPU?/TPU?): not installed (NA)
Jax version: not installed
JaxLib version: not installed
Using distributed or parallel set-up in script?: NO
Using GPU in script?: NO
GPU type: NVIDIA A100 80GB PCIe

Who can help?

@zucchini-nlp @ArthurZucker

Information

[ ] The official example scripts
[X] My own modified scripts

Tasks

[ ] An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
[X] My own task or dataset (give details below)

Reproduction

I'm finding that the Llava-NeXT processors/tokenizers add spaces unexpectedly. clean_up_tokenization_spaces doesn't seem to fix this. Please see this Google Colab with a repro and more information.

Example:

from transformers import LlavaNextProcessor

processor = LlavaNextProcessor.from_pretrained(
        pretrained_model_name_or_path="llava-hf/llava-v1.6-mistral-7b-hf"
        )

text = "[INST] <image>\nWhat is shown in this image? [/INST]"

tokens = processor(text=text)['input_ids'].squeeze(0).tolist()

decoded_tokens = processor.decode(tokens)

print(decoded_tokens)
>>> "<s> [INST] <image> \nWhat is shown in this image? [/INST]"
                       ^  notice the extra space between <image> and \n that isn't in the original encoded text

Expected behavior

Decoding should yield the same text as I input.
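
To pin down exactly where the round trip diverges, a small diff helper can be useful. This is a minimal sketch using only the standard library; `inserted_spans` is a hypothetical helper name, and the two strings are copied from the example above rather than produced by running the processor here.

```python
from difflib import SequenceMatcher


def inserted_spans(original: str, decoded: str) -> list[tuple[int, str]]:
    """Return (position in original, inserted text) for every span that
    is present in `decoded` but absent from `original`."""
    matcher = SequenceMatcher(None, original, decoded, autojunk=False)
    return [
        (i1, decoded[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag == "insert"
    ]


# Strings taken from the report above (not re-generated by this snippet).
original = "[INST] <image>\nWhat is shown in this image? [/INST]"
decoded = "<s> [INST] <image> \nWhat is shown in this image? [/INST]"

for position, text in inserted_spans(original, decoded):
    print(position, repr(text))
```

On these two strings the helper reports the BOS prefix at position 0 and a single space at position 14, which is the position right after `<image>`.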

zucchini-nlp commented 1 month ago

It might be related to https://github.com/huggingface/transformers/issues/31890 . Let me know if suggestions there work for you :)

ArthurZucker commented 1 month ago

Yep this seems to be related to additional spaces being added after special tokens cc @itazap

ArthurZucker commented 1 month ago

seems like with v4.41 this was already here, we did not break!

itazap commented 1 month ago

passing either from_slow=True OR add_prefix_space=False should fix it. Looks like unlike with from_slow=True, the normalizer adds a prepend_scheme @ArthurZucker
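
One way to check what the fast tokenizer is configured to prepend is to look at its serialized JSON. The sketch below is an assumption-laden illustration: `find_prepend_settings` is a hypothetical helper, and the sample config is hand-written for the example, not the actual configuration of llava-hf/llava-v1.6-mistral-7b-hf.

```python
import json


def find_prepend_settings(config_json: str) -> list[tuple[str, object]]:
    """Walk a tokenizer JSON config and collect every `prepend_scheme`
    value and every component whose type is "Prepend"."""
    found = []

    def walk(node, path):
        if isinstance(node, dict):
            if node.get("type") == "Prepend":
                found.append((path, node.get("prepend")))
            for key, value in node.items():
                if key == "prepend_scheme":
                    found.append((f"{path}.{key}", value))
                else:
                    walk(value, f"{path}.{key}")
        elif isinstance(node, list):
            for index, item in enumerate(node):
                walk(item, f"{path}[{index}]")

    walk(json.loads(config_json), "$")
    return found


# Hand-written sample for illustration only; NOT the real model config.
sample = json.dumps({
    "normalizer": None,
    "pre_tokenizer": {
        "type": "Metaspace",
        "replacement": "\u2581",
        "prepend_scheme": "first",
    },
})
print(find_prepend_settings(sample))
```

With a loaded processor, the JSON string could presumably come from `processor.tokenizer.backend_tokenizer.to_str()`, which is only available on fast tokenizers.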

ArthurZucker commented 1 month ago

yep!

sunildkumar commented 1 month ago

> It might be related to https://github.com/huggingface/transformers/issues/31890 . Let me know if suggestions there work for you :)

> passing either from_slow=True OR add_prefix_space=False should fix it. Looks like unlike with from_slow=True, the normalizer adds a prepend_scheme

Thank you for your quick response and the suggestions.

I'm finding that add_prefix_space=False doesn't work:

processor = LlavaNextProcessor.from_pretrained(
        pretrained_model_name_or_path="llava-hf/llava-v1.6-mistral-7b-hf",
        add_prefix_space=False,
        )

text = "[INST] <image>\nWhat is shown in this image? [/INST]"

tokens = processor(text=text)['input_ids'].squeeze(0).tolist()

decoded_tokens = processor.decode(tokens)
>>> "<s> [INST] <image> \nWhat is shown in this image? [/INST]"

But from_slow=True seems to work:

processor = LlavaNextProcessor.from_pretrained(
        pretrained_model_name_or_path="llava-hf/llava-v1.6-mistral-7b-hf",
        from_slow=True,
        )

text = "[INST] <image>\nWhat is shown in this image? [/INST]"

tokens = processor(text=text)['input_ids'].squeeze(0).tolist()

decoded_tokens = processor.decode(tokens)
>>> "[INST] <image>\nWhat is shown in this image? [/INST]"
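
For comparing loading options side by side, a round-trip check along these lines might help. It is a sketch, not part of transformers: `round_trip_report` and the toy encode/decode functions are hypothetical stand-ins so the snippet runs without downloading a model, and the "buggy" decoder only imitates the symptom reported above.

```python
from typing import Callable, Sequence


def round_trip_report(
    encode: Callable[[str], Sequence[int]],
    decode: Callable[[Sequence[int]], str],
    texts: Sequence[str],
) -> list[tuple[str, str]]:
    """Return (original, decoded) pairs for every text that does not
    survive encode -> decode unchanged."""
    failures = []
    for text in texts:
        decoded = decode(encode(text))
        if decoded != text:
            failures.append((text, decoded))
    return failures


# Toy stand-ins (assumptions for illustration): one id per character.
def toy_encode(text: str) -> list[int]:
    return [ord(char) for char in text]


def toy_decode_ok(ids: Sequence[int]) -> str:
    return "".join(chr(i) for i in ids)


def toy_decode_buggy(ids: Sequence[int]) -> str:
    # Imitates the reported symptom: a space appears after "<image>".
    return toy_decode_ok(ids).replace("<image>", "<image> ")


prompt = "[INST] <image>\nWhat is shown in this image? [/INST]"
print(round_trip_report(toy_encode, toy_decode_ok, [prompt]))
print(round_trip_report(toy_encode, toy_decode_buggy, [prompt]))
```

With a real processor, `encode` could be something like `lambda t: processor.tokenizer(t, add_special_tokens=False)["input_ids"]` and `decode` could be `processor.tokenizer.decode`, run once per loading option (default, add_prefix_space=False, from_slow=True).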

itazap commented 1 month ago

It was a recent fix! Perhaps try pulling latest on main? :hugs:

github-actions[bot] commented 3 weeks ago

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.
